Primary Key Index is a Space Hog

Tables in PostgreSQL have an index for the primary key and separate row storage called the heap. Other databases integrate these together or support “index-organized tables”. In this arrangement, the primary key lookup process leads directly to the row data without a secondary fetch to get the full row and the requisite additional CPU and I/O utilization.

The CLUSTER command in PostgreSQL reorganizes a table according to an index to improve performance, but doesn’t really work for most real-world OLTP cases. It rewrites the entire table under an exclusive lock, blocking any reads or writes. PostgreSQL doesn’t maintain the clustered layout for new data, so this operation must be ran periodically. So it is really only useful if you can take your database offline for long periods of time on a regular basis.

But more critically, index-organized tables save space as the index doesn’t require a separate copy of the row data. For tables with small rows that are mostly covered by the primary key, such as join tables, this can easily cut the table’s storage footprint in half.

Consider the following table which stores social “likes” for arbitrary objects:

CREATE TABLE likes (
 object_type INTEGER NOT NULL,
 object_id BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
 user_id BIGINT NOT NULL,
 created_at TIMESTAMP WITH TIME ZONE NOT NULL,
 PRIMARY KEY(object_type, object_id, user_id)
);

PostgreSQL will maintain an index for the primary key which is separate from the base table storage. This index will contain a full copy of the object_type, object_id, and user_id columns for every row. 20 out of 28 bytes in each row (~70%) will be duplicated. If PostgreSQL supported index-organized tables, it wouldn’t consume all that additional space.

Fuente: medium.com/@rbranson/10-things-i-hate-about-postgresql-20dbab8c2791


Posted

in

by