Optimizing storage and managing cleanup
Transactions are an integral part of the PostgreSQL system. However, transactions come with a small price tag attached. As already shown in this chapter, it can happen that concurrent users will be presented with different data. Not everybody will get the same data returned by a query. In addition to that, DELETE and UPDATE are not allowed to actually overwrite data as ROLLBACK would not work. If you happen to be in the middle of a large DELETE operation, you cannot be sure whether you will be able to COMMIT or not. In addition to that, data is still visible while you do a DELETE, and sometimes data is even visible once your modification has long since finished.
Consequently, this means that cleanup has to happen asynchronously. A transaction can not clean up its own mess and COMMIT/ROLLBACK might be too early to take care of dead rows.
The solution to the problem is VACUUM:
test=# h VACUUM
Command: VACUUM
Description: garbage-collect and optionally analyze a database
Syntax:
VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE } [, ...] ) ] [ table_name [ (column_name [, ...] ) ] ]
VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ table_name ]
VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ table_name [ (column_name [, ...] ) ] ]
VACUUM will visit all pages that potentially contain modifications and find all the dead space. The free space found is then tracked by the free space map (FSM) of the relation.
Note that VACUUM will, in most cases, not shrink the size of a table. Instead, it will track and find free space inside existing storage files.
What this means to end users will be outlined in the Watching VACUUM at work section of this chapter.