Vacuuming and analyzing tables automatically
Autovacuum is a daemon (that is, it runs in the background) that automatically vacuums (cleans up) dead tuples, reclaims storage, and gathers statistics. It checks for bloated tables in the database and clears the bloat to reuse the space. It monitors database tables and indexes and adds them to a vacuum job after they reach a specific threshold of update or delete operations.
Autovacuum manages vacuuming by automating the PostgreSQL VACUUM and ANALYZE commands. VACUUM removes bloat from tables and reclaims the space, whereas ANALYZE updates the statistics that enable the optimizer to produce efficient plans. VACUUM also performs a major task called vacuum freezing to prevent transaction ID wraparound issues in the database. Every row that is updated in the database receives a transaction ID from the PostgreSQL transaction control mechanism. These IDs control the row's visibility to other concurrent transactions. The transaction ID is a 32-bit number. Two billion IDs are always kept in the visible past. The remaining IDs (about 2.2 billion) are reserved for transactions that will take place in the future and are hidden from the current transaction. PostgreSQL requires occasional cleaning and freezing of old rows to prevent transaction IDs from wrapping around and making old, existing rows invisible when new transactions are created. For more information, see Preventing Transaction ID Wraparound Failures in the PostgreSQL documentation.
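One way to watch how close each database is to the wraparound limit is to check the age of its oldest unfrozen transaction ID. The following query is a monitoring sketch against the standard pg_database catalog:

```sql
-- Age of the oldest unfrozen transaction ID per database.
-- When age(datfrozenxid) approaches autovacuum_freeze_max_age
-- (200,000,000 by default), an aggressive freeze vacuum is triggered.
SELECT datname,
       age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```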
Autovacuum is recommended and enabled by default. Its parameters include the following.
| Parameter | Description | Default for Amazon RDS | Default for Aurora |
| --- | --- | --- | --- |
| autovacuum_vacuum_threshold | The minimum number of tuple update or delete operations that must occur on a table before autovacuum vacuums it. | 50 operations | 50 operations |
| autovacuum_analyze_threshold | The minimum number of tuple inserts, updates, or deletes that must occur on a table before autovacuum analyzes it. | 50 operations | 50 operations |
| autovacuum_vacuum_scale_factor | The fraction of a table's tuples that must be modified before autovacuum vacuums it. | 0.1 | 0.1 |
| autovacuum_analyze_scale_factor | The fraction of a table's tuples that must be modified before autovacuum analyzes it. | 0.05 | 0.05 |
| autovacuum_freeze_max_age | The maximum age of frozen IDs before a table is vacuumed to prevent transaction ID wraparound issues. | 200,000,000 transactions | 200,000,000 transactions |
Autovacuum makes a list of tables to process based on specific threshold formulas, as follows.
- Threshold for running VACUUM on a table:
  vacuum threshold = autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor * total row count of table)
- Threshold for running ANALYZE on a table:
  analyze threshold = autovacuum_analyze_threshold + (autovacuum_analyze_scale_factor * total row count of table)
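These formulas can be evaluated directly against the catalog. The following sketch assumes the default settings (threshold of 50, scale factors of 0.1 and 0.05) and uses the planner's row estimate (reltuples) in place of the exact row count:

```sql
-- Effective autovacuum thresholds per user table, assuming
-- default settings. For example, a table with 1,000,000 rows
-- is vacuumed after 50 + 0.1 * 1,000,000 = 100,050 changes.
SELECT c.relname,
       c.reltuples::bigint               AS estimated_rows,
       (50 + 0.1  * c.reltuples)::bigint AS vacuum_threshold,
       (50 + 0.05 * c.reltuples)::bigint AS analyze_threshold
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema');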
For small to medium-sized tables, the default values might be sufficient. However, a large table that has frequent data modifications will have a higher number of dead tuples. In this case, autovacuum might process the table frequently for maintenance, and the maintenance of other tables might get delayed or ignored until the large table finishes. To avoid this, you can tune the autovacuum parameters described in the following section.
Autovacuum memory-related parameters
autovacuum_max_workers
Specifies the maximum number of autovacuum processes (other than the autovacuum launcher) that can run at the same time. This parameter can be set only when you start the server. If the autovacuum process is busy with a large table, this parameter helps run the cleanup for other tables.
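To see how many of the configured workers are busy at a given moment, you can count the autovacuum worker backends (the backend_type column is available in PostgreSQL 10 and later):

```sql
-- Current autovacuum activity versus the configured maximum.
SELECT count(*) AS active_workers,
       current_setting('autovacuum_max_workers') AS max_workers
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker';
```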
maintenance_work_mem
Specifies the maximum amount of memory to be used by maintenance operations such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. In Amazon RDS and Aurora, memory is allocated based on the instance class by using the formula GREATEST({DBInstanceClassMemory/63963136*1024},65536). When autovacuum runs, up to autovacuum_max_workers times that calculated value can be allocated, so be careful not to set the value too high. To control this, you can set autovacuum_work_mem separately.
autovacuum_work_mem
Specifies the maximum amount of memory to be used by each autovacuum worker process. This parameter defaults to -1, which means that the value of maintenance_work_mem is used instead.
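To see which memory limit autovacuum workers will actually use, you can inspect both settings side by side (a quick sanity check, not a tuning recommendation):

```sql
-- If autovacuum_work_mem is -1, workers fall back to maintenance_work_mem.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('maintenance_work_mem', 'autovacuum_work_mem');
```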
For more information about autovacuum memory parameters, see Allocating memory for autovacuum in the Amazon RDS documentation.
Tuning autovacuum parameters
Users might need to tune autovacuum parameters depending on their update and delete operations. The following parameters can be set at the table, instance, or cluster level.
Cluster or instance level
As an example, let's look at a banking database where continuous data manipulation language (DML) operations are expected. To maintain the database's health, you should tune autovacuum parameters at the cluster level for Aurora and at the instance level for Amazon RDS, and apply the same parameter group to the reader as well. In the case of a failover, the same parameters should apply to the new writer.
Table level
For example, in a database for food delivery where continuous DML operations are expected on a single table called orders, you should consider tuning the autovacuum_analyze_threshold parameter at the table level by using the following command:
ALTER TABLE <table_name> SET (autovacuum_analyze_threshold = <threshold rows>)
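After applying a table-level setting, you can confirm that it was stored by checking the table's storage options in the pg_class catalog:

```sql
-- Table-level autovacuum overrides are stored in pg_class.reloptions.
SELECT relname, reloptions
FROM pg_class
WHERE reloptions IS NOT NULL;
```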
Using aggressive autovacuum settings at the table level
With default autovacuum settings, the example orders table, which has continuous update and delete operations, accumulates a large number of dead tuples before it qualifies for vacuuming. The resulting bloat and stale statistics lead to bad plan generation and slow queries. Clearing the bloat and keeping statistics up to date requires aggressive table-level autovacuum settings.
To determine settings, keep track of the duration of queries running on this table and identify the percentage of DML operations that result in plan changes. The pg_stat_user_tables view helps you track insert, update, and delete operations.
Example: Let's assume that the optimizer generates bad plans whenever 5 percent of the orders table changes. In this case, you should change the scale factor threshold to 2 percent as follows:
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.02)
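If the aggressive setting later turns out to consume too many resources, the table-level override can be reverted so that the cluster-wide default applies again:

```sql
-- Remove the table-level override; the global setting takes effect again.
ALTER TABLE orders RESET (autovacuum_vacuum_scale_factor);
```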
Tip
Select aggressive autovacuum settings carefully to avoid high consumption of resources.
For more information, see the following:
- Understanding autovacuum in Amazon RDS for PostgreSQL environments (AWS blog post)
- Automatic Vacuuming (PostgreSQL documentation)
- Tuning PostgreSQL parameters in Amazon RDS and Amazon Aurora (AWS Prescriptive Guidance)
To make sure that autovacuum works effectively, regularly monitor dead rows, disk usage, and the last time autovacuum or ANALYZE ran. The pg_stat_all_tables view provides information on each table (relname), including how many dead tuples (n_dead_tup) it contains.
Monitoring the number of dead tuples in each table, especially in frequently updated tables, helps you determine if the autovacuum processes are periodically removing the dead tuples so their disk space can be reused for better performance. You can use the following query to check the number of dead tuples and when the last autovacuum ran on the tables:
SELECT relname AS TableName,
       n_live_tup AS LiveTuples,
       n_dead_tup AS DeadTuples,
       last_autovacuum AS Autovacuum,
       last_autoanalyze AS Autoanalyze
FROM pg_stat_user_tables;
Advantages and limitations
Autovacuum provides the following advantages:
- It removes bloat from tables automatically.
- It prevents transaction ID wraparound.
- It keeps database statistics up to date.
Limitations:
- If queries use parallel processing, the number of worker processes might not be enough for autovacuum.
- If autovacuum runs during peak hours, resource utilization might increase. You should tune its parameters to handle this issue.
- If table pages are in use by another session, autovacuum might skip those pages.
- Autovacuum can't access temporary tables.