# Managing statistics


This topic provides reference information about statistics and query optimization in SQL Server and PostgreSQL databases. You can understand how these database systems use statistics to improve query performance and how they differ in their approach to collecting and managing statistical data. The topic compares the methods for creating, viewing, and updating statistics in SQL Server with similar functionality in PostgreSQL.


| Feature compatibility |  AWS SCT / AWS DMS automation level |  AWS SCT action code index | Key differences | 
| --- | --- | --- | --- | 
|   ![\[Three star feature compatibility\]](http://docs.aws.amazon.com/dms/latest/sql-server-to-aurora-postgresql-migration-playbook/images/pb-compatibility-3.png)   |  N/A  |  N/A  |  Syntax and option differences, similar functionality.  | 

## SQL Server Usage


Statistics objects in SQL Server are designed to support SQL Server cost-based query optimizer. It uses statistics to evaluate the various plan options and choose an optimal plan for optimal query performance.

Statistics are stored as BLOBs in system tables and contain histograms and other statistical information about the distribution of values in one or more columns. A histogram is created for the first column only and samples the occurrence frequency of distinct values. Statistics and histograms are collected by either scanning the entire table or by sampling only a percentage of the rows.

You can view Statistics manually using the `DBCC SHOW_STATISTICS` statement or the more recent `sys.dm_db_stats_properties` and `sys.dm_db_stats_histogram` system views.

SQL Server provides the capability to create filtered statistics containing a WHERE predicate. Filtered statistics are useful for optimizing histogram granularity by eliminating rows whose values are of less interest, for example NULLs.

SQL Server can manage the collection and refresh of statistics automatically (the default). Use the `AUTO_CREATE_STATISTICS` and `AUTO_UPDATE_STATISTICS` database options to change the defaults.

When a query is submitted with `AUTO_CREATE_STATISTICS` on and the query optimizer may benefit from a statistics that don’t yet exist, SQL Server creates the statistics automatically. You can use the `AUTO_UPDATE_STATISTICS_ASYNC` database property to set new statistics creation to occur immediately (causing queries to wait) or to run asynchronously. When run asynchronously, the triggering run can’t benefit from optimizations the optimizer may derive from it.

After creation of a new statistics object, either automatically or explicitly using the `CREATE STATISTICS` statement, the refresh of the statistics is controlled by the `AUTO_UPDATE_STATISTICS` database option. When set to `ON`, statistics are recalculated when they are stale, which happens when significant data modifications have occurred since the last refresh.

### Syntax


```
CREATE STATISTICS <Statistics Name>
ON <Table Name> (<Column> [,...])
[WHERE <Filter Predicate>]
[WITH <Statistics Options>;
```

### Examples


The following example creates new statistics on multiple columns. Set to use a full scan and to not refresh.

```
CREATE STATISTICS MyStatistics
ON MyTable (Col1, Col2)
WITH FULLSCAN, NORECOMPUTE;
```

The following example updates statistics with a 50% sampling rate.

```
UPDATE STATISTICS MyTable(MyStatistics)
WITH SAMPLE 50 PERCENT;
```

View the statistics histogram and data.

```
DBCC SHOW_STATISTICS ('MyTable','MyStatistics');
```

Turn off automatic statistics creation for a database.

```
ALTER DATABASE MyDB SET AUTO_CREATE_STATS OFF;
```

For more information, see [Statistics](https://docs.microsoft.com/en-us/sql/relational-databases/statistics/statistics?view=sql-server-ver15), [CREATE STATISTICS (Transact-SQL)](https://docs.microsoft.com/en-us/sql/t-sql/statements/create-statistics-transact-sql?view=sql-server-ver15), and [DBCC SHOW\$1STATISTICS (Transact-SQL)](https://docs.microsoft.com/en-us/sql/t-sql/database-console-commands/dbcc-show-statistics-transact-sql?view=sql-server-ver15) in the *SQL Server documentation*.

## PostgreSQL Usage


Use the `ANALYZE` command to collect statistics about a database, a table, or a specific table column. The PostgreSQL `ANALYZE` command collects table statistics that support the generation of efficient query run plans by the query planner.
+  **Histograms** — `ANALYZE` collects statistics on table column values and creates a histogram of the approximate data distribution in each column.
+  **Pages and Rows** — `ANALYZE` collects statistics on the number of database pages and rows from which each table is comprised.
+  **Data Sampling** — For large tables, the `ANALYZE` command takes random samples of values rather than examining each row. This allows the `ANALYZE` command to scan very large tables in a relatively small amount of time.
+  **Statistic Collection Granularity** — Running the `ANALYZE` command without parameters instructs PostgreSQL to examine every table in the current schema. Supplying the table name or column name to `ANALYZE` instructs the database to examine a specific table or table column.

### Automatic Statistics Collection


By default, PostgreSQL is configured with an `AUTOVACUUM` daemon which automates the run of statistics collection by using the ANALYZE commands (in addition to automation of the VACUUM command). The `AUTOVACUUM` daemon scans for tables that show signs of large modifications in data to collect the current statistics. `AUTOVACUUM` is controlled by several parameters.

Individual tables have several storage parameters which can trigger `AUTOVACUUM` process sooner or later. You can set or change such parameters as `autovacuum_enabled`, `autovacuum_vacuum_threshold`, and others, using `CREATE TABLE` or `ALTER TABLE` statements.

```
ALTER TABLE custom_autovaccum SET (autovacuum_enabled = true, autovacuum_vacuum_cost_delay = 10ms, autovacuum_vacuum_scale_factor = 0.01, autovacuum_analyze_scale_factor = 0.005);
```

The preceding command enables `AUTOVACUUM` for the `custom_autovaccum` table and specifies the `AUTOVACUUM` process to sleep for 10 milliseconds each run.

It also specifies a 1% of the table size to be added to `autovacuum_vacuum_threshold` and 0.5% of the table size to be added to `autovacuum_analyze_threshold` when deciding whether to trigger a `VACUUM`.

For more information, see [Automatic Vacuuming](https://www.postgresql.org/docs/13/runtime-config-autovacuum.html) in the *PostgreSQL documentation*.

### Manual Statistics Collection


In PostgreSQL, you can collect statistics on-demand using the `ANALYZE` command at the database level, table level, or column level.
+  `ANALYZE` on indexes isn’t currently supported.
+  `ANALYZE` requires only a read-lock on the target table. It can run in parallel with other activity on the table.
+ For large tables, `ANALYZE` takes a random sample of the table contents. It is configured by the show `default_statistics_target` parameter. The default value is 100 entries. Raising the limit might allow more accurate planner estimates to be made at the price of consuming more space in the `pg_statistic` table.

Starting from PostgreSQL 10, there is a new command `CREATE STATISTICS`, which creates a new extended statistics object tracking data about the specified table.

The `STATISTICS` object tells the server to collect more detailed statistics.

### Examples


The following example gathers statistics for the entire database.

```
ANALYZE;
```

The following example gathers statistics for a specific table. The `VERBOSE` keyword displays progress.

```
ANALYZE VERBOSE EMPLOYEES;
```

The following example gathers statistics for a specific column.

```
ANALYZE EMPLOYEES (HIRE_DATE);
```

Specify the default\$1statistics\$1target parameter for an individual table column and reset it back to default.

```
ALTER TABLE EMPLOYEES ALTER COLUMN SALARY SET STATISTICS 150;

ALTER TABLE EMPLOYEES ALTER COLUMN SALARY SET STATISTICS -1;
```

Larger values increase the time needed to complete an ANALYZE, but improve the quality of the collected planner’s statistics, which can potentially lead to better run plans.

View the current (session or global) `default_statistics_target`, modify it to 150, and analyze the `EMPLOYEES` table:

```
SHOW default_statistics_target ;
SET default_statistics_target to 150;
ANALYZE EMPLOYEES ;
```

View the last time statistics were collected for a table.

```
select relname, last_analyze from pg_stat_all_tables;
```

## Summary


| Feature | SQL Server | PostgreSQL | 
| --- | --- | --- | 
|  Analyze a specific database table  |  <pre>CREATE STATISTICS MyStatistics<br />ON MyTable (Col1, Col2)</pre>  |  <pre>ANALYZE EMPLOYEES;</pre>  | 
|  Analyze a database table while only sampling certain rows  |  <pre>UPDATE STATISTICS MyTable(MyStatistics)<br />WITH SAMPLE 50 PERCENT;</pre>  |  Configure the number of entries for the table: <pre>SET default_statistics_target to 150;<br />ANALYZE EMPLOYEES ;</pre>  | 
|  View last time statistics were collected  |  <pre>DBCC SHOW_STATISTICS ('MyTable','MyStatistics');</pre>  |  <pre>select relname, last</pre>  | 

For more information, see [ANALYZE](https://www.postgresql.org/docs/13/sql-analyze.html) and [The Autovacuum Daemon](https://www.postgresql.org/docs/13/routine-vacuuming.html#AUTOVACUUM) in the *PostgreSQL documentation*.