Measuring replication lag among pgactive members

You can use the following query to view the replication lag among the pgactive members. Run this query on every pgactive node to get the full picture.


    
app=> SELECT * FROM pgactive.pgactive_get_replication_lag_info();
│-[ RECORD 1 ]--------+---------------------------------------------
│node_name            | node2-app
│node_sysid           | 7481018224801653637
│application_name     | pgactive:7481018224801653637:send
│slot_name            | pgactive_16385_7481018224801653637_0_16385__
│active               | t
│active_pid           | 783486
│pending_wal_decoding | 0
│pending_wal_to_apply | 0
│restart_lsn          | 0/2108150
│confirmed_flush_lsn  | 0/2154690
│sent_lsn             | 0/2154690
│write_lsn            | 0/2154690
│flush_lsn            | 0/2154690
│replay_lsn           | 0/2154690
│-[ RECORD 2 ]--------+---------------------------------------------
│node_name            | node1-app
│node_sysid           | 7481018033434600853
│application_name     | pgactive:7481018033434600853:send
│slot_name            | pgactive_16385_7481018033434600853_0_16385__
│active               | t
│active_pid           | 783488
│pending_wal_decoding | 0
│pending_wal_to_apply | 0
│restart_lsn          | 0/20F5AD0
│confirmed_flush_lsn  | 0/214EF68
│sent_lsn             | 0/214EF68
│write_lsn            | 0/214EF68
│flush_lsn            | 0/214EF68
│replay_lsn           | 0/214EF68

Monitor the following diagnostics at a minimum:

active

Set up alerts when active is false, which indicates that the slot isn't currently in use (the subscriber instance has disconnected from the publisher).

pending_wal_decoding

In PostgreSQL's logical replication, WAL files are stored in binary format. The publisher must decode these WAL changes and convert them into logical changes (such as insert, update, or delete operations).

The metric pending_wal_decoding shows the number of WAL files waiting to be decoded on the publisher side.

This number can increase due to these factors:

When the subscriber isn't connected, active status will be false and pending_wal_decoding will increase
The slot is active, but the publisher can't keep up with the volume of WAL changes

pending_wal_to_apply

The metric pending_wal_apply indicates the number of WAL files waiting to be applied on the subscriber side.

Several factors can prevent the subscriber from applying changes and potentially cause a disk full scenario:

Schema differences - for example, when you have changes in the WAL stream for a table named sample, but that table doesn't exist on the subscriber side
Values in the primary key columns were updated
Secondary unique indexes can cause data divergence

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

Setting up active-active replication

Configuring parameter settings for the pgactive extension