Setting Up pglogical Replication on AWS Aurora PostgreSQL
Setting up pglogical replication on Aurora PostgreSQL involves configuring both the publisher (the source database) and the subscriber (the target database). Since Aurora is a managed service, many of the administrative tasks are handled for you, but you will still need to configure the RDS parameter group, enable pglogical extension, and set up publication and subscription.
Here’s a step-by-step guide to setting up pglogical replication in AWS Aurora PostgreSQL:
Step 1: Enable pglogical Extension on Aurora PostgreSQL
-
Log in to the AWS Management Console and navigate to Amazon RDS.
-
Modify the Aurora DB Cluster to enable the
pglogical
extension:-
In the RDS Console, go to Databases > select your Aurora PostgreSQL cluster.
-
Click Modify.
-
In the DB Cluster Parameter Group section, either select an existing parameter group or create a new one.
-
New parameter group: Create one with a name like
pglogical-params
.
-
-
Add the following parameters to the parameter group:
-
shared_preload_libraries = pglogical
— this loads the pglogical extension. -
wal_level = logical
— required for logical replication. -
Other optional parameters: Based on your replication setup, you may also adjust
max_replication_slots
,max_wal_senders
, etc. (as discussed earlier).
-
-
-
Apply the parameter group to the Aurora DB Cluster and reboot the cluster to apply the changes.
Important: Changing parameters like
shared_preload_libraries
andwal_level
will require a cluster reboot.
Step 2: Install pglogical Extension on the Publisher
-
Connect to your Aurora PostgreSQL instance as the
rds_superuser
(or other superuser) role using a PostgreSQL client (e.g.,psql
orpgAdmin
). -
Run the following command to create the pglogical extension on the publisher database:
CREATE EXTENSION pglogical;
-
Verify that the extension is installed by running:
SELECT * FROM pg_extension WHERE extname = 'pglogical';
Step 3: Create a Publication on the Publisher
The publication defines which tables (or schemas) will be replicated to the subscriber. You can create a publication to replicate specific tables or entire schemas.
-
Connect to the publisher database and create a publication.
For example, to replicate all tables in a schema:
-- Create a publication for all tables in a schema
SELECT pg_create_publication('my_publication', FOR ALL TABLES IN SCHEMA public);
Alternatively, if you want to replicate specific tables, you can define them:
-- Create a publication for specific tables
SELECT pg_create_publication('my_publication', FOR TABLE my_table1, my_table2);
-
To list the publications:
SELECT * FROM pg_catalog.pg_publication;
Step 4: Enable pglogical Replication on the Subscriber
-
On the subscriber Aurora PostgreSQL instance, connect as
rds_superuser
(or another superuser). -
Install the pglogical extension if it hasn’t been done already:
CREATE EXTENSION pglogical;
-
Create a Subscription to link the subscriber to the publisher and start receiving replicated data.
-- Create a subscription
SELECT pg_create_subscription(
'my_subscription', -- Subscription name
'host=publisher_host port=5432 dbname=publisher_db user=replication_user password=your_password', -- Connection details for the publisher
'my_publication', -- The publication name you created on the publisher
false, -- Whether to copy existing data (recommended to set as 'false' after initial data sync)
true -- Whether to start the replication immediately
);
-
Replace
publisher_host
,publisher_db
,replication_user
, andyour_password
with the actual connection details for the publisher. -
my_publication
should be the publication you created on the publisher.
-
Verify the subscription status:
SELECT * FROM pg_catalog.pg_subscription;
Step 5: Monitor and Manage Replication
Once the publisher and subscriber are set up, you can start monitoring the replication process. Here are some useful queries for monitoring:
-
Check Replication Status:
SELECT * FROM pg_stat_replication;
-
Check pglogical replication status:
SELECT * FROM pg_stat_subscription;
-
Check replication lag (how far the subscriber is behind the publisher):
SELECT
pid,
usename,
application_name,
state,
sync_state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
pg_xlog_location_diff(sent_lsn, replay_lsn) AS replication_lag
FROM pg_stat_replication;
-
Check Replicated Tables:
SELECT * FROM pg_catalog.pg_publication_tables;
Step 6: Handle Failover and Recovery
If the publisher or subscriber fails, replication may pause or failover. AWS Aurora has built-in features for high availability and failover that can be combined with pglogical replication to ensure continued data replication across regions or between different clusters.
-
Multi-AZ Deployment: Aurora’s built-in failover mechanism can automatically promote a replica to be the primary if the primary instance fails.
-
Replication Resume: pglogical will attempt to resume replication once the publisher or subscriber is back online.
-
Cross-Region Replication: You can set up cross-region replication using pglogical to maintain data consistency between different geographical regions.
Other Considerations and Best Practices
-
Data Synchronization: For the initial data sync, you may want to disable replication until the initial load is complete (set the subscription
sync
parameter tofalse
). -
Replication Lag: If you have a high write load on the publisher, replication lag might increase. Make sure to monitor
pg_stat_replication
andpg_stat_subscription
to track replication health. -
Backups: Ensure that you have a regular backup strategy for both the publisher and the subscriber databases.
-
Network Latency: Since pglogical relies on network connectivity, keep network latency low between the publisher and subscriber for faster replication.
Step 7: Testing the Replication Setup
-
On the Publisher: Insert some test data into the replicated table(s):
INSERT INTO public.my_table (id, name) VALUES (1, 'test_name');
-
On the Subscriber: Check if the data appears in the replicated table:
SELECT * FROM public.my_table WHERE id = 1;
If everything is set up correctly, the data should appear in the subscriber database.
Final Thoughts:
-
pglogical replication on AWS Aurora PostgreSQL is a powerful tool for flexible and granular data replication. It provides high availability, cross-region replication, and the ability to replicate specific tables or schemas.
-
By adjusting replication-related parameters, you can optimize replication performance to minimize lag and ensure near-real-time data propagation.
What is pglogical replication?
pglogical replication is a logical replication system for PostgreSQL that allows you to replicate only specific parts of your database (tables, schemas, etc.) rather than the entire database. It is more flexible than physical replication, as you can selectively replicate data from one server (the publisher) to one or more servers (the subscribers) in real time.
In simpler terms:
-
Publisher: The database that sends data.
-
Subscriber: The database that receives data.
-
pglogical: The tool that enables the data to move between the publisher and the subscriber.
Key Concepts in pglogical:
-
Publisher:
-
The database that holds the original data and sends it out for replication.
-
You create a publication in the publisher, which defines what data (tables or schemas) will be replicated to the subscriber.
-
-
Subscriber:
-
The database that receives data from the publisher.
-
The subscription in the subscriber tells it where to get the data from the publisher and defines what it should replicate.
-
-
Replication:
-
Logical replication means only the changes to the data (inserts, updates, deletes) are replicated, not the entire database.
-
Unlike physical replication, it allows selective replication of data (e.g., you can replicate only certain tables).
-
pglogical Replication Process:
-
Setup:
-
On the publisher, you set up a publication (a set of tables or entire schemas).
-
On the subscriber, you create a subscription to that publication.
-
-
Data Flow:
-
When data changes (INSERT, UPDATE, DELETE) on the publisher, pglogical replicates those changes to the subscriber in real-time.
-
Recommended Parameters for Speeding Up Replication:
To ensure fast and efficient replication using pglogical in Aurora PostgreSQL, there are several parameters to adjust:
1. For the Publisher (Aurora DB Cluster):
Parameter | Recommended Setting | Reason |
---|---|---|
wal_level |
logical |
Required for logical replication. |
max_replication_slots |
Set to at least the number of subscriptions + 1 | Ensures enough replication slots for the subscribers. |
max_wal_senders |
Set to at least max_replication_slots + 2 |
Ensures enough replication connections for WAL streaming. |
wal_sender_timeout |
60s or higher |
Timeout for WAL sender to prevent early disconnects in case of replication delay. |
max_parallel_workers_per_gather |
4 or higher |
Speed up query performance for replication if you're using parallel query features. |
2. For the Subscriber:
Parameter | Recommended Setting | Reason |
---|---|---|
max_replication_slots |
Set to the number of subscriptions + reserve | Ensures the subscriber can handle the replication workload. |
max_logical_replication_workers |
At least equal to the number of subscriptions + reserve | Allows enough workers to handle multiple subscriptions. |
max_worker_processes |
At least max_logical_replication_workers + 1 |
Ensures enough worker processes for replication and other tasks. |
max_sync_workers_per_subscription |
2 or higher |
Enables faster synchronization during the initial data copy phase. |
max_parallel_apply_workers_per_subscription |
2 or higher |
Allows more parallelism for applying changes to the subscriber. |
Other Performance Tips:
-
Use SSD storage: For both publisher and subscriber, SSD-backed storage can improve I/O performance and reduce replication lag.
-
Optimize network latency: Minimize network delays between the publisher and the subscriber, as this can affect the real-time replication speed.
-
Batch updates: For bulk updates or inserts, try to use batch operations instead of row-by-row operations to speed up replication.
Useful and Interesting Features of AWS pglogical Replication:
-
Easy Setup in AWS RDS/Aurora:
-
AWS provides a managed environment for setting up pglogical replication. You can enable it on Aurora PostgreSQL with minimal configuration, without needing to install additional extensions manually.
-
-
Selective Replication:
-
With pglogical, you can replicate only specific tables or even specific rows (using filtering). This allows for more granular control over what data gets replicated, saving resources and improving performance.
-
-
Cross-Region Replication:
-
AWS Aurora Global Databases can be combined with pglogical to replicate data across different AWS regions. This enables disaster recovery and low-latency access to your data from different parts of the world.
-
-
Support for Parallel Replication:
-
pglogical supports parallel replication, which can help speed up the replication of large data sets, especially during initial syncs or when there are large transaction volumes.
-
-
Logical Replication for Upgrades:
-
pglogical can be used to replicate data during PostgreSQL version upgrades or cluster migrations. You can create a new cluster and use pglogical to replicate data from the old cluster, ensuring zero downtime during the migration.
-
-
Failover and High Availability:
-
When combined with AWS RDS Multi-AZ or Aurora's multi-region feature, pglogical replication helps ensure high availability, as the replication can automatically continue even if one instance fails.
-
-
No Need for Shared Storage:
-
Unlike physical replication (which requires shared storage between nodes), logical replication with pglogical allows you to replicate between independent nodes, providing more flexibility in deployment.
-
-
Change Data Capture (CDC):
-
pglogical provides change data capture for incremental data replication, allowing systems like analytics engines to consume data in near real-time, without the overhead of replicating full tables.
-
-
Multi-master Replication (Experimental):
-
pglogical supports multi-master replication (although experimental), meaning you can have multiple publishers, each with its own set of subscriptions. This can be useful for complex distributed systems.
-
Quick Summary:
-
Publisher: Sends data.
-
Subscriber: Receives data.
-
pglogical: The tool that enables flexible and selective replication between PostgreSQL databases.
-
Performance Tips: Tuning parameters like
max_replication_slots
,max_wal_senders
,max_worker_processes
, and utilizing parallelism can improve replication speed. -
AWS Features: Easy setup, cross-region replication, support for failover and high availability, and minimal downtime for migrations.