When doing a major version upgrade of Postgres, for example from v13 to v17, several things change that affect query plans:
┌────────────────────────────────────────────────────────┐
│ Major Version Upgrade (pg_upgrade)                     │
│                                                        │
│ ✗ Optimizer statistics are NOT migrated (rebuild them) │
│ ✗ Query planner heuristics change significantly        │
│ ✗ New default config params (e.g., JIT enabled)        │
│ ✗ Cost estimation model differences                    │
│ ✗ New plan types/strategies the planner may prefer     │
│ ✗ pg_statistic data may not reflect new planner needs  │
└────────────────────────────────────────────────────────┘
ANALYZE VERBOSE alone is often NOT enough — it rebuilds statistics, but if the new planner chooses fundamentally different plans, you need a more comprehensive approach.
Pre-Upgrade Runbook
PHASE 1: Capture Baseline (Before Upgrade)
1a. Enable and snapshot pg_stat_statements
-- Ensure extension exists
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Capture top 50 most time-consuming queries
SELECT
    queryid,
    substr(query, 1, 100) AS query_preview,
    calls,
    mean_exec_time,
    total_exec_time,
    rows,
    shared_blks_hit,
    shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 50;
Save this output — you'll compare it post-upgrade.
1b. Save Execution Plans for Critical Queries
-- For each critical query, save the plan
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
<your_critical_query>;
1c. Identify Large Tables with Sequential Scans (Missing Index Candidates)
-- Tables that might need indexes BEFORE upgrade
SELECT
    schemaname,
    relname,
    seq_scan,
    seq_tup_read,
    idx_scan,
    n_live_tup,
    -- Avg. multiples of the live row count read via sequential scans
    ROUND((seq_tup_read::numeric / GREATEST(n_live_tup, 1)), 2) AS seq_read_ratio
FROM pg_stat_user_tables
WHERE n_live_tup > 10000
  AND seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 30;
1d. Capture Current Index Inventory
SELECT
    schemaname,
    tablename,
    indexname,
    indexdef
FROM pg_indexes
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY tablename, indexname;
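pg_upgrade preserves indexes, so any difference between this inventory and a post-upgrade re-run usually means a manual step was missed (e.g. an index created only on the test clone). A minimal sketch of that check in Python; the inventories and index names below are illustrative, assuming you saved the `indexdef` column from the query above before and after the upgrade:

```python
def index_diff(before_defs, after_defs):
    """Set-diff two pg_indexes inventories given as iterables of indexdef strings."""
    before, after = set(before_defs), set(after_defs)
    return {
        "missing_after_upgrade": sorted(before - after),
        "new_after_upgrade": sorted(after - before),
    }

# Illustrative inventories (hypothetical table/index names)
pre = [
    "CREATE INDEX orders_customer_idx ON public.orders USING btree (customer_id)",
    "CREATE UNIQUE INDEX orders_pkey ON public.orders USING btree (id)",
]
post = [
    "CREATE UNIQUE INDEX orders_pkey ON public.orders USING btree (id)",
]
# Flags orders_customer_idx as missing after the upgrade
print(index_diff(pre, post)["missing_after_upgrade"])
```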
PHASE 2: Pre-Upgrade Testing (CRITICAL — Don't Skip)
2a. Clone & Test First
# For RDS: create a snapshot and restore it to a test instance.
# Upgrade the TEST instance first, never production directly.
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier rds13-upgrade-test \
    --db-snapshot-identifier <pre-upgrade-snapshot> \
    --db-instance-class <same-class>

# Then upgrade the test instance to the target version
# (major version changes require --allow-major-version-upgrade)
aws rds modify-db-instance \
    --db-instance-identifier rds13-upgrade-test \
    --engine-version <target-version> \
    --allow-major-version-upgrade \
    --apply-immediately
2b. Run Critical Queries Against Upgraded Clone
-- Compare plans on the upgraded test instance
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
<your_critical_query>;
-- Look for:
--   • Seq Scan where there was an Index Scan before
--   • Nested Loop where a Hash Join was used before
--   • Significant cost / row-estimate differences
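Eyeballing large JSON plans is error-prone; a small helper can walk each saved EXPLAIN (FORMAT JSON) document, count the node types, and report what changed between the old and new plan. A sketch, assuming one saved JSON plan per query (the two inline plans are illustrative):

```python
import json
from collections import Counter

def node_types(plan_node, acc=None):
    """Recursively count 'Node Type' occurrences in an EXPLAIN (FORMAT JSON) tree."""
    if acc is None:
        acc = Counter()
    acc[plan_node["Node Type"]] += 1
    for child in plan_node.get("Plans", []):
        node_types(child, acc)
    return acc

def diff_plans(old_json, new_json):
    """Return node types whose counts differ between two EXPLAIN JSON documents."""
    old = node_types(json.loads(old_json)[0]["Plan"])
    new = node_types(json.loads(new_json)[0]["Plan"])
    return {t: (old.get(t, 0), new.get(t, 0))
            for t in set(old) | set(new)
            if old.get(t, 0) != new.get(t, 0)}

# Illustrative plans: an Index Scan replaced by a Seq Scan after the upgrade
before = json.dumps([{"Plan": {"Node Type": "Index Scan"}}])
after = json.dumps([{"Plan": {"Node Type": "Seq Scan"}}])
print(diff_plans(before, after))  # flags the Index Scan -> Seq Scan switch
```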
PHASE 3: Configuration Alignment
3a. Compare & Pin Critical Parameters
-- Check these on BOTH old and new versions
SHOW random_page_cost; -- Often needs tuning for SSD/RDS
SHOW effective_cache_size;
SHOW work_mem;
SHOW default_statistics_target;
SHOW jit; -- ⚠️ NEW in PG12+, can cause regressions
SHOW plan_cache_mode;
SHOW enable_memoize; -- NEW in PG14+, can cause regressions
3b. Recommended Parameter Adjustments Post-Upgrade
-- Disable JIT initially (a known regression source for short OLTP queries).
-- Note: ALTER DATABASE ... SET only affects NEW sessions, so reconnect.
ALTER DATABASE your_db SET jit = off;
-- If on SSD/RDS (EBS gp3/io1), ensure this is set low
ALTER DATABASE your_db SET random_page_cost = 1.1;
-- Increase statistics target for critical columns
ALTER TABLE critical_table ALTER COLUMN critical_column
SET STATISTICS 1000;
-- Disable memoize if causing plan regressions (PG14+)
ALTER DATABASE your_db SET enable_memoize = off;
Post-Upgrade Runbook (Immediately After Upgrade)
PHASE 4: Statistics Rebuild (Comprehensive)
4a. Run Three-Stage Analyze (NOT just ANALYZE VERBOSE)
# This is what pg_upgrade recommends — three passes with increasing targets
# Stage 1: target=1 (fast, gets basic stats)
# Stage 2: target=10 (moderate)
# Stage 3: default target (full)
vacuumdb --all --analyze-in-stages --jobs=4
This is the single most important step. --analyze-in-stages does not produce better final statistics than a plain ANALYZE, but it gives the planner minimally usable statistics within minutes of the upgrade instead of leaving it blind until a full pass over every table completes.
4b. Force Extended Statistics Rebuild
-- Re-analyze only the tables that have extended statistics objects
-- (CREATE STATISTICS); run the generated commands, e.g. with psql's \gexec
SELECT 'ANALYZE ' || quote_ident(n.nspname) || '.' || quote_ident(c.relname) || ';'
FROM pg_statistic_ext s
JOIN pg_class c ON c.oid = s.stxrelid
JOIN pg_namespace n ON n.oid = c.relnamespace;
4c. Update Visibility Map (for Index-Only Scans)
# VACUUM is needed to rebuild the visibility map
vacuumdb --all --jobs=4
PHASE 5: Post-Upgrade Validation
5a. Compare pg_stat_statements (Before vs After)
-- Check for queries with significantly higher mean_exec_time.
-- Note: pg_stat_statements counters do not survive pg_upgrade, and queryid
-- is not guaranteed stable across major versions; let the workload run for a
-- while, then match against the baseline (by queryid or normalized query text).
SELECT
    queryid,
    substr(query, 1, 100) AS query_preview,
    calls,
    mean_exec_time,
    total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 30;
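Once both snapshots are exported (e.g. with psql's `\copy (...) TO 'file.csv' CSV HEADER`), a short script can flag the regressions mechanically. A sketch with illustrative data; the 2x threshold and CSV layout are assumptions, and because queryid can change across major versions you may need to key on normalized query text instead:

```python
import csv, io

def load_snapshot(csv_text):
    """Parse a pg_stat_statements export into {queryid: mean_exec_time}."""
    return {row["queryid"]: float(row["mean_exec_time"])
            for row in csv.DictReader(io.StringIO(csv_text))}

def regressions(before_csv, after_csv, factor=2.0):
    """Queries whose mean_exec_time grew by more than `factor` after the upgrade."""
    before, after = load_snapshot(before_csv), load_snapshot(after_csv)
    return sorted(
        (qid, before[qid], after[qid])
        for qid in before.keys() & after.keys()
        if after[qid] > before[qid] * factor
    )

# Illustrative snapshots (mean_exec_time in ms)
old_snap = "queryid,mean_exec_time\n101,4.0\n202,12.0\n"
new_snap = "queryid,mean_exec_time\n101,4.2\n202,95.0\n"
print(regressions(old_snap, new_snap))  # [('202', 12.0, 95.0)]
```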
5b. Detect Plan Regressions Automatically
-- Find tables suddenly doing seq scans
SELECT
    schemaname,
    relname,
    seq_scan,
    last_seq_scan,  -- PG16+
    idx_scan,
    last_idx_scan,  -- PG16+
    n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 0
  AND n_live_tup > 10000
ORDER BY seq_scan DESC;
5c. Identify Missing Indexes (Proactively)
-- Large, heavily seq-scanned tables whose indexes (if any) are never used:
-- prime candidates for a missing index
SELECT
    t.schemaname,
    t.relname,
    t.seq_scan,
    t.seq_tup_read,
    t.n_live_tup,
    pg_size_pretty(pg_relation_size(t.relid)) AS table_size
FROM pg_stat_user_tables t
WHERE t.seq_scan > 100
  AND t.n_live_tup > 50000
  AND t.seq_tup_read > t.n_live_tup * 5
  AND NOT EXISTS (
      SELECT 1 FROM pg_stat_user_indexes i
      WHERE i.relid = t.relid AND i.idx_scan > 0
  )
ORDER BY t.seq_tup_read DESC;
Consolidated Checklist for Today's Upgrade
PRE-UPGRADE (rds13xx)
═══════════════════════════════════════════════════════
☐ Snapshot pg_stat_statements output
☐ Save EXPLAIN ANALYZE for top 20 critical queries
☐ Document all existing indexes
☐ Identify large tables with high seq_scan counts
☐ Test upgrade on a cloned instance FIRST
☐ Compare query plans on clone vs production
☐ Create any missing indexes identified during clone testing
☐ Note current postgresql parameter values
POST-UPGRADE (Immediately)
═══════════════════════════════════════════════════════
☐ Run: vacuumdb --all --analyze-in-stages --jobs=4
☐ Run: vacuumdb --all --jobs=4
☐ SET jit = off (at database level)
☐ SET random_page_cost = 1.1 (if SSD/RDS)
☐ Verify random_page_cost, work_mem, effective_cache_size
☐ Compare pg_stat_statements (before vs after)
☐ Re-run EXPLAIN ANALYZE on critical queries
☐ Monitor for seq_scan regressions on large tables
☐ Check application response times / APM dashboards
ROLLBACK PLAN
═══════════════════════════════════════════════════════
☐ Keep pre-upgrade RDS snapshot
☐ If critical regressions found, restore from snapshot
☐ Document any queries needing new indexes for post-mortem
Key Takeaway
ANALYZE VERBOSE alone ≠ sufficient
The correct post-upgrade sequence is:
1. vacuumdb --all --analyze-in-stages --jobs=4 ← Statistics
2. vacuumdb --all --jobs=4 ← Visibility map
3. Disable JIT (jit=off) ← Common regression source
4. Tune random_page_cost for RDS/SSD ← Plan quality
5. Compare execution plans for top queries ← Validation
6. Proactively create missing indexes ← Prevention
Proactively identifying these on the clone before upgrading production is the single best prevention strategy.
Reference: https://www.postgresql.org/docs/17/pgupgrade.html