When doing a major version upgrade of Postgres, for example from v13 → v17, several things change that affect query plans:
┌───────────────────────────────────────────────────────┐
│ Major Version Upgrade (pg_upgrade)                    │
│                                                       │
│ ✗ Optimizer statistics become stale/incompatible      │
│ ✗ Query planner heuristics change significantly       │
│ ✗ New default config params (e.g., JIT enabled)       │
│ ✗ Cost estimation model differences                   │
│ ✗ New plan types/strategies the planner may prefer    │
│ ✗ pg_statistic data may not reflect new planner needs │
└───────────────────────────────────────────────────────┘
ANALYZE VERBOSE alone is often NOT enough — it rebuilds statistics, but if the new planner chooses fundamentally different plans, you need a more comprehensive approach.
Pre-Upgrade Runbook
PHASE 1: Capture Baseline (Before Upgrade)
1a. Enable and snapshot pg_stat_statements
```sql
-- Ensure extension exists
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Capture top 50 most time-consuming queries
SELECT queryid,
       substr(query, 1, 100) AS query_preview,
       calls,
       mean_exec_time,
       total_exec_time,
       rows,
       shared_blks_hit,
       shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 50;
```
Save this output — you'll compare it post-upgrade.
1b. Save Execution Plans for Critical Queries
```sql
-- For each critical query, save the plan
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) <your_critical_query>;
```
1c. Identify Large Tables with Sequential Scans (Missing Index Candidates)
```sql
-- Tables that might need indexes BEFORE upgrade
SELECT schemaname,
       relname,
       seq_scan,
       seq_tup_read,
       idx_scan,
       n_live_tup,
       ROUND((seq_tup_read::numeric / GREATEST(n_live_tup, 1)), 2) AS seq_scan_ratio
FROM pg_stat_user_tables
WHERE n_live_tup > 10000
  AND seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 30;
```
1d. Capture Current Index Inventory
```sql
SELECT schemaname, tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY tablename, indexname;
```
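Post-upgrade (or after clone testing), the saved inventory can be diffed against a fresh snapshot to spot any indexes that were dropped, renamed, or added along the way. A minimal sketch in Python, assuming each snapshot was saved as one `indexdef` per line (the inline snapshots and index names are illustrative):

```python
def diff_index_inventories(before: str, after: str) -> dict:
    """Compare two index inventory snapshots (one indexdef per line)."""
    before_set = {line.strip() for line in before.splitlines() if line.strip()}
    after_set = {line.strip() for line in after.splitlines() if line.strip()}
    return {
        "missing_after_upgrade": sorted(before_set - after_set),
        "new_after_upgrade": sorted(after_set - before_set),
    }


# Illustrative snapshots; in practice, read the saved psql output files
before = """
CREATE INDEX orders_customer_idx ON public.orders USING btree (customer_id)
CREATE INDEX orders_created_idx ON public.orders USING btree (created_at)
"""
after = """
CREATE INDEX orders_customer_idx ON public.orders USING btree (customer_id)
"""
print(diff_index_inventories(before, after)["missing_after_upgrade"])
```

Because the comparison is on the full `indexdef` text, a rebuilt index with a changed definition shows up in both lists, which is usually what you want to review.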
PHASE 2: Pre-Upgrade Testing (CRITICAL — Don't Skip)
2a. Clone & Test First
```bash
# For RDS: create a snapshot and restore it to a test instance.
# Upgrade the TEST instance first, not production directly.
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier rds13-upgrade-test \
    --db-snapshot-identifier <pre-upgrade-snapshot> \
    --db-instance-class <same-class>

# Then upgrade the test instance to the target version
# (--allow-major-version-upgrade is required for major version jumps)
aws rds modify-db-instance \
    --db-instance-identifier rds13-upgrade-test \
    --engine-version <target-version> \
    --allow-major-version-upgrade \
    --apply-immediately
```
2b. Run Critical Queries Against Upgraded Clone
```sql
-- Compare plans on the upgraded test instance
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) <your_critical_query>;

-- Look for:
--   Seq Scan where there was an Index Scan before
--   Nested Loop where a Hash Join was used before
--   Significant cost/row estimate differences
```
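Eyeballing two JSON plans is error-prone for anything beyond trivial queries. The checks above can be scripted: walk both plan trees, collect node types, and flag the suspicious transitions. A sketch in Python, assuming the plans were saved with `FORMAT JSON` (the inline plan fragments are illustrative; real ones come from the saved EXPLAIN output):

```python
def plan_node_types(plan_node: dict) -> list:
    """Recursively collect node types from an EXPLAIN (FORMAT JSON) plan tree."""
    nodes = [plan_node["Node Type"]]
    for child in plan_node.get("Plans", []):
        nodes.extend(plan_node_types(child))
    return nodes


def flag_scan_regressions(old_plan: dict, new_plan: dict) -> list:
    """Flag transitions like Index Scan -> Seq Scan between two plans."""
    old_nodes = plan_node_types(old_plan)
    new_nodes = plan_node_types(new_plan)
    flags = []
    if "Seq Scan" in new_nodes and "Seq Scan" not in old_nodes:
        flags.append("Seq Scan appeared (was index-based before)")
    if ("Nested Loop" in new_nodes and "Nested Loop" not in old_nodes
            and "Hash Join" in old_nodes):
        flags.append("Nested Loop replaced Hash Join")
    return flags


# Illustrative plan fragments
old = {"Node Type": "Index Scan", "Plans": []}
new = {"Node Type": "Seq Scan", "Plans": []}
print(flag_scan_regressions(old, new))
```

The same walk can be extended to compare `Total Cost` and `Plan Rows` per node to catch the estimate drift mentioned above.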
PHASE 3: Configuration Alignment
3a. Compare & Pin Critical Parameters
```sql
-- Check these on BOTH old and new versions
SHOW random_page_cost;          -- often needs tuning for SSD/RDS
SHOW effective_cache_size;
SHOW work_mem;
SHOW default_statistics_target;
SHOW jit;                       -- ⚠️ on by default since PG12, can cause regressions
SHOW plan_cache_mode;
SHOW enable_memoize;            -- new in PG14, can cause regressions
```
3b. Recommended Parameter Adjustments Post-Upgrade
```sql
-- Disable JIT initially (known to cause regressions in OLTP workloads)
ALTER DATABASE your_db SET jit = off;

-- If on SSD/RDS (EBS gp3/io1), ensure this is set low
ALTER DATABASE your_db SET random_page_cost = 1.1;

-- Increase the statistics target for critical columns
ALTER TABLE critical_table
    ALTER COLUMN critical_column SET STATISTICS 1000;

-- Disable memoize if it causes plan regressions (PG14+)
ALTER DATABASE your_db SET enable_memoize = off;
```
Post-Upgrade Runbook (Immediately After Upgrade)
PHASE 4: Statistics Rebuild (Comprehensive)
4a. Run Three-Stage Analyze (NOT just ANALYZE VERBOSE)
```bash
# This is what pg_upgrade recommends — three passes with increasing targets:
#   Stage 1: target=1  (fast, gets basic stats)
#   Stage 2: target=10 (moderate)
#   Stage 3: default target (full)
vacuumdb --all --analyze-in-stages --jobs=4
```
This is the single most important step — ANALYZE VERBOSE on individual tables eventually produces the same statistics, but --analyze-in-stages gives the planner minimally usable statistics much sooner, which matters while traffic is already hitting the upgraded cluster.
4b. Force Extended Statistics Rebuild
```sql
-- If you have extended statistics (CREATE STATISTICS), a plain ANALYZE
-- rebuilds them. Generate per-table ANALYZE statements and run them
-- with \gexec in psql:
SELECT 'ANALYZE ' || quote_ident(schemaname) || '.' || quote_ident(tablename)
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');
\gexec
```
4c. Update Visibility Map (for Index-Only Scans)
```bash
# VACUUM is needed to rebuild the visibility map
vacuumdb --all --jobs=4
```
PHASE 5: Post-Upgrade Validation
5a. Compare pg_stat_statements (Before vs After)
```sql
-- Check for queries with significantly higher mean_exec_time
SELECT queryid,
       substr(query, 1, 100) AS query_preview,
       calls,
       mean_exec_time,
       total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 30;
```
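Sorting by `mean_exec_time` alone surfaces queries that were always slow; what you actually want is queries that got *slower* relative to the pre-upgrade snapshot. A sketch in Python, assuming both snapshots were reduced to `queryid -> mean_exec_time` mappings (the sample data and the 1.5x threshold are illustrative):

```python
def find_query_regressions(before: dict, after: dict, threshold: float = 1.5) -> list:
    """Flag queryids whose mean_exec_time grew by more than `threshold`x.

    `before`/`after` map queryid -> mean_exec_time in ms, e.g. parsed
    from the saved pg_stat_statements snapshots.
    """
    regressions = []
    for queryid, new_mean in after.items():
        old_mean = before.get(queryid)
        if old_mean and new_mean > old_mean * threshold:
            regressions.append((queryid, old_mean, new_mean))
    # Worst absolute slowdown first
    return sorted(regressions, key=lambda r: r[2] - r[1], reverse=True)


# Illustrative snapshot data
before = {101: 2.0, 202: 40.0}
after = {101: 2.1, 202: 130.0}
print(find_query_regressions(before, after))  # only queryid 202 regressed
```

Note that `queryid` values are only stable across the upgrade if `compute_query_id` behaves the same on both versions; if they differ, key on the normalized query text instead.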
5b. Detect Plan Regressions Automatically
```sql
-- Find tables suddenly doing seq scans
SELECT schemaname,
       relname,
       seq_scan,
       last_seq_scan,   -- PG16+
       idx_scan,
       last_idx_scan,   -- PG16+
       n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 0
  AND n_live_tup > 10000
ORDER BY seq_scan DESC;
```
5c. Identify Missing Indexes (Proactively)
```sql
-- Hypothetical missing indexes based on expensive query patterns
SELECT t.schemaname,
       t.relname,
       t.seq_scan,
       t.seq_tup_read,
       t.n_live_tup,
       pg_size_pretty(pg_relation_size(t.relid)) AS table_size
FROM pg_stat_user_tables t
WHERE t.seq_scan > 100
  AND t.n_live_tup > 50000
  AND t.seq_tup_read > t.n_live_tup * 5
  AND NOT EXISTS (
      SELECT 1
      FROM pg_stat_user_indexes i
      WHERE i.relid = t.relid
        AND i.idx_scan > 0
  )
ORDER BY t.seq_tup_read DESC;
```
Consolidated Checklist for Today's Upgrade
PRE-UPGRADE (rds13xx)
═══════════════════════════════════════════════════════
☐ Snapshot pg_stat_statements output
☐ Save EXPLAIN ANALYZE for top 20 critical queries
☐ Document all existing indexes
☐ Identify large tables with high seq_scan counts
☐ Test upgrade on a cloned instance FIRST
☐ Compare query plans on clone vs production
☐ Create any missing indexes identified during clone testing
☐ Note current PostgreSQL parameter values
POST-UPGRADE (Immediately)
═══════════════════════════════════════════════════════
☐ Run: vacuumdb --all --analyze-in-stages --jobs=4
☐ Run: vacuumdb --all --jobs=4
☐ SET jit = off (at database level)
☐ SET random_page_cost = 1.1 (if SSD/RDS)
☐ Verify random_page_cost, work_mem, effective_cache_size
☐ Compare pg_stat_statements (before vs after)
☐ Re-run EXPLAIN ANALYZE on critical queries
☐ Monitor for seq_scan regressions on large tables
☐ Check application response times / APM dashboards
ROLLBACK PLAN
═══════════════════════════════════════════════════════
☐ Keep pre-upgrade RDS snapshot
☐ If critical regressions found, restore from snapshot
☐ Document any queries needing new indexes for post-mortem
Key Takeaway
ANALYZE VERBOSE alone ≠ sufficient
The correct post-upgrade sequence is:
1. vacuumdb --all --analyze-in-stages --jobs=4 ← Statistics
2. vacuumdb --all --jobs=4 ← Visibility map
3. Disable JIT (jit=off) ← Common regression source
4. Tune random_page_cost for RDS/SSD ← Plan quality
5. Compare execution plans for top queries ← Validation
6. Proactively create missing indexes ← Prevention
Proactively identifying these on the clone before upgrading production is the single best prevention strategy.
Reference: https://www.postgresql.org/docs/17/pgupgrade.html