Friday, March 13, 2026

Preventing SQL Performance Degradation During Major PostgreSQL Upgrades

When performing a major version upgrade of PostgreSQL (for example, v13 → v17), several things change that can affect query plans:

┌─────────────────────────────────────────────────────────┐
│  Major Version Upgrade (pg_upgrade)                     │
│                                                         │
│  ✗ Optimizer statistics become stale/incompatible       │
│  ✗ Query planner heuristics change significantly        │
│  ✗ New default config params (e.g., JIT enabled)        │
│  ✗ Cost estimation model differences                    │
│  ✗ New plan types/strategies the planner may prefer     │
│  ✗ pg_statistic data may not reflect new planner needs  │
└─────────────────────────────────────────────────────────┘

ANALYZE VERBOSE alone is often NOT enough — it rebuilds statistics, but if the new planner chooses fundamentally different plans, you need a more comprehensive approach.


Pre-Upgrade Runbook

PHASE 1: Capture Baseline (Before Upgrade)

1a. Enable and snapshot pg_stat_statements

-- Ensure the extension exists
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Capture the top 50 most time-consuming queries
SELECT queryid,
       substr(query, 1, 100) AS query_preview,
       calls,
       mean_exec_time,
       total_exec_time,
       rows,
       shared_blks_hit,
       shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 50;

Save this output — you'll compare it post-upgrade.
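One convenient way to save it is as CSV (e.g., via psql's \copy), so the post-upgrade comparison can be scripted. Below is a minimal sketch of parsing such an export into a dict keyed by queryid; the load_snapshot helper and the sample column values are illustrative, not part of any PostgreSQL tool.

```python
import csv
import io

def load_snapshot(csv_text: str) -> dict:
    """Parse a pg_stat_statements CSV export (header row expected)
    into {queryid: row-dict} for later before/after comparison."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["queryid"]: row for row in reader}

# Hypothetical two-row export matching the columns selected above
sample = """queryid,query_preview,calls,mean_exec_time,total_exec_time
101,SELECT * FROM orders,1200,4.2,5040
202,SELECT * FROM users,300,1.1,330
"""
baseline = load_snapshot(sample)
print(baseline["101"]["mean_exec_time"])  # 4.2
```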

1b. Save Execution Plans for Critical Queries

-- For each critical query, save the plan
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) <your_critical_query>;
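With more than a handful of critical queries, a small script can generate the EXPLAIN statements and stable filenames so that pre- and post-upgrade plans pair up automatically. A sketch follows; the query list and naming scheme are hypothetical, and actually running the statements (e.g., via psql) is left out.

```python
import hashlib

CRITICAL_QUERIES = [  # hypothetical examples; substitute your own
    "SELECT count(*) FROM orders WHERE status = 'open'",
    "SELECT * FROM users WHERE email = 'x@example.com'",
]

def explain_stmt(sql: str) -> str:
    """Wrap a query in the EXPLAIN form used for baseline capture."""
    return f"EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) {sql.rstrip(';')};"

def plan_filename(sql: str, tag: str) -> str:
    """Stable per-query filename ('pre'/'post' tag) so plans can be diffed later."""
    digest = hashlib.sha256(sql.encode()).hexdigest()[:12]
    return f"plan_{tag}_{digest}.json"

for q in CRITICAL_QUERIES:
    print(plan_filename(q, "pre"), explain_stmt(q))
```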

1c. Identify Large Tables with Sequential Scans (Missing Index Candidates)

-- Tables that might need indexes BEFORE the upgrade
SELECT schemaname,
       relname,
       seq_scan,
       seq_tup_read,
       idx_scan,
       n_live_tup,
       ROUND((seq_tup_read::numeric / GREATEST(n_live_tup, 1)), 2) AS seq_scan_ratio
FROM pg_stat_user_tables
WHERE n_live_tup > 10000
  AND seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 30;

1d. Capture Current Index Inventory

SELECT schemaname, tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY tablename, indexname;

PHASE 2: Pre-Upgrade Testing (CRITICAL — Don't Skip)

2a. Clone & Test First

# For RDS: create a snapshot and restore it to a test instance.
# Upgrade the TEST instance first, not production directly.
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier rds13-upgrade-test \
    --db-snapshot-identifier <pre-upgrade-snapshot> \
    --db-instance-class <same-class>

# Then upgrade the test instance to the target version
aws rds modify-db-instance \
    --db-instance-identifier rds13-upgrade-test \
    --engine-version <target-version> \
    --apply-immediately

2b. Run Critical Queries Against Upgraded Clone

-- Compare plans on the upgraded test instance
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) <your_critical_query>;

-- Look for:
--   Seq Scan where there was an Index Scan before
--   Nested Loop where a Hash Join was used before
--   Significant cost/row estimate differences
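Those checks can be partly automated by walking the FORMAT JSON output and diffing node-type counts between the old and new plans. A sketch, assuming the standard EXPLAIN (FORMAT JSON) tree shape ("Node Type" plus nested "Plans"); the hand-built plan dicts are illustrative stand-ins for real EXPLAIN output.

```python
from collections import Counter

def node_types(plan: dict) -> Counter:
    """Recursively count 'Node Type' occurrences in one plan tree."""
    counts = Counter([plan["Node Type"]])
    for child in plan.get("Plans", []):
        counts += node_types(child)
    return counts

def plan_diff(old: dict, new: dict) -> dict:
    """Return {node_type: (old_count, new_count)} for types whose counts changed."""
    before, after = node_types(old), node_types(new)
    return {nt: (before.get(nt, 0), after.get(nt, 0))
            for nt in set(before) | set(after)
            if before.get(nt, 0) != after.get(nt, 0)}

# Illustrative plans: v13 used two Index Scans under a Nested Loop,
# the upgraded instance switched one of them to a Seq Scan.
old_plan = {"Node Type": "Nested Loop", "Plans": [
    {"Node Type": "Index Scan"}, {"Node Type": "Index Scan"}]}
new_plan = {"Node Type": "Nested Loop", "Plans": [
    {"Node Type": "Seq Scan"}, {"Node Type": "Index Scan"}]}
print(plan_diff(old_plan, new_plan))
```

A non-empty diff on a critical query is the signal to dig into the full plans by hand.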

PHASE 3: Configuration Alignment

3a. Compare & Pin Critical Parameters

-- Check these on BOTH old and new versions
SHOW random_page_cost;           -- Often needs tuning for SSD/RDS
SHOW effective_cache_size;
SHOW work_mem;
SHOW default_statistics_target;
SHOW jit;                        -- ⚠️ On by default since PG12, can cause regressions
SHOW plan_cache_mode;
SHOW enable_memoize;             -- New in PG14, can cause regressions

3b. Recommended Parameter Adjustments Post-Upgrade

-- Disable JIT initially (known to cause regressions in OLTP workloads)
ALTER DATABASE your_db SET jit = off;

-- If on SSD/RDS (EBS gp3/io1), ensure this is set low
ALTER DATABASE your_db SET random_page_cost = 1.1;

-- Increase the statistics target for critical columns
ALTER TABLE critical_table ALTER COLUMN critical_column SET STATISTICS 1000;

-- Disable memoize if it causes plan regressions (PG14+)
ALTER DATABASE your_db SET enable_memoize = off;

Post-Upgrade Runbook (Immediately After Upgrade)

PHASE 4: Statistics Rebuild (Comprehensive)

4a. Run Three-Stage Analyze (NOT just ANALYZE VERBOSE)

# This is the sequence pg_upgrade itself recommends: three passes
# with increasing statistics targets.
#   Stage 1: target=1  (fast, gets basic stats)
#   Stage 2: target=10 (moderate)
#   Stage 3: default target (full)
vacuumdb --all --analyze-in-stages --jobs=4

This is the single most important step. Running ANALYZE VERBOSE table by table leaves each table with no statistics until its full pass completes; --analyze-in-stages gives every table coarse statistics almost immediately, then refines them in later passes.

4b. Force Extended Statistics Rebuild

-- If you have extended statistics, re-ANALYZE the tables that own them
-- (generates one ANALYZE statement per such table)
SELECT 'ANALYZE ' || n.nspname || '.' || c.relname || ';'
FROM pg_statistic_ext s
JOIN pg_class c ON c.oid = s.stxrelid
JOIN pg_namespace n ON n.oid = c.relnamespace;

4c. Update Visibility Map (for Index-Only Scans)

# VACUUM is needed to rebuild the visibility map
vacuumdb --all --jobs=4

PHASE 5: Post-Upgrade Validation

5a. Compare pg_stat_statements (Before vs After)

-- Check for queries with significantly higher mean_exec_time
SELECT queryid,
       substr(query, 1, 100) AS query_preview,
       calls,
       mean_exec_time,
       total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 30;
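Given before/after snapshots keyed by queryid, a small helper can flag regressions instead of eyeballing the two listings. A sketch; the 2x ratio and 1 ms floor are arbitrary starting thresholds, and the sample numbers are made up.

```python
def find_regressions(before: dict, after: dict,
                     ratio: float = 2.0, min_ms: float = 1.0) -> dict:
    """Flag queryids whose mean_exec_time grew by at least `ratio`x
    and exceeds `min_ms` post-upgrade (filters noise on very fast queries)."""
    flagged = {}
    for qid, new_ms in after.items():
        old_ms = before.get(qid)
        if old_ms and new_ms >= min_ms and new_ms / old_ms >= ratio:
            flagged[qid] = (old_ms, new_ms)
    return flagged

# Illustrative mean_exec_time (ms) per queryid
before = {101: 4.2, 202: 1.1, 303: 0.2}
after  = {101: 4.5, 202: 9.8, 303: 0.5}
print(find_regressions(before, after))  # {202: (1.1, 9.8)}
```

Query 303 more than doubled but stays under the 1 ms floor, so it is not flagged; that floor keeps sub-millisecond jitter out of the report.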

5b. Detect Plan Regressions Automatically

-- Find tables suddenly doing sequential scans
SELECT schemaname,
       relname,
       seq_scan,
       last_seq_scan,   -- PG16+
       idx_scan,
       last_idx_scan,   -- PG16+
       n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 0
  AND n_live_tup > 10000
ORDER BY seq_scan DESC;

5c. Identify Missing Indexes (Proactively)

-- Likely missing-index candidates based on expensive seq-scan patterns
SELECT t.schemaname,
       t.relname,
       t.seq_scan,
       t.seq_tup_read,
       t.n_live_tup,
       pg_size_pretty(pg_relation_size(t.relid)) AS table_size
FROM pg_stat_user_tables t
WHERE t.seq_scan > 100
  AND t.n_live_tup > 50000
  AND t.seq_tup_read > t.n_live_tup * 5
  AND NOT EXISTS (
      SELECT 1
      FROM pg_stat_user_indexes i
      WHERE i.relid = t.relid
        AND i.idx_scan > 0
  )
ORDER BY t.seq_tup_read DESC;

Consolidated Checklist for Today's Upgrade

PRE-UPGRADE (rds13xx)
═══════════════════════════════════════════════════════
☐ Snapshot pg_stat_statements output
☐ Save EXPLAIN ANALYZE for top 20 critical queries
☐ Document all existing indexes
☐ Identify large tables with high seq_scan counts
☐ Test upgrade on a cloned instance FIRST
☐ Compare query plans on clone vs production
☐ Create any missing indexes identified during clone testing
☐ Note current postgresql parameter values

POST-UPGRADE (Immediately)
═══════════════════════════════════════════════════════
☐ Run: vacuumdb --all --analyze-in-stages --jobs=4
☐ Run: vacuumdb --all --jobs=4
☐ SET jit = off (at database level)
☐ SET random_page_cost = 1.1 (if SSD/RDS)
☐ Verify random_page_cost, work_mem, effective_cache_size
☐ Compare pg_stat_statements (before vs after)
☐ Re-run EXPLAIN ANALYZE on critical queries
☐ Monitor for seq_scan regressions on large tables
☐ Check application response times / APM dashboards

ROLLBACK PLAN
═══════════════════════════════════════════════════════
☐ Keep pre-upgrade RDS snapshot
☐ If critical regressions found, restore from snapshot
☐ Document any queries needing new indexes for post-mortem

Key Takeaway

ANALYZE VERBOSE alone ≠ sufficient

The correct post-upgrade sequence is:

  1. vacuumdb --all --analyze-in-stages --jobs=4   ← Statistics
  2. vacuumdb --all --jobs=4                        ← Visibility map
  3. Disable JIT (jit=off)                          ← Common regression source
  4. Tune random_page_cost for RDS/SSD              ← Plan quality
  5. Compare execution plans for top queries         ← Validation
  6. Proactively create missing indexes              ← Prevention

Proactively identifying these on the clone before upgrading production is the single best prevention strategy.

Reference: https://www.postgresql.org/docs/17/pgupgrade.html