Friday, March 13, 2026

Preventing SQL Performance Degradation During Major PostgreSQL Upgrades

When performing a major version upgrade of PostgreSQL (for example, v13 → v17), several things change that can affect query plans:

┌─────────────────────────────────────────────────────────┐
│  Major Version Upgrade (pg_upgrade)                     │
│                                                         │
│  ✗ Optimizer statistics become stale/incompatible       │
│  ✗ Query planner heuristics change significantly        │
│  ✗ New default config params (e.g., JIT enabled)        │
│  ✗ Cost estimation model differences                    │
│  ✗ New plan types/strategies the planner may prefer     │
│  ✗ pg_statistic data may not reflect new planner needs  │
└─────────────────────────────────────────────────────────┘

ANALYZE VERBOSE alone is often NOT enough — it rebuilds statistics, but if the new planner chooses fundamentally different plans, you need a more comprehensive approach.


Pre-Upgrade Runbook

PHASE 1: Capture Baseline (Before Upgrade)

1a. Enable and snapshot pg_stat_statements

-- Ensure the extension exists
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Capture the top 50 most time-consuming queries
SELECT queryid,
       substr(query, 1, 100) AS query_preview,
       calls,
       mean_exec_time,
       total_exec_time,
       rows,
       shared_blks_hit,
       shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 50;

Save this output — you'll compare it post-upgrade.
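One convenient way to save it is as CSV (e.g., via psql's \copy), so the post-upgrade comparison can be scripted. Below is a minimal sketch of parsing such an export into a dict keyed by queryid; the load_snapshot helper and the sample column values are illustrative, not part of any PostgreSQL tool.

```python
import csv
import io

def load_snapshot(csv_text: str) -> dict:
    """Parse a pg_stat_statements CSV export (header row expected)
    into {queryid: row-dict} for later before/after comparison."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["queryid"]: row for row in reader}

# Hypothetical two-row export matching the columns selected above
sample = """queryid,query_preview,calls,mean_exec_time,total_exec_time
101,SELECT * FROM orders,1200,4.2,5040
202,SELECT * FROM users,300,1.1,330
"""
baseline = load_snapshot(sample)
print(baseline["101"]["mean_exec_time"])  # 4.2
```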

1b. Save Execution Plans for Critical Queries

-- For each critical query, save the plan
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) <your_critical_query>;
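With more than a handful of critical queries, a small script can generate the EXPLAIN statements and stable filenames so that pre- and post-upgrade plans pair up automatically. A sketch follows; the query list and naming scheme are hypothetical, and actually running the statements (e.g., via psql) is left out.

```python
import hashlib

CRITICAL_QUERIES = [  # hypothetical examples; substitute your own
    "SELECT count(*) FROM orders WHERE status = 'open'",
    "SELECT * FROM users WHERE email = 'x@example.com'",
]

def explain_stmt(sql: str) -> str:
    """Wrap a query in the EXPLAIN form used for baseline capture."""
    return f"EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) {sql.rstrip(';')};"

def plan_filename(sql: str, tag: str) -> str:
    """Stable per-query filename ('pre'/'post' tag) so plans can be diffed later."""
    digest = hashlib.sha256(sql.encode()).hexdigest()[:12]
    return f"plan_{tag}_{digest}.json"

for q in CRITICAL_QUERIES:
    print(plan_filename(q, "pre"), explain_stmt(q))
```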

1c. Identify Large Tables with Sequential Scans (Missing Index Candidates)

-- Tables that might need indexes BEFORE the upgrade
SELECT schemaname,
       relname,
       seq_scan,
       seq_tup_read,
       idx_scan,
       n_live_tup,
       ROUND((seq_tup_read::numeric / GREATEST(n_live_tup, 1)), 2) AS seq_scan_ratio
FROM pg_stat_user_tables
WHERE n_live_tup > 10000
  AND seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 30;

1d. Capture Current Index Inventory

SELECT schemaname, tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY tablename, indexname;

PHASE 2: Pre-Upgrade Testing (CRITICAL — Don't Skip)

2a. Clone & Test First

# For RDS: create a snapshot and restore it to a test instance.
# Upgrade the TEST instance first, not production directly.
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier rds13-upgrade-test \
    --db-snapshot-identifier <pre-upgrade-snapshot> \
    --db-instance-class <same-class>

# Then upgrade the test instance to the target version
aws rds modify-db-instance \
    --db-instance-identifier rds13-upgrade-test \
    --engine-version <target-version> \
    --apply-immediately

2b. Run Critical Queries Against Upgraded Clone

-- Compare plans on the upgraded test instance
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) <your_critical_query>;

-- Look for:
--   Seq Scan where there was an Index Scan before
--   Nested Loop where a Hash Join was used before
--   Significant cost/row estimate differences
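Those checks can be partly automated by walking the FORMAT JSON output and diffing node-type counts between the old and new plans. A sketch, assuming the standard EXPLAIN (FORMAT JSON) tree shape ("Node Type" plus nested "Plans"); the hand-built plan dicts are illustrative stand-ins for real EXPLAIN output.

```python
from collections import Counter

def node_types(plan: dict) -> Counter:
    """Recursively count 'Node Type' occurrences in one plan tree."""
    counts = Counter([plan["Node Type"]])
    for child in plan.get("Plans", []):
        counts += node_types(child)
    return counts

def plan_diff(old: dict, new: dict) -> dict:
    """Return {node_type: (old_count, new_count)} for types whose counts changed."""
    before, after = node_types(old), node_types(new)
    return {nt: (before.get(nt, 0), after.get(nt, 0))
            for nt in set(before) | set(after)
            if before.get(nt, 0) != after.get(nt, 0)}

# Illustrative plans: v13 used two Index Scans under a Nested Loop,
# the upgraded instance switched one of them to a Seq Scan.
old_plan = {"Node Type": "Nested Loop", "Plans": [
    {"Node Type": "Index Scan"}, {"Node Type": "Index Scan"}]}
new_plan = {"Node Type": "Nested Loop", "Plans": [
    {"Node Type": "Seq Scan"}, {"Node Type": "Index Scan"}]}
print(plan_diff(old_plan, new_plan))
```

A non-empty diff on a critical query is the signal to dig into the full plans by hand.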

PHASE 3: Configuration Alignment

3a. Compare & Pin Critical Parameters

-- Check these on BOTH old and new versions
SHOW random_page_cost;           -- Often needs tuning for SSD/RDS
SHOW effective_cache_size;
SHOW work_mem;
SHOW default_statistics_target;
SHOW jit;                        -- ⚠️ On by default since PG12, can cause regressions
SHOW plan_cache_mode;
SHOW enable_memoize;             -- New in PG14, can cause regressions

3b. Recommended Parameter Adjustments Post-Upgrade

-- Disable JIT initially (known to cause regressions in OLTP workloads)
ALTER DATABASE your_db SET jit = off;

-- If on SSD/RDS (EBS gp3/io1), ensure this is set low
ALTER DATABASE your_db SET random_page_cost = 1.1;

-- Increase the statistics target for critical columns
ALTER TABLE critical_table ALTER COLUMN critical_column SET STATISTICS 1000;

-- Disable memoize if it causes plan regressions (PG14+)
ALTER DATABASE your_db SET enable_memoize = off;

Post-Upgrade Runbook (Immediately After Upgrade)

PHASE 4: Statistics Rebuild (Comprehensive)

4a. Run Three-Stage Analyze (NOT just ANALYZE VERBOSE)

# This is the sequence pg_upgrade itself recommends: three passes
# with increasing statistics targets.
#   Stage 1: target=1  (fast, gets basic stats)
#   Stage 2: target=10 (moderate)
#   Stage 3: default target (full)
vacuumdb --all --analyze-in-stages --jobs=4

This is the single most important step. Running ANALYZE VERBOSE table by table leaves each table with no statistics until its full pass completes; --analyze-in-stages gives every table coarse statistics almost immediately, then refines them in later passes.

4b. Force Extended Statistics Rebuild

-- If you have extended statistics, re-ANALYZE the tables that own them
-- (generates one ANALYZE statement per such table)
SELECT 'ANALYZE ' || n.nspname || '.' || c.relname || ';'
FROM pg_statistic_ext s
JOIN pg_class c ON c.oid = s.stxrelid
JOIN pg_namespace n ON n.oid = c.relnamespace;

4c. Update Visibility Map (for Index-Only Scans)

# VACUUM is needed to rebuild the visibility map
vacuumdb --all --jobs=4

PHASE 5: Post-Upgrade Validation

5a. Compare pg_stat_statements (Before vs After)

-- Check for queries with significantly higher mean_exec_time
SELECT queryid,
       substr(query, 1, 100) AS query_preview,
       calls,
       mean_exec_time,
       total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 30;
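Given before/after snapshots keyed by queryid, a small helper can flag regressions instead of eyeballing the two listings. A sketch; the 2x ratio and 1 ms floor are arbitrary starting thresholds, and the sample numbers are made up.

```python
def find_regressions(before: dict, after: dict,
                     ratio: float = 2.0, min_ms: float = 1.0) -> dict:
    """Flag queryids whose mean_exec_time grew by at least `ratio`x
    and exceeds `min_ms` post-upgrade (filters noise on very fast queries)."""
    flagged = {}
    for qid, new_ms in after.items():
        old_ms = before.get(qid)
        if old_ms and new_ms >= min_ms and new_ms / old_ms >= ratio:
            flagged[qid] = (old_ms, new_ms)
    return flagged

# Illustrative mean_exec_time (ms) per queryid
before = {101: 4.2, 202: 1.1, 303: 0.2}
after  = {101: 4.5, 202: 9.8, 303: 0.5}
print(find_regressions(before, after))  # {202: (1.1, 9.8)}
```

Query 303 more than doubled but stays under the 1 ms floor, so it is not flagged; that floor keeps sub-millisecond jitter out of the report.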

5b. Detect Plan Regressions Automatically

-- Find tables suddenly doing sequential scans
SELECT schemaname,
       relname,
       seq_scan,
       last_seq_scan,   -- PG16+
       idx_scan,
       last_idx_scan,   -- PG16+
       n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 0
  AND n_live_tup > 10000
ORDER BY seq_scan DESC;

5c. Identify Missing Indexes (Proactively)

-- Likely missing-index candidates based on expensive seq-scan patterns
SELECT t.schemaname,
       t.relname,
       t.seq_scan,
       t.seq_tup_read,
       t.n_live_tup,
       pg_size_pretty(pg_relation_size(t.relid)) AS table_size
FROM pg_stat_user_tables t
WHERE t.seq_scan > 100
  AND t.n_live_tup > 50000
  AND t.seq_tup_read > t.n_live_tup * 5
  AND NOT EXISTS (
      SELECT 1
      FROM pg_stat_user_indexes i
      WHERE i.relid = t.relid
        AND i.idx_scan > 0
  )
ORDER BY t.seq_tup_read DESC;

Consolidated Checklist for Today's Upgrade

PRE-UPGRADE (rds13xx)
═══════════════════════════════════════════════════════
☐ Snapshot pg_stat_statements output
☐ Save EXPLAIN ANALYZE for top 20 critical queries
☐ Document all existing indexes
☐ Identify large tables with high seq_scan counts
☐ Test upgrade on a cloned instance FIRST
☐ Compare query plans on clone vs production
☐ Create any missing indexes identified during clone testing
☐ Note current postgresql parameter values

POST-UPGRADE (Immediately)
═══════════════════════════════════════════════════════
☐ Run: vacuumdb --all --analyze-in-stages --jobs=4
☐ Run: vacuumdb --all --jobs=4
☐ SET jit = off (at database level)
☐ SET random_page_cost = 1.1 (if SSD/RDS)
☐ Verify random_page_cost, work_mem, effective_cache_size
☐ Compare pg_stat_statements (before vs after)
☐ Re-run EXPLAIN ANALYZE on critical queries
☐ Monitor for seq_scan regressions on large tables
☐ Check application response times / APM dashboards

ROLLBACK PLAN
═══════════════════════════════════════════════════════
☐ Keep pre-upgrade RDS snapshot
☐ If critical regressions found, restore from snapshot
☐ Document any queries needing new indexes for post-mortem

Key Takeaway

ANALYZE VERBOSE alone ≠ sufficient

The correct post-upgrade sequence is:

  1. vacuumdb --all --analyze-in-stages --jobs=4   ← Statistics
  2. vacuumdb --all --jobs=4                        ← Visibility map
  3. Disable JIT (jit=off)                          ← Common regression source
  4. Tune random_page_cost for RDS/SSD              ← Plan quality
  5. Compare execution plans for top queries         ← Validation
  6. Proactively create missing indexes              ← Prevention

Proactively identifying these on the clone before upgrading production is the single best prevention strategy.

Reference: https://www.postgresql.org/docs/17/pgupgrade.html