Postgres Slow-Query Debugging: A Field Guide
The set of commands and patterns I reach for when a Postgres query has gone bad in production. Most of this is not in the official docs.
· postgres, reliability, backend
Postgres has the best documentation of any database I’ve used, and yet when something is on fire in production, the docs aren’t where I look first. This is the actual sequence of things I run.
Step 1: What is currently running?
SELECT pid, age(clock_timestamp(), query_start), state, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start ASC;
The longest-running query that isn’t idle is almost always the cause. If you see
something stuck in idle in transaction for more than a few minutes, that’s also a problem —
it’s probably holding a lock somewhere.
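To surface those idle-in-transaction sessions directly, a variant of the same query works (a sketch; `state_change` in `pg_stat_activity` records when the backend entered its current state):

```sql
-- Sessions sitting idle inside an open transaction, oldest first.
SELECT pid, age(clock_timestamp(), state_change) AS idle_for, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY state_change;
```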
Step 2: Who’s waiting on whom?
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));
This is the lock graph. If you’ve got cascading slowness, this query tells you which transaction is at the root.
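Once you have the root `blocking_pid`, Postgres gives you two levels of intervention (the pid `12345` below is a placeholder for whatever the lock-graph query returned):

```sql
-- Gentle: ask the backend to cancel its current query.
SELECT pg_cancel_backend(12345);

-- Firm: if it's idle in transaction there's no query to cancel,
-- so terminate the whole session, rolling back its transaction.
SELECT pg_terminate_backend(12345);
```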
Step 3: pg_stat_statements
If pg_stat_statements is enabled (and it should be), this is the aggregate view across everything the server has executed:
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
Look at total_exec_time, not mean_exec_time. A query that runs in 30ms but executes
10M times a day is a much bigger problem than a query that runs in 4s but happens once
an hour.
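One way to make that concrete is to rank each statement by its share of all execution time (a sketch using the PostgreSQL 13+ column names shown above):

```sql
-- Each statement's percentage of total execution time.
SELECT
  left(query, 60) AS query,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_ms,
  round((100 * total_exec_time
         / sum(total_exec_time) OVER ())::numeric, 1) AS pct_of_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```

The 30ms query called 10M times floats to the top of this list; the once-an-hour 4s query barely registers.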
Step 4: EXPLAIN ANALYZE BUFFERS
When you’ve identified the bad query, EXPLAIN ANALYZE BUFFERS is the gold standard. The
key thing most people miss is BUFFERS — without it, you can’t tell if a slow seq scan is
slow because it’s CPU-bound or because it’s pulling data off disk.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT ...;
If shared read numbers are huge, you need an index. If shared hit is huge and the query
is still slow, you have a query plan problem.
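Here is what that looks like against a hypothetical orders table (the table, column, and counts are invented for illustration; the `Buffers:` line format is what Postgres actually prints):

```sql
-- With track_io_timing = on, each plan node also gets an
-- "I/O Timings" line showing time spent waiting on reads/writes.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM orders WHERE customer_id = 42;

-- Each plan node carries a line like:
--   Buffers: shared hit=1204 read=88067
-- "hit" = pages found in shared_buffers; "read" = pages fetched
-- from the OS (possibly disk). A large "read" is the disk signal.
```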
Step 5: Indexes
The two things to check before adding a new index:
- pg_stat_user_indexes — does an existing index already cover this query but go unused?
- The plan you’re seeing now — would the new index actually be used? Postgres’s planner makes cost-based decisions; an index isn’t free.
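A quick way to run the first check is to sort indexes by how often the planner has actually scanned them (a sketch; counts accumulate since the last stats reset, so young stats can mislead):

```sql
-- Rarely-used indexes, biggest first. idx_scan = 0 on a large
-- index is a candidate for review before adding yet another one.
SELECT schemaname, relname AS table_name, indexrelname AS index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;
```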
Half the time the answer is “fix the query to use the existing index.” The other half is “add the index but make it partial.” Full-table indexes on huge tables are a footgun.
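For the partial-index case, the shape is something like this (table and column names are hypothetical):

```sql
-- Hypothetical: queries only ever touch unshipped orders, so
-- index just those rows instead of the whole table.
-- CONCURRENTLY builds without a write-blocking lock, at the cost
-- of a slower build (and it can't run inside a transaction block).
CREATE INDEX CONCURRENTLY orders_unshipped_created_idx
ON orders (created_at)
WHERE shipped_at IS NULL;
```

On a table where most rows are shipped, the partial index stays small and stays hot in cache, which is exactly the point.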
This is the shape of a Postgres debugging session in production. Every step takes 30 seconds. The whole loop, when you’re in flow, runs in under five minutes. Most “slow Postgres” stories I read could have been resolved in that time.