Amazon Aurora comes with two powerful built-in tools: Performance Insights (PI) and Database Insights (DI).
Most developers glance at them only when the database feels “slow,” but these dashboards contain query-level signals that can tell you why things are slow — and more importantly, what to fix.
This article walks through the most common PI metrics, explains what they mean, gives you “healthy ranges” for a typical product doing about 1000 queries per second (OLTP workload), and shows how to interpret them.
Database optimization isn’t just a DBA job. Many performance issues come directly from application-level queries:
- Missing indexes on frequently accessed tables
- N+1 query patterns in APIs
- Pulling too much data instead of paginating
- Long transactions locking rows unnecessarily
Performance Insights highlights these issues in real time. If you know what to look for, you can catch and fix query inefficiencies long before they cause outages.
If you’re provisioning Aurora with Terraform, PI is just a flag away.
👉 Using the terraform-aws-rds module, you can enable it by setting:
performance_insights_enabled = true
This ensures Aurora starts collecting PI data, so you can visualize load, latency, and query behavior directly in the AWS console.
These metrics answer the basic question: Is the database keeping up?
| Metric | Explanation | Healthy range (~1000 QPS) | Interpretation | Points to if higher |
|---|---|---|---|---|
| Load by waits (AAS) | Avg active sessions (running or waiting) | ≤ number of vCPUs | Balanced workload if AAS ≲ vCPU count | CPU/IO saturation, locking |
| SQL statements | Top queries ranked by impact | — | Identifies expensive queries | Missing indexes, bad design |
| Calls/sec | Total DB calls/sec | ~900–1200 | Should match app RPS | N+1 issues, chatty app |
| AAE | Avg active executions | ~1–2× vCPUs | Shows concurrent executions | CPU saturation, slow queries |
| Avg latency (ms)/call | Avg execution time | <5–10 ms | Indexed lookups should be fast | Missing indexes, IO stalls |
👉 Guidance: If Calls/sec is far higher than your app’s RPS, you probably have inefficient API patterns (e.g., one request → 20 queries). If Avg latency creeps above 10ms, look at your top SQL list in PI to find slow queries.
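To make the N+1 symptom concrete, here is a minimal SQL sketch using a hypothetical orders table keyed by customer_id: the chatty version shows up in PI as Calls/sec far above application RPS, while the batched version keeps the ratio close to 1.

```sql
-- N+1 pattern: the API issues one query per customer on a page of 20,
-- so a single HTTP request turns into 20+ database calls.
SELECT id, total FROM orders WHERE customer_id = 1;
SELECT id, total FROM orders WHERE customer_id = 2;
-- ...and so on for every customer on the page

-- Batched alternative: one round trip fetches the same data,
-- bringing Calls/sec back in line with application RPS.
SELECT id, customer_id, total
FROM orders
WHERE customer_id IN (1, 2, 3, 4, 5);
```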
These metrics show how queries access your tables.
| Metric | Explanation | Healthy range | Interpretation | Points to if higher |
|---|---|---|---|---|
| Select full join/sec | Joins without indexes | ≈ 0 | Almost never OK | Missing join indexes |
| Select range check/sec | Row-by-row range joins | ≈ 0 | Rare in healthy workloads | Composite indexes missing |
| Select scan/sec | Full table scans | <5/sec | Occasional scans fine | Unselective predicates |
👉 Guidance:
A single Select scan on a large table can bring Aurora to its knees. If you see full joins or range checks, fix them by adding indexes or adjusting join conditions.
Always check your EXPLAIN plan: if the type column shows ALL, the query is doing a full table scan and you're in trouble.
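As a quick illustration (table and column names are hypothetical), EXPLAIN makes the difference visible before and after adding an index; the exact plan Aurora picks depends on your data, so treat the commented output as the typical result rather than a guarantee.

```sql
-- Before: no index on status, EXPLAIN typically reports type: ALL
-- (a full table scan that feeds Select scan/sec).
EXPLAIN SELECT id, status, total FROM orders WHERE status = 'pending';

-- Add an index on the filtered column.
ALTER TABLE orders ADD INDEX idx_status (status);

-- After: the same query should now report type: ref, key: idx_status,
-- so only matching index entries are read.
EXPLAIN SELECT id, status, total FROM orders WHERE status = 'pending';
```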
Sorting is one of the biggest hidden costs in SQL.
| Metric | Explanation | Healthy range | Interpretation | Points to if higher |
|---|---|---|---|---|
| Sort merge passes/sec | Merge passes during sorts | <10/sec | Extra passes = sort buffer too small | Increase sort_buffer_size |
| Sort scan/sec | Table scans used for sorting | <20/sec | Sorting large sets | Bad ORDER/GROUP BY |
| Sort range/sec | Range scans used for sorting | <20/sec | Moderate is fine | Inefficient plans |
| Sort rows/sec | Rows sorted/sec | <50k–100k | Too many rows sorted | Missing ORDER/GROUP BY indexes |
👉 Guidance: If your queries often sort, add covering indexes on the ORDER BY/GROUP BY columns. If that’s not possible, tune buffer sizes to avoid spills.
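As a sketch, assuming a hypothetical orders table queried per customer, a composite index that matches both the filter and the ORDER BY lets Aurora return rows in index order instead of sorting them, which keeps Sort rows/sec and Sort merge passes/sec down.

```sql
-- Composite index: equality column first, then the ORDER BY column.
ALTER TABLE orders ADD INDEX idx_customer_created (customer_id, created_at);

-- Rows come back in index order, so no filesort is needed.
SELECT id, created_at, total
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 20;
```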
These metrics show how much data is being read, written, or staged in temporary tables.
| Metric | Explanation | Healthy range | Interpretation | Points to if higher |
|---|---|---|---|---|
| Rows affected/sec | Rows modified/sec | Workload dependent | Matches app writes | Bulk ops, long txns |
| Rows examined/sec | Rows read/sec | ≈ rows sent/sec | Examined ≫ sent = wasted work | Missing indexes |
| Rows sent/sec | Rows returned/sec | Matches app RPS | High values = heavy pulls | SELECT *, no pagination |
| Created tmp tables/sec | Temp tables in memory | Low–moderate | Common for joins | Complex queries |
| Created tmp disk tables/sec | Temp tables on disk | ≈ 0–2/sec | Disk temps are expensive | tmp_table_size too small, missing indexes |
👉 Guidance:
Rows examined >> Rows sent means you’re scanning way more rows than you return — classic missing index problem.
Disk temp tables are one of the worst offenders. If they’re >2/sec, check if your joins or GROUP BY clauses can be optimized.
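Here is a minimal sketch of the examined-vs-sent gap, using a hypothetical users table: without an index on the filter column, the whole table is examined to return one row, and the same missing-index pattern is what usually pushes joins and GROUP BY work into disk temp tables.

```sql
-- Without an index on email, Aurora examines every row in users to
-- return a single row, so Rows examined/sec dwarfs Rows sent/sec.
SELECT id, email, name FROM users WHERE email = 'alice@example.com';

-- After indexing the filter column, examined ≈ sent for this query.
ALTER TABLE users ADD INDEX idx_email (email);
```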
| Metric | Explanation | Healthy range | Interpretation | Points to if higher |
|---|---|---|---|---|
| Lock time (ms)/sec | Time waiting for locks | ≈ 0–50 ms/s | Small waits normal | Hot rows, long transactions |
👉 Guidance: If lock waits increase, check for long-running transactions and unindexed foreign keys. A single missing index on a foreign key can lock an entire table.
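Two quick checks, sketched for Aurora MySQL with illustrative table names: list the transactions that have been open the longest, and make sure columns used as foreign keys (especially informal ones without a declared constraint) are indexed so parent/child lookups don't scan and lock the whole child table.

```sql
-- Find the longest-running open transactions (Aurora MySQL / InnoDB).
SELECT trx_mysql_thread_id, trx_started, trx_rows_locked, trx_query
FROM information_schema.innodb_trx
ORDER BY trx_started
LIMIT 10;

-- Index the referencing column so lookups from the parent side
-- don't have to scan (and lock) the child table.
ALTER TABLE order_items ADD INDEX idx_order_id (order_id);
```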
The /sec metrics show volume, but the /call ones normalize per query — perfect for spotting wasteful queries.
| Metric | Explanation | Healthy range | Interpretation | Points to if higher |
|---|---|---|---|---|
| Select full join/call | Full joins per query | ≈ 0 | Each query should avoid them | Add join indexes |
| Select scan/call | Full scans per query | ≈ 0–0.01 | Most queries shouldn't scan | Missing indexes |
| Rows examined/call | Rows read per query | ≈ rows sent | Close = efficient | Wasted reads |
| Rows sent/call | Rows returned per query | <10–100 | Small payloads typical | Missing pagination |
| Created tmp disk tables/call | Disk temps per query | ≈ 0 | Costly per-query spill | Add indexes, raise tmp_table_size |
| Lock time (ms)/call | Lock wait per query | ≈ 0–2 ms | Higher = contention | Hot rows, poor indexing |
👉 Guidance:
Always check Rows examined/call. If it's hundreds or thousands but Rows sent/call is <10, you're burning CPU scanning unnecessary rows. If a single query creates a disk temp table, it's worth fixing, either via indexes or a query rewrite.
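On the pagination side, here is a hedged sketch assuming an index on created_at: keyset (cursor) pagination keeps both Rows sent/call and Rows examined/call small and stable, whereas large OFFSETs force Aurora to read and discard every skipped row.

```sql
-- Next page: pass the last created_at seen on the previous page as a cursor.
SELECT id, created_at, total
FROM orders
WHERE created_at < '2024-06-01 00:00:00'  -- cursor from the previous page
ORDER BY created_at DESC
LIMIT 50;
```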
Here’s how you use PI in practice:
1. Check Load by waits (AAS): is the DB stressed?
2. Look at top SQL statements: which queries burn the most resources?
3. Use /sec metrics to see which workloads dominate (scans, sorts, locks).
4. Use /call metrics to check per-query efficiency.
5. Fix at the application layer: add indexes, paginate, avoid N+1, shorten transactions.
6. Retest in PI: confirm the red flags went down.
❌ Bad query (scans entire table):
SELECT * FROM orders WHERE YEAR(created_at) = 2024;
This will trigger Select scan and drive up Rows examined/call, because applying YEAR() to created_at prevents Aurora from using an index on that column.
✅ Fixed query (index-supported):
ALTER TABLE orders ADD INDEX idx_created_at (created_at);
-- Half-open range: index-friendly and covers every timestamp in 2024
SELECT * FROM orders WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
Aurora now uses a range scan on the index, examining far fewer rows and lowering latency.
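To confirm the fix, you can re-run EXPLAIN; on a typical data distribution it should show a range access on the new index rather than type ALL, though the optimizer may still choose a full scan if the range covers most of the table.

```sql
-- Verify the new access path for the rewritten query.
EXPLAIN SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
-- look for type: range and key: idx_created_at
```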
Aurora PI/DI isn’t just for DBAs — developers can and should use it.
Focus on joins, scans, rows examined, and disk temp tables — these are the real red flags.
A query that “works” but scans thousands of rows for 10 results is a hidden time bomb at scale.
Use /call metrics to measure efficiency and /sec metrics to measure load.
By checking these metrics regularly, you’ll not only keep your database healthy but also save your team from firefighting outages later.