How to Read Aurora Performance Insights: A Developer’s Guide to Faster Queries

Amazon Aurora comes with two powerful built-in tools: Performance Insights (PI) and Database Insights (DI).

Most developers glance at them only when the database feels “slow,” but these dashboards contain query-level signals that can tell you why things are slow — and more importantly, what to fix.

This article walks through the most common PI metrics, explains what they mean, gives you “healthy ranges” for a typical product doing about 1000 queries per second (OLTP workload), and shows how to interpret them.


Why Developers Should Care About PI

Database optimization isn’t just a DBA job. Many performance issues come directly from application-level queries:

  • Missing indexes on frequently accessed tables

  • N+1 query patterns in APIs

  • Pulling too much data instead of paginating

  • Long transactions locking rows unnecessarily

Performance Insights highlights these issues in real time. If you know what to look for, you can catch and fix query inefficiencies long before they cause outages.
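The N+1 pattern in particular is easy to see in miniature. A sketch using Python's built-in sqlite3 as a stand-in for Aurora (the `users`/`orders` tables are made up for illustration; the round-trip arithmetic is the same on any database):

```python
import sqlite3

# In-memory SQLite stands in for Aurora here; the N+1 pattern is identical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 2, 5.0);
""")

# N+1 pattern: one query for the list, then one query per item.
users = conn.execute("SELECT id, name FROM users").fetchall()
n_plus_1_queries = 1
for user_id, _ in users:
    conn.execute("SELECT total FROM orders WHERE user_id = ?", (user_id,))
    n_plus_1_queries += 1          # 1 + N round trips: Calls/sec balloons

# Batched version: a single JOIN does the same work in one round trip.
rows = conn.execute("""
    SELECT u.id, u.name, o.total
    FROM users u LEFT JOIN orders o ON o.user_id = u.id
""").fetchall()
batched_queries = 1

print(n_plus_1_queries, batched_queries)   # 4 vs 1 for three users
```

In PI, the first version shows up as Calls/sec far above your app's request rate; the second keeps the two roughly in step.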


Enabling Performance Insights

If you’re provisioning Aurora with Terraform, PI is just a flag away.

👉 Using the terraform-aws-rds module, you can enable it by setting:

performance_insights_enabled = true

This ensures Aurora starts collecting PI data, so you can visualize load, latency, and query behavior directly in the AWS console.
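Once PI is collecting, you can also pull its metrics programmatically instead of reading them in the console. A minimal sketch using boto3's `pi` client; the DbiResourceId (`db-ABC123EXAMPLE`) is a placeholder you would replace with your cluster's actual resource identifier:

```python
from datetime import datetime, timedelta, timezone

def build_load_query(dbi_resource_id, minutes=60, period=60):
    """Request body for pi.get_resource_metrics: average load (AAS) over a window."""
    end = datetime.now(timezone.utc)
    return {
        "ServiceType": "RDS",
        "Identifier": dbi_resource_id,      # the DbiResourceId, e.g. "db-ABC123EXAMPLE" (placeholder)
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "PeriodInSeconds": period,
        "MetricQueries": [{"Metric": "db.load.avg"}],   # AAS, the headline PI metric
    }

# Uncomment with real credentials and a real DbiResourceId:
# import boto3
# pi = boto3.client("pi")
# resp = pi.get_resource_metrics(**build_load_query("db-ABC123EXAMPLE"))
# for point in resp["MetricList"][0]["DataPoints"]:
#     print(point["Timestamp"], point.get("Value"))
```

This is handy for wiring AAS into your own alerting rather than eyeballing the dashboard.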


1. Load & Latency

These metrics answer the basic question: Is the database keeping up?

| Metric | Explanation | Healthy range (1000 r/s) | Interpretation | Points to if higher |
| --- | --- | --- | --- | --- |
| Load by waits (AAS) | Avg active sessions (running or waiting) | ~1–3 per vCPU | Balanced workload if AAS ≈ CPU count | CPU/IO saturation, locking |
| SQL statements | Top queries ranked by impact | n/a | Identifies expensive queries | Missing indexes, bad design |
| Calls/sec | Total DB calls/sec | ~900–1200 | Should match app RPS | N+1 issues, chatty app |
| AAE | Avg active executions | ~1–2× vCPUs | Shows concurrent executions | CPU saturation, slow queries |
| Avg latency (ms)/call | Avg execution time | <5–10 ms | Indexed lookups should be fast | Missing indexes, IO stalls |

👉 Guidance: If Calls/sec is far higher than your app’s RPS, you probably have inefficient API patterns (e.g., one request → 20 queries). If Avg latency creeps above 10ms, look at your top SQL list in PI to find slow queries.
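Those two rules of thumb are simple enough to encode in a monitoring check. A sketch, where the thresholds are this article's heuristics rather than AWS-official limits:

```python
def load_flags(aas, vcpus, calls_per_sec, app_rps):
    """Flag load problems using the rules of thumb above (illustrative cutoffs)."""
    flags = []
    if aas > 3 * vcpus:                  # upper end of the ~1-3 per vCPU range
        flags.append("CPU/IO saturation or locking: AAS well above vCPU count")
    queries_per_request = calls_per_sec / app_rps
    if queries_per_request > 5:          # arbitrary illustrative cutoff for "chatty"
        flags.append(f"chatty app: ~{queries_per_request:.0f} queries per request (N+1?)")
    return flags

print(load_flags(aas=20, vcpus=4, calls_per_sec=1000, app_rps=50))
```

Feeding this from PI data turns "the database feels slow" into two concrete yes/no questions.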


2. Select Patterns

These metrics show how queries access your tables.

| Metric | Explanation | Healthy range | Interpretation | Points to if higher |
| --- | --- | --- | --- | --- |
| Select full join/sec | Joins without indexes | ≈ 0 | Almost never OK | Missing join indexes |
| Select range check/sec | Row-by-row range joins | ≈ 0 | Rare in healthy workloads | Composite indexes missing |
| Select scan/sec | Full table scans | <5/sec | Occasional scans fine | Unselective predicates |

👉 Guidance:

  • A single Select scan on a large table can bring Aurora to its knees.

  • If you see full joins or range checks, fix them by adding indexes or adjusting join conditions.

  • Always check your EXPLAIN plan: if the type column shows ALL, MySQL is reading the whole table and you're in trouble.


3. Sorting

Sorting is one of the biggest hidden costs in SQL.

| Metric | Explanation | Healthy range | Interpretation | Points to if higher |
| --- | --- | --- | --- | --- |
| Sort merge passes/sec | Merge passes in sort | <10/sec | Extra passes = buffer too small | Increase sort_buffer_size |
| Sort scan/sec | Scans during sort | <20/sec | Sorting large sets | Bad ORDER/GROUP BY |
| Sort range/sec | Range scans in sort | <20/sec | Moderate is fine | Inefficient plans |
| Sort rows/sec | Rows sorted/sec | <50k–100k | Too many rows sorted | Missing ORDER/GROUP indexes |

👉 Guidance: If your queries often sort, add covering indexes on the ORDER BY/GROUP BY columns. If that’s not possible, tune buffer sizes to avoid spills.
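The same plan-reading trick shows whether an index eliminates the sort step entirely. In this SQLite sketch the spill appears as "USE TEMP B-TREE FOR ORDER BY"; in MySQL EXPLAIN the analogous marker is "Using filesort" (the `events` table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, ts TEXT, payload TEXT)")

def needs_sort(sql):
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # SQLite reports an explicit sort step as "USE TEMP B-TREE FOR ORDER BY".
    return any("TEMP B-TREE" in row[-1] for row in rows)

query = "SELECT id, ts FROM events ORDER BY ts LIMIT 100"
before = needs_sort(query)                                  # True: a full sort pass
conn.execute("CREATE INDEX idx_events_ts ON events (ts)")
after = needs_sort(query)                                   # False: index delivers the order

print(before, after)
```

When the index already stores rows in ORDER BY order, the database streams them out and the sort counters stay flat.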


4. Rows & Temporary Tables

These metrics show how much data is being read, written, or staged in temporary tables.

| Metric | Explanation | Healthy range | Interpretation | Points to if higher |
| --- | --- | --- | --- | --- |
| Rows affected/sec | Rows modified/sec | Workload dependent | Matches app writes | Bulk ops, long txns |
| Rows examined/sec | Rows read/sec | ≈ rows sent/sec | Examined ≫ sent = wasted work | Missing indexes |
| Rows sent/sec | Rows returned/sec | Matches app RPS | High values = heavy pulls | SELECT *, no pagination |
| Created tmp tables/sec | Temp tables in memory | Low–moderate | Common for joins | Complex queries |
| Created tmp disk tables/sec | Temp tables on disk | ≈ 0–2/sec | Disk temps are expensive | Too small tmp_table_size, missing indexes |

👉 Guidance:

  • Rows examined >> Rows sent means you’re scanning way more rows than you return — classic missing index problem.

  • Disk temp tables are one of the worst offenders. If they’re >2/sec, check if your joins or GROUP BY clauses can be optimized.
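Both checks reduce to arithmetic you can automate against PI counters. A sketch, where the 10x examined/sent ratio and the 2/sec disk-temp cutoff follow the guidance above and should be tuned to your workload:

```python
def efficiency_flags(rows_examined_per_sec, rows_sent_per_sec, tmp_disk_tables_per_sec):
    """Flag wasted-work patterns from /sec counters (illustrative thresholds)."""
    flags = []
    ratio = rows_examined_per_sec / max(rows_sent_per_sec, 1)
    if ratio > 10:                       # examined >> sent: classic missing-index smell
        flags.append(f"examining {ratio:.0f}x more rows than returned")
    if tmp_disk_tables_per_sec > 2:
        flags.append("frequent on-disk temp tables: check joins/GROUP BY or raise tmp_table_size")
    return flags

print(efficiency_flags(450_000, 1_000, 0))
```

Running this on each PI snapshot turns the two red flags into a mechanical review step.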


5. Locks

| Metric | Explanation | Healthy range | Interpretation | Points to if higher |
| --- | --- | --- | --- | --- |
| Lock time (ms)/sec | Time waiting for locks | ≈ 0–50 ms/s | Small waits normal | Hot rows, long transactions |

👉 Guidance: If lock waits increase, check for long-running transactions and unindexed foreign keys. A single missing index on a foreign key can lock an entire table.
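A two-session sketch of the failure mode, using SQLite so it runs anywhere. SQLite locks the whole database where InnoDB locks rows, but the effect is the same one you see as Lock time in PI: a long transaction stalls every other writer until it commits.

```python
import os
import sqlite3
import tempfile

# File-backed database so two connections contend for the same lock.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path, isolation_level=None)   # autocommit; txns managed by hand
writer.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES (1, 100)")

# A long-running transaction takes the write lock and sits on it...
writer.execute("BEGIN IMMEDIATE")
writer.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")

# ...so another session's write waits, then errors once its timeout expires.
other = sqlite3.connect(path, timeout=0.1)
try:
    other.execute("UPDATE accounts SET balance = balance + 1 WHERE id = 1")
    blocked = False
except sqlite3.OperationalError:   # "database is locked"
    blocked = True

writer.execute("COMMIT")           # ending the transaction frees other sessions
other.execute("UPDATE accounts SET balance = balance + 1 WHERE id = 1")
other.commit()
print(blocked)
```

The application-side fix is the one the guidance gives: keep transactions short, and index the foreign keys your writes touch.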


6. Per-Call Metrics

The /sec metrics show volume, but the /call ones normalize per query — perfect for spotting wasteful queries.

| Metric | Explanation | Healthy range | Interpretation | Points to if higher |
| --- | --- | --- | --- | --- |
| Select full join/call | Full joins per query | ≈ 0 | Each query should avoid them | Add join indexes |
| Select scan/call | Full scans per query | ≈ 0–0.01 | Most queries shouldn't scan | Missing indexes |
| Rows examined/call | Rows read per query | ≈ rows sent | Close = efficient | Wasted reads |
| Rows sent/call | Rows returned/query | <10–100 | Small payloads typical | Missing pagination |
| Created tmp disk tables/call | Disk temps per query | ≈ 0 | Costly per-query spill | Add indexes, raise tmp size |
| Lock time (ms)/call | Lock wait per query | ≈ 0–2 ms | Higher = contention | Hot rows, poor indexing |

👉 Guidance:

  • Always check Rows examined/call. If it’s hundreds or thousands but Rows sent/call is <10, you’re burning CPU scanning unnecessary rows.

  • If a single query creates a disk temp table, it’s worth fixing — either via indexes or query rewrite.
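PI shows both views, but when only /sec numbers are at hand you can derive the /call figures yourself by dividing through by Calls/sec:

```python
def per_call(metrics_per_sec, calls_per_sec):
    """Normalize /sec metrics to /call: per_call = per_sec / calls_per_sec."""
    return {name: value / calls_per_sec for name, value in metrics_per_sec.items()}

# Example numbers are invented for illustration.
stats = per_call({"rows_examined": 450_000, "rows_sent": 3_000}, calls_per_sec=1_000)
print(stats)   # {'rows_examined': 450.0, 'rows_sent': 3.0}
```

Here an average query reads 450 rows to return 3, exactly the "hundreds examined, under 10 sent" smell described above.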


Putting It All Together

Here’s how you use PI in practice:

  1. Check Load by waits (AAS): is the DB stressed?

  2. Look at top SQL statements: which queries burn the most resources?

  3. Use /sec metrics to see which workloads dominate (scans, sorts, locks).

  4. Use /call metrics to check per-query efficiency.

  5. Fix at the application layer: add indexes, paginate, avoid N+1, shorten transactions.

  6. Retest in PI: confirm the red flags went down.


Example

❌ Bad query (scans entire table):

SELECT * FROM orders WHERE YEAR(created_at) = 2024;
  • Triggers Select scan and high Rows examined/call: wrapping the column in YEAR() makes the predicate non-sargable, so even an index on created_at can't be used.

✅ Fixed query (index-supported):

ALTER TABLE orders ADD INDEX idx_created_at (created_at);
SELECT * FROM orders WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
  • Uses a range scan on the index, reduces examined rows, and lowers latency. (The half-open range also catches rows timestamped later in the day on Dec 31, which BETWEEN '2024-01-01' AND '2024-12-31' would silently miss on a DATETIME column.)


Key Takeaways
  • Aurora PI/DI isn’t just for DBAs — developers can and should use it.

  • Focus on joins, scans, rows examined, and disk temp tables — these are the real red flags.

  • A query that “works” but scans thousands of rows for 10 results is a hidden time bomb at scale.

  • Use /call metrics to measure efficiency, /sec metrics to measure load.

By checking these metrics regularly, you’ll not only keep your database healthy but also save your team from firefighting outages later.