How to Read Aurora Performance Insights: A Developer’s Guide to Faster Queries

Amazon Aurora comes with two powerful built-in tools: Performance Insights (PI) and Database Insights (DI).

Most developers glance at them only when the database feels “slow,” but these dashboards contain query-level signals that can tell you why things are slow — and more importantly, what to fix.

This article walks through the most common PI metrics, explains what they mean, gives you “healthy ranges” for a typical product doing about 1000 queries per second (OLTP workload), and shows how to interpret them.


Why Developers Should Care About PI

Database optimization isn’t just a DBA job. Many performance issues come directly from application-level queries:

  • Missing indexes on frequently accessed tables

  • N+1 query patterns in APIs

  • Pulling too much data instead of paginating

  • Long transactions locking rows unnecessarily

Performance Insights highlights these issues in real time. If you know what to look for, you can catch and fix query inefficiencies long before they cause outages.
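The N+1 pattern in particular is easy to see in miniature. A sketch using Python's built-in sqlite3 as a stand-in for Aurora (the `users`/`orders` tables are made up for illustration; the round-trip arithmetic is the same on any database):

```python
import sqlite3

# In-memory SQLite stands in for Aurora here; the N+1 pattern is identical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 2, 5.0);
""")

# N+1 pattern: one query for the list, then one query per item.
users = conn.execute("SELECT id, name FROM users").fetchall()
n_plus_1_queries = 1
for user_id, _ in users:
    conn.execute("SELECT total FROM orders WHERE user_id = ?", (user_id,))
    n_plus_1_queries += 1          # 1 + N round trips: Calls/sec balloons

# Batched version: a single JOIN does the same work in one round trip.
rows = conn.execute("""
    SELECT u.id, u.name, o.total
    FROM users u LEFT JOIN orders o ON o.user_id = u.id
""").fetchall()
batched_queries = 1

print(n_plus_1_queries, batched_queries)   # 4 vs 1 for three users
```

In PI, the first version shows up as Calls/sec far above your app's request rate; the second keeps the two roughly in step.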


Enabling Performance Insights

If you’re provisioning Aurora with Terraform, PI is just a flag away.

👉 Using the terraform-aws-rds module, you can enable it by setting:

performance_insights_enabled = true

This ensures Aurora starts collecting PI data, so you can visualize load, latency, and query behavior directly in the AWS console.
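Once PI is collecting, you can also pull its metrics programmatically instead of reading them in the console. A minimal sketch using boto3's `pi` client; the DbiResourceId (`db-ABC123EXAMPLE`) is a placeholder you would replace with your cluster's actual resource identifier:

```python
from datetime import datetime, timedelta, timezone

def build_load_query(dbi_resource_id, minutes=60, period=60):
    """Request body for pi.get_resource_metrics: average load (AAS) over a window."""
    end = datetime.now(timezone.utc)
    return {
        "ServiceType": "RDS",
        "Identifier": dbi_resource_id,      # the DbiResourceId, e.g. "db-ABC123EXAMPLE" (placeholder)
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "PeriodInSeconds": period,
        "MetricQueries": [{"Metric": "db.load.avg"}],   # AAS, the headline PI metric
    }

# Uncomment with real credentials and a real DbiResourceId:
# import boto3
# pi = boto3.client("pi")
# resp = pi.get_resource_metrics(**build_load_query("db-ABC123EXAMPLE"))
# for point in resp["MetricList"][0]["DataPoints"]:
#     print(point["Timestamp"], point.get("Value"))
```

This is handy for wiring AAS into your own alerting rather than eyeballing the dashboard.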


1. Load & Latency

These metrics answer the basic question: Is the database keeping up?

| Metric | Explanation | Healthy range (1000 r/s) | Interpretation | Points to if higher |
| --- | --- | --- | --- | --- |
| Load by waits (AAS) | Avg active sessions (running or waiting) | ~1–3 per vCPU | Balanced workload if AAS ≈ CPU count | CPU/IO saturation, locking |
| SQL statements | Top queries ranked by impact | n/a | Identifies expensive queries | Missing indexes, bad design |
| Calls/sec | Total DB calls/sec | ~900–1200 | Should match app RPS | N+1 issues, chatty app |
| AAE | Avg active executions | ~1–2× vCPUs | Shows concurrent executions | CPU saturation, slow queries |
| Avg latency (ms)/call | Avg execution time | <5–10 ms | Indexed lookups should be fast | Missing indexes, IO stalls |

👉 Guidance: If Calls/sec is far higher than your app’s RPS, you probably have inefficient API patterns (e.g., one request → 20 queries). If Avg latency creeps above 10ms, look at your top SQL list in PI to find slow queries.
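Those two rules of thumb are simple enough to encode in a monitoring check. A sketch, where the thresholds are this article's heuristics rather than AWS-official limits:

```python
def load_flags(aas, vcpus, calls_per_sec, app_rps):
    """Flag load problems using the rules of thumb above (illustrative cutoffs)."""
    flags = []
    if aas > 3 * vcpus:                  # upper end of the ~1-3 per vCPU range
        flags.append("CPU/IO saturation or locking: AAS well above vCPU count")
    queries_per_request = calls_per_sec / app_rps
    if queries_per_request > 5:          # arbitrary illustrative cutoff for "chatty"
        flags.append(f"chatty app: ~{queries_per_request:.0f} queries per request (N+1?)")
    return flags

print(load_flags(aas=20, vcpus=4, calls_per_sec=1000, app_rps=50))
```

Feeding this from PI data turns "the database feels slow" into two concrete yes/no questions.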


2. Select Patterns

These metrics show how queries access your tables.

| Metric | Explanation | Healthy range | Interpretation | Points to if higher |
| --- | --- | --- | --- | --- |
| Select full join/sec | Joins without indexes | ≈ 0 | Almost never OK | Missing join indexes |
| Select range check/sec | Row-by-row range joins | ≈ 0 | Rare in healthy workloads | Composite indexes missing |
| Select scan/sec | Full table scans | <5/sec | Occasional scans fine | Unselective predicates |

👉 Guidance:

  • A single Select scan on a large table can bring Aurora to its knees.

  • If you see full joins or range checks, fix them by adding indexes or adjusting join conditions.

  • Always check your EXPLAIN plan: if the type column shows ALL, MySQL is reading the whole table and you're in trouble.


3. Sorting

Sorting is one of the biggest hidden costs in SQL.

| Metric | Explanation | Healthy range | Interpretation | Points to if higher |
| --- | --- | --- | --- | --- |
| Sort merge passes/sec | Merge passes in sort | <10/sec | Extra passes = buffer too small | Increase sort_buffer_size |
| Sort scan/sec | Scans during sort | <20/sec | Sorting large sets | Bad ORDER/GROUP BY |
| Sort range/sec | Range scans in sort | <20/sec | Moderate is fine | Inefficient plans |
| Sort rows/sec | Rows sorted/sec | <50k–100k | Too many rows sorted | Missing ORDER/GROUP indexes |

👉 Guidance: If your queries often sort, add covering indexes on the ORDER BY/GROUP BY columns. If that’s not possible, tune buffer sizes to avoid spills.
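The same plan-reading trick shows whether an index eliminates the sort step entirely. In this SQLite sketch the spill appears as "USE TEMP B-TREE FOR ORDER BY"; in MySQL EXPLAIN the analogous marker is "Using filesort" (the `events` table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, ts TEXT, payload TEXT)")

def needs_sort(sql):
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # SQLite reports an explicit sort step as "USE TEMP B-TREE FOR ORDER BY".
    return any("TEMP B-TREE" in row[-1] for row in rows)

query = "SELECT id, ts FROM events ORDER BY ts LIMIT 100"
before = needs_sort(query)                                  # True: a full sort pass
conn.execute("CREATE INDEX idx_events_ts ON events (ts)")
after = needs_sort(query)                                   # False: index delivers the order

print(before, after)
```

When the index already stores rows in ORDER BY order, the database streams them out and the sort counters stay flat.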


4. Rows & Temporary Tables

These metrics show how much data is being read, written, or staged in temporary tables.

| Metric | Explanation | Healthy range | Interpretation | Points to if higher |
| --- | --- | --- | --- | --- |
| Rows affected/sec | Rows modified/sec | Workload dependent | Matches app writes | Bulk ops, long txns |
| Rows examined/sec | Rows read/sec | ≈ rows sent/sec | Examined ≫ sent = wasted work | Missing indexes |
| Rows sent/sec | Rows returned/sec | Matches app RPS | High values = heavy pulls | SELECT *, no pagination |
| Created tmp tables/sec | Temp tables in memory | Low–moderate | Common for joins | Complex queries |
| Created tmp disk tables/sec | Temp tables on disk | ≈ 0–2/sec | Disk temps are expensive | Too small tmp_table_size, missing indexes |

👉 Guidance:

  • Rows examined >> Rows sent means you’re scanning way more rows than you return — classic missing index problem.

  • Disk temp tables are one of the worst offenders. If they’re >2/sec, check if your joins or GROUP BY clauses can be optimized.
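Both checks reduce to arithmetic you can automate against PI counters. A sketch, where the 10x examined/sent ratio and the 2/sec disk-temp cutoff follow the guidance above and should be tuned to your workload:

```python
def efficiency_flags(rows_examined_per_sec, rows_sent_per_sec, tmp_disk_tables_per_sec):
    """Flag wasted-work patterns from /sec counters (illustrative thresholds)."""
    flags = []
    ratio = rows_examined_per_sec / max(rows_sent_per_sec, 1)
    if ratio > 10:                       # examined >> sent: classic missing-index smell
        flags.append(f"examining {ratio:.0f}x more rows than returned")
    if tmp_disk_tables_per_sec > 2:
        flags.append("frequent on-disk temp tables: check joins/GROUP BY or raise tmp_table_size")
    return flags

print(efficiency_flags(450_000, 1_000, 0))
```

Running this on each PI snapshot turns the two red flags into a mechanical review step.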


5. Locks

| Metric | Explanation | Healthy range | Interpretation | Points to if higher |
| --- | --- | --- | --- | --- |
| Lock time (ms)/sec | Time waiting for locks | ≈ 0–50 ms/s | Small waits normal | Hot rows, long transactions |

👉 Guidance: If lock waits increase, check for long-running transactions and unindexed foreign keys. A single missing index on a foreign key can lock an entire table.
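A two-session sketch of the failure mode, using SQLite so it runs anywhere. SQLite locks the whole database where InnoDB locks rows, but the effect is the same one you see as Lock time in PI: a long transaction stalls every other writer until it commits.

```python
import os
import sqlite3
import tempfile

# File-backed database so two connections contend for the same lock.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path, isolation_level=None)   # autocommit; txns managed by hand
writer.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES (1, 100)")

# A long-running transaction takes the write lock and sits on it...
writer.execute("BEGIN IMMEDIATE")
writer.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")

# ...so another session's write waits, then errors once its timeout expires.
other = sqlite3.connect(path, timeout=0.1)
try:
    other.execute("UPDATE accounts SET balance = balance + 1 WHERE id = 1")
    blocked = False
except sqlite3.OperationalError:   # "database is locked"
    blocked = True

writer.execute("COMMIT")           # ending the transaction frees other sessions
other.execute("UPDATE accounts SET balance = balance + 1 WHERE id = 1")
other.commit()
print(blocked)
```

The application-side fix is the one the guidance gives: keep transactions short, and index the foreign keys your writes touch.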


6. Per-Call Metrics

The /sec metrics show volume, but the /call ones normalize per query — perfect for spotting wasteful queries.

| Metric | Explanation | Healthy range | Interpretation | Points to if higher |
| --- | --- | --- | --- | --- |
| Select full join/call | Full joins per query | ≈ 0 | Each query should avoid them | Add join indexes |
| Select scan/call | Full scans per query | ≈ 0–0.01 | Most queries shouldn't scan | Missing indexes |
| Rows examined/call | Rows read per query | ≈ rows sent | Close = efficient | Wasted reads |
| Rows sent/call | Rows returned/query | <10–100 | Small payloads typical | Missing pagination |
| Created tmp disk tables/call | Disk temps per query | ≈ 0 | Costly per-query spill | Add indexes, raise tmp size |
| Lock time (ms)/call | Lock wait per query | ≈ 0–2 ms | Higher = contention | Hot rows, poor indexing |

👉 Guidance:

  • Always check Rows examined/call. If it’s hundreds or thousands but Rows sent/call is <10, you’re burning CPU scanning unnecessary rows.

  • If a single query creates a disk temp table, it’s worth fixing — either via indexes or query rewrite.
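PI shows both views, but when only /sec numbers are at hand you can derive the /call figures yourself by dividing through by Calls/sec:

```python
def per_call(metrics_per_sec, calls_per_sec):
    """Normalize /sec metrics to /call: per_call = per_sec / calls_per_sec."""
    return {name: value / calls_per_sec for name, value in metrics_per_sec.items()}

# Example numbers are invented for illustration.
stats = per_call({"rows_examined": 450_000, "rows_sent": 3_000}, calls_per_sec=1_000)
print(stats)   # {'rows_examined': 450.0, 'rows_sent': 3.0}
```

Here an average query reads 450 rows to return 3, exactly the "hundreds examined, under 10 sent" smell described above.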


Putting It All Together

Here’s how you use PI in practice:

  1. Check Load by waits (AAS): is the DB stressed?

  2. Look at top SQL statements: which queries burn the most resources?

  3. Use /sec metrics to see which workloads dominate (scans, sorts, locks).

  4. Use /call metrics to check per-query efficiency.

  5. Fix at the application layer: add indexes, paginate, avoid N+1, shorten transactions.

  6. Retest in PI: confirm the red flags went down.


Example

❌ Bad query (scans entire table):

SELECT * FROM orders WHERE YEAR(created_at) = 2024;
  • Triggers Select scan and high Rows examined/call: wrapping the column in YEAR() makes the predicate non-sargable, so even an index on created_at can't be used.

✅ Fixed query (index-supported):

ALTER TABLE orders ADD INDEX idx_created_at (created_at);
SELECT * FROM orders WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
  • Uses a range scan on the index, reduces examined rows, and lowers latency. (The half-open range also catches rows timestamped later in the day on Dec 31, which BETWEEN '2024-01-01' AND '2024-12-31' would silently miss on a DATETIME column.)


Key Takeaways
  • Aurora PI/DI isn’t just for DBAs — developers can and should use it.

  • Focus on joins, scans, rows examined, and disk temp tables — these are the real red flags.

  • A query that “works” but scans thousands of rows for 10 results is a hidden time bomb at scale.

  • Use /call metrics to measure efficiency, /sec metrics to measure load.

By checking these metrics regularly, you’ll not only keep your database healthy but also save your team from firefighting outages later.