This document describes an architectural approach for adding VictoriaMetrics (VM) to an existing Kubernetes‑based Prometheus monitoring stack.
The intent is simple:
Keep all the benefits and ecosystem of Prometheus.
Fix slow and heavy queries (especially histograms and long ranges).
Gain predictable scaling and long‑term retention.
Avoid a big migration or redesign.
VM is added as a storage and query backend behind Prometheus. Prometheus keeps doing what it is good at (scraping and alerting), while VM takes over the heavy lifting for storing and querying large amounts of data.
Prometheus remains the primary scraper and alerting engine. This means:
ServiceMonitor and PodMonitor continue to work.
PrometheusRule objects are still used for alerting and recording rules.
kube‑prometheus‑stack and Prometheus Operator stay in place.
Exporters and open‑source services that “speak Prometheus” remain fully compatible.
In practice, teams continue using the same CRDs, the same Helm charts, and the same workflows. VM is introduced behind the scenes and does not force a change in day‑to‑day operations.
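For example, a typical ServiceMonitor keeps working unchanged; the manifest below is a minimal illustration with placeholder names:

```yaml
# Hypothetical ServiceMonitor for an application exposing /metrics.
# Nothing here changes when VictoriaMetrics is added behind Prometheus.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app                  # placeholder name
  labels:
    release: kube-prometheus-stack   # assumes the chart's default serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
    - port: metrics                  # named port on the application's Service
      interval: 30s
```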
As data grows, Prometheus local storage becomes the bottleneck. Long‑range graphs (several hours or days) and histogram‑based queries like histogram_quantile can feel slow or unreliable.
VictoriaMetrics is designed as a high‑performance time‑series database:
Optimised storage format for time‑series and histograms.
Efficient work with high‑cardinality metrics.
Stable performance even when the dataset grows.
From a user perspective, heavy dashboards load faster and are more reliable, without needing to redesign metrics or dashboards.
Prometheus scales mainly vertically: you give it a bigger node. At some size, this stops being efficient or manageable.
VM Cluster provides horizontal scaling:
Scale reads by adding vmselect instances.
Scale writes by adding vminsert instances.
Scale storage by adding vmstorage nodes.
This lets you grow storage and performance step by step instead of constantly resizing a single Prometheus instance.
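As a concrete sketch, if the cluster were managed with the VictoriaMetrics operator, each tier would have its own replica knob in a VMCluster resource. The field names follow the operator's CRD as I understand it, and the replica numbers are illustrative only:

```yaml
# Sketch of independent scaling knobs, assuming the VictoriaMetrics operator's
# VMCluster resource is used; replica counts are examples, not recommendations.
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMCluster
metadata:
  name: vm
spec:
  retentionPeriod: "3"      # retention, in months by default
  vminsert:
    replicaCount: 2         # scale writes
  vmselect:
    replicaCount: 2         # scale reads
  vmstorage:
    replicaCount: 3         # scale storage capacity
```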
The key design decision is: no change to scraping and alerting.
Prometheus stays as it is:
Same scrape configuration.
Same rules and alerts.
Same integration pattern for new services.
VM is introduced as an additional component. Grafana is pointed to VM for most dashboards. From the platform and product teams’ point of view, the monitoring “feels” the same, just faster and more responsive.
This design is a good fit when one or more of the following are true:
Query performance is becoming an issue (slow, timing out, or spiky).
You rely on histograms and quantiles across several hours or days.
The number of services, environments, or tenants is growing.
Prometheus instances are becoming large and resource‑hungry.
You want longer retention (weeks/months) without hurting Prometheus.
You heavily depend on Prometheus Operator, ServiceMonitors, and existing open‑source charts.
In short: you want the Prometheus ecosystem, but with a more capable backend.
This architecture may not be the right choice if:
You have a very small environment with limited metrics and short retention.
Prometheus queries are consistently fast and resource usage is low.
You already run a different central TSDB (e.g. Mimir, Thanos) and are happy with it.
The team strongly prefers to avoid operating any additional component.
In those cases, keeping a single Prometheus instance (or a small Prometheus setup) might be enough.
At a high level, the architecture extends Prometheus with a dedicated storage and query layer provided by VictoriaMetrics.
Instead of Prometheus being responsible for both scraping and long‑term storage, these concerns are split:
Prometheus: scraping + alerting + short local retention.
VictoriaMetrics: long‑term storage + heavy queries.
Prometheus
Scrapes all Kubernetes and application metrics.
Evaluates alerting and recording rules.
Pushes a copy of all metrics to VM via remote write.
Remains the “source of truth” for Prometheus semantics and integrations.
vminsert
Acts as the write entrypoint for VM.
Receives remote_write traffic from Prometheus.
Distributes metrics across vmstorage nodes.
Can be scaled horizontally when write volume increases.
vmstorage (clustered)
Stores the actual time‑series data.
Responsible for durability and efficient long‑term retention.
Scales horizontally by adding more storage nodes.
vmselect
Serves all read and query requests.
Talks to all vmstorage nodes in parallel and aggregates results.
Can be scaled horizontally when query load increases.
Grafana (and other consumers)
Uses VM (vmselect) as the main datasource for dashboards.
Optionally keeps Prometheus as a secondary datasource for specific use cases.
This separation of responsibilities makes the system easier to evolve: each layer can be scaled independently as load patterns change.
The goal of the rollout is to keep risk low and avoid disruption for developers and operators.
Deploy VictoriaMetrics Cluster
A VM cluster (vminsert, vmstorage, vmselect) is deployed into Kubernetes.
Initial sizing is chosen to match current Prometheus usage, with room for growth.
Enable Remote Write from Prometheus to VM
Prometheus is configured to send a copy of all metrics to vminsert (see the configuration sketch below).
No changes are made to scrape configs, rules, or existing targets.
During this phase, Prometheus and VM both hold a copy of the data.
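With kube‑prometheus‑stack, this typically amounts to a small addition to the Helm values. The snippet below is a sketch: the vminsert service name and namespace are assumptions and must match the actual deployment.

```yaml
# kube-prometheus-stack values.yaml fragment (service name/namespace are placeholders).
prometheus:
  prometheusSpec:
    remoteWrite:
      - url: http://vminsert.monitoring.svc:8480/insert/0/prometheus/api/v1/write
        queueConfig:
          maxSamplesPerSend: 10000   # optional tuning; the defaults usually suffice
```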
Point Grafana to VM
Grafana is configured to use VM as a datasource.
Existing dashboards can be gradually switched to VM.
Users should start seeing faster loading times on heavy panels.
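Because vmselect exposes a Prometheus‑compatible query API, Grafana can treat VM as an ordinary Prometheus datasource. A provisioning sketch, with assumed in‑cluster service names:

```yaml
# Grafana datasource provisioning sketch; URLs assume in-cluster service names.
apiVersion: 1
datasources:
  - name: VictoriaMetrics
    type: prometheus                 # vmselect speaks the Prometheus query API
    access: proxy
    url: http://vmselect.monitoring.svc:8481/select/0/prometheus
    isDefault: true
  - name: Prometheus (local)
    type: prometheus
    access: proxy
    url: http://prometheus-operated.monitoring.svc:9090
```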
Adjust Prometheus Retention (If Needed)
Once VM is trusted, Prometheus retention can be reduced.
This keeps Prometheus lightweight and focused on scraping and alerting.
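In kube‑prometheus‑stack terms, this is another small values change. The figures below are an example rather than a recommendation for every environment:

```yaml
# values.yaml fragment: keep only a short local buffer once VM holds the history.
prometheus:
  prometheusSpec:
    retention: 24h          # e.g. reduced from 15d after VM takes over long-term storage
    retentionSize: 20GB     # optional safety cap on the local TSDB size
```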
Iterate and Scale the VM Cluster
Monitor resource usage and query performance.
Add more vmstorage, vmselect, or vminsert instances as the environment grows.
Optionally introduce recording rules optimised for VM to further speed up specific use cases (see the example below).
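As an illustration, a recording rule can precompute a frequently used quantile so dashboards read a small pre‑aggregated series instead of raw histogram buckets. The metric, label, and rule names below are placeholders; the rule is evaluated by Prometheus as usual, and the result is remote‑written to VM like any other series.

```yaml
# Hypothetical PrometheusRule precomputing p99 request latency per service.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: latency-recording-rules      # placeholder name
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: latency.rules
      interval: 1m
      rules:
        - record: service:http_request_duration_seconds:p99_5m
          expr: |
            histogram_quantile(
              0.99,
              sum by (service, le) (rate(http_request_duration_seconds_bucket[5m]))
            )
```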
Throughout this process, the existing monitoring and alerting setup continues to work. Teams can switch to using VM‑backed dashboards when they are ready, without a “big bang” cutover.
From a platform and product perspective, the architecture offers:
Compatibility: No need to rework existing Prometheus integrations or charts.
Performance: Faster dashboards, especially for histograms and long time ranges.
Scalability: A clear path to scale storage and query capacity as the business grows.
Stability: Less pressure on Prometheus itself, reducing the risk of overload.
Flexibility: Easy to tune retention policies and resources independently for Prometheus and VM.
Low Risk: Prometheus stays in place, making the change reversible if needed.
To give a practical sense of impact, the numbers below illustrate typical performance improvements and operational costs seen when moving from single‑node Prometheus storage to VictoriaMetrics.
These are realistic ranges observed in medium to large Kubernetes setups:
Histogram/quantile queries (6h window): typically 5×–20× faster.
Long‑range dashboards (24h–7d): 10×–30× faster.
Heavy cardinality metrics: 3×–10× reduction in query latency.
Dashboard cold‑start load times: typically 5–15 s → 0.5–2 s.
VM’s storage engine is significantly more efficient:
2×–4× lower disk usage for the same data.
Fewer SSD IOPS required because of sequential/compressed blocks.
Retention cost decreases proportionally, making 30–180 days feasible.
Below are approximate monthly infra costs based on typical cloud VM sizes (AWS/GCP/Hetzner averages):
Smaller environments:
Prometheus alone: requires a large instance → ~$120–$180/mo.
VM single‑node: 4 vCPU / 8–16 GB RAM → ~$40–$80/mo.
Query performance increases ~5×–10×.
Storage requirement drops ~2×.
Medium environments:
Prometheus alone becomes unstable unless heavily tuned.
VM cluster recommended:
2× vminsert (small): ~$20–$30/mo each
3× vmstorage (medium): ~$50–$80/mo each
2× vmselect (medium): ~$40–$60/mo each
Total: ~$250–$380/mo
Query performance improves 10×–25×.
Long‑term retention (90–180 days) stays performant.
Large, high‑cardinality environments:
Prometheus alone: running a single instance at this scale becomes nearly impossible.
VM cluster:
3× vminsert: ~$30–$50/mo each
5× vmstorage: ~$70–$120/mo each
3× vmselect: ~$50–$80/mo each
Total: ~$600–$900/mo
Predictable performance even at very high cardinality.
Query performance improves 15×–30×.
For most environments, VM reduces infrastructure cost for the same retention.
Performance boost is immediate and significant.
Scaling is linear and predictable, unlike vertical scaling of a single Prometheus instance.
A practical, production‑oriented starting point:
Prometheus: unchanged; continues scraping and alerting.
vminsert: 1–2 instances for write handling and redundancy.
vmstorage: at least 3 nodes for durable, scalable storage.
vmselect: 2 or more instances for high‑availability querying.
Grafana: configured to read from VM for most dashboards.
This setup balances simplicity, high availability, and room to grow.
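If the cluster is installed with the victoria-metrics-cluster Helm chart, the starting point above roughly corresponds to values like the following. Key names and sizes are indicative and should be checked against the chart version in use:

```yaml
# Indicative values.yaml for the victoria-metrics-cluster Helm chart;
# key names and sizes should be verified against the chart version in use.
vminsert:
  replicaCount: 2            # redundant write path
vmselect:
  replicaCount: 2            # HA query layer for Grafana
vmstorage:
  replicaCount: 3            # minimum recommended for durable, scalable storage
  retentionPeriod: "3"       # months; adjust to the desired history
  persistentVolume:
    size: 100Gi              # per node; size to the current ingest rate plus growth
```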
Adding VictoriaMetrics as a backend to Prometheus is a pragmatic way to take the next step in observability without throwing away existing work.
Teams keep:
The familiar Prometheus model.
Their existing CRDs, rules, and exporters.
Current operational practices.
At the same time, they gain:
Faster and more reliable queries.
Long‑term metric history.
A scalable foundation for future growth.
Overall, this architecture is a low‑friction, high‑impact improvement for Kubernetes‑based environments that have outgrown a single Prometheus instance but still want to stay fully aligned with the Prometheus ecosystem.