Cloud Architecture | DasMeta

Observability Key Concepts

1. Signal Types

Telemetry forms that expose system behaviour.

Metrics: Numeric time-series showing performance trends.
Logs: Timestamped event records, structured or unstructured, describing what happened.
Traces: End-to-end request paths across components.
Profiles: Continuously sampled CPU, memory, and runtime behaviour, attributed to code paths.

2. Signal Metadata & Structure

How telemetry is shaped, enriched, and made analyzable.

Labels / Attributes: Dimensions for slicing signals.
Cardinality: Number of unique label combinations, each one a separate series; a main driver of cost and query performance.
Histograms: Bucketed distributions from which percentiles can be estimated.
Exemplars: Metric points linked to specific traces.
Sampling: Keeping a representative subset of telemetry to reduce volume while preserving diagnostic value.
Enrichment: Adding context (tenant, region, feature flag).
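
Cardinality and histogram percentiles can be made concrete with a small sketch. The code below is illustrative only: the function names, the label values, and the (upper_bound, cumulative_count) bucket layout are assumptions, and the linear interpolation inside a bucket is one common way to estimate a quantile, not the method of any particular backend.

```python
def series_cardinality(label_values):
    """Upper bound on distinct series: product of per-label value counts."""
    total = 1
    for values in label_values.values():
        total *= len(set(values))
    return total


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound. The estimate interpolates
    linearly inside the bucket that contains the target rank.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return None  # no observations, nothing to estimate
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            in_bucket = cumulative - lower_count
            if in_bucket == 0:
                return upper_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cumulative
    return buckets[-1][0]


# Hypothetical label sets: 2 methods x 3 statuses x 2 regions.
labels = {
    "method": ["GET", "POST"],
    "status": ["200", "404", "500"],
    "region": ["eu", "us"],
}
# Hypothetical latency buckets in seconds, with cumulative counts.
latency_buckets = [(0.1, 50), (0.5, 90), (1.0, 100)]

cardinality = series_cardinality(labels)
p95 = histogram_quantile(0.95, latency_buckets)
```

Adding one more label with 100 distinct values would multiply the series count by 100, which is why unbounded values such as user IDs are usually kept out of labels.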

3. Instrumentation

How telemetry is generated.

White-box instrumentation: Signals emitted from inside code.
Black-box checks: External probes validating availability.
eBPF: Kernel-level visibility without application code changes.
Agents / Exporters / SDKs: Components collecting and forwarding telemetry.
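
White-box instrumentation can be sketched as a decorator that records signals from inside the code path. This is a minimal standard-library illustration: the in-memory counters and the names instrumented and handle_request are hypothetical stand-ins for what a real SDK would provide and export.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory stand-ins for metrics an SDK would normally export.
CALLS = defaultdict(int)
ERRORS = defaultdict(int)
DURATION_SECONDS = defaultdict(float)


def instrumented(func):
    """Record call count, error count, and total duration per function."""
    name = func.__name__

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            ERRORS[name] += 1
            raise
        finally:
            # Runs on success and on failure, so every call is counted.
            CALLS[name] += 1
            DURATION_SECONDS[name] += time.perf_counter() - start

    return wrapper


@instrumented
def handle_request(payload):
    """Hypothetical request handler used only to exercise the decorator."""
    if payload is None:
        raise ValueError("empty payload")
    return len(payload)


ok_result = handle_request("abc")
try:
    handle_request(None)
except ValueError:
    pass
```

A black-box check would observe the same handler only from outside, for example by probing its endpoint, and would see availability but not these internal counters.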

4. Telemetry Storage & Processing

Where telemetry lives and how it is aggregated.

Time-series databases: Store metrics.
Log stores: Index/search logs.
Trace stores: Persist distributed traces.
Retention strategies: Hot/warm/cold data lifecycle.
Aggregations / transformations: Derived metrics, volume reduction.
Pipelines: Stream/batch telemetry processing.
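
Aggregation and retention can be illustrated with a small sketch. It assumes raw points arrive as (timestamp_seconds, value) pairs; the function names and the 7-day and 30-day tier boundaries are hypothetical choices, not recommendations.

```python
from collections import defaultdict


def downsample(points, window_seconds):
    """Roll raw points up into fixed windows as (window_start, avg, max)."""
    windows = defaultdict(list)
    for timestamp, value in points:
        window_start = timestamp - (timestamp % window_seconds)
        windows[window_start].append(value)
    return [
        (start, sum(values) / len(values), max(values))
        for start, values in sorted(windows.items())
    ]


def retention_tier(age_days, hot_days=7, warm_days=30):
    """Pick a storage tier by data age; the boundaries are assumptions."""
    if age_days <= hot_days:
        return "hot"
    if age_days <= warm_days:
        return "warm"
    return "cold"


raw_points = [(0, 1.0), (30, 3.0), (60, 10.0), (90, 20.0), (125, 5.0)]
rollup = downsample(raw_points, window_seconds=60)
```

Keeping the maximum alongside the average preserves spikes that an average alone would smooth away, at the price of one extra value per window.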

5. Correlation & Analysis

How signals are connected to explain behaviour.

Correlation: Linking metrics, logs, traces, and deploy events through shared identifiers and timestamps.
Service graphs: Auto-discovered dependency maps.
Anomaly detection: Flagging deviation from a baseline.
Drift detection: Spotting missing or degraded telemetry.
Noise reduction: Removing low-value signals.
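
Correlation usually relies on a shared identifier. The sketch below assumes spans and log records are plain dictionaries that carry a trace_id field; the field names and sample records are hypothetical.

```python
from collections import defaultdict


def correlate_by_trace(spans, logs):
    """Group spans and log records that share the same trace_id."""
    by_trace = defaultdict(lambda: {"spans": [], "logs": []})
    for span in spans:
        by_trace[span["trace_id"]]["spans"].append(span)
    for record in logs:
        trace_id = record.get("trace_id")
        if trace_id is not None:  # logs without a trace_id cannot be linked
            by_trace[trace_id]["logs"].append(record)
    return dict(by_trace)


spans = [
    {"trace_id": "t1", "service": "checkout", "duration_ms": 950},
    {"trace_id": "t2", "service": "checkout", "duration_ms": 40},
]
logs = [
    {"trace_id": "t1", "level": "error", "message": "payment timeout"},
    {"level": "info", "message": "cache warmed"},
]

correlated = correlate_by_trace(spans, logs)
```

Here the slow trace t1 ends up next to the error log that explains it, while the log line without a trace_id stays unlinked.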

6. Visualization & Exploration

How humans inspect telemetry.

Dashboards: Visual summaries.
Explorers / Queries: Ad-hoc inspection of all signal types.
Heatmaps: Pattern and distribution visualization.
Service maps: Visual dependency and flow mapping.

7. Alerting & Automated Interpretation

How telemetry triggers action.

Alerts: Automated notifications.
Threshold-based alerting: Static or dynamic limits on a value.
Statistical alerting: Alerts on deviation from a learned baseline.
Multi-signal alerting: Conditions that combine metrics, logs, and traces.
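
The difference between these alerting styles can be sketched in a few lines. The function names, the z-score cut-off of 3.0, and the sample values are illustrative assumptions; real systems typically also add evaluation windows and hysteresis.

```python
from statistics import mean, stdev


def threshold_alert(value, limit):
    """Static threshold: fire when the value exceeds a fixed limit."""
    return value > limit


def zscore_alert(history, value, max_z=3.0):
    """Statistical alert: fire when value is far from the recent baseline."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return value != baseline
    return abs(value - baseline) / spread > max_z


def multi_signal_alert(conditions, required=2):
    """Fire only when at least `required` independent signals agree."""
    return sum(bool(c) for c in conditions) >= required


latency_history_ms = [100, 102, 98, 101, 99]
current_latency_ms = 110
error_logs_last_minute = 12  # hypothetical count from a log query

latency_anomalous = zscore_alert(latency_history_ms, current_latency_ms)
errors_high = threshold_alert(error_logs_last_minute, limit=5)
page_on_call = multi_signal_alert([latency_anomalous, errors_high])
```

Requiring agreement between signals trades some detection speed for fewer false pages.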

8. Telemetry Governance

Ensuring telemetry is consistent and cost-efficient.

Schemas: Structure and naming standards.
Label budgets: Cardinality control.
Cost governance: Ingestion/storage/retention management.
Conventions: Standard rules for metrics/logs/traces.
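
Schemas and label budgets can be enforced with a simple check before a metric is accepted. The sketch below is a hypothetical example: the snake_case naming rule, the budget of five labels, and the allowed label set are assumptions, not an established standard.

```python
import re

# Hypothetical convention: lowercase snake_case metric names.
METRIC_NAME = re.compile(r"^[a-z][a-z0-9_]*$")


def check_metric(name, labels, allowed_labels, max_labels=5):
    """Return a list of governance problems; an empty list means compliant."""
    problems = []
    if not METRIC_NAME.match(name):
        problems.append(f"name {name!r} is not snake_case")
    if len(labels) > max_labels:
        problems.append(f"{len(labels)} labels exceed budget of {max_labels}")
    unknown = sorted(set(labels) - set(allowed_labels))
    if unknown:
        problems.append(f"labels not in schema: {unknown}")
    return problems


schema_labels = {"method", "status", "region"}

compliant = check_metric(
    "http_requests_total", {"method": "GET", "status": "200"}, schema_labels
)
violations = check_metric("HTTP-Requests", {"user_id": "42"}, schema_labels)
```

A check like this could run in CI or in the ingestion pipeline, so that high-cardinality labels such as user_id are rejected before they incur storage cost.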