September/24 Highlights; Cost Monitoring, Kubernetes Security, and Pipeline Optimization: Strengthening Cloud Infrastructure for Peak Performance

October 16, 2024

In September, the Das Meta team continued to push the boundaries of cloud infrastructure management, completing a total of 163 tasks. We focused on Kubernetes management, resource optimization, advanced monitoring setups, alarm configurations, and security enhancements. Every task brought us closer to making our clients’ infrastructures more reliable, scalable, and cost-effective.


Let's take a closer look at the areas we covered in September.

Cost Management and Monitoring Setups

A major focus this month was on cost management and monitoring improvements. We implemented several alarms to track cost increases and service latencies, so that anomalies trigger immediate alerts. These setups give our clients better visibility into, and control over, their operational expenses.

  • DMVP-2949: Set up an alarm on Azure to receive a notification if costs are forecast to rise more than expected.
  • DMVP-4737: Set up an alarm on latency.
  • DMVP-5016: Set up alarms on queue contents in Grafana.
  • DMVP-5017: Set up alarms on latency in Grafana.
  • DMVP-5226: Research why costs increased.
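
To illustrate the idea behind a forecast-based cost alarm such as the one in DMVP-2949, here is a minimal Python sketch. It is not the Azure configuration itself: the linear projection, the function names, and the 10% tolerance are assumptions made purely for illustration.

```python
import calendar
from datetime import date


def forecast_month_end_cost(spend_to_date: float, today: date) -> float:
    """Linearly project month-to-date spend to the end of the month.

    Assumption: spend accrues at a constant daily rate, which is a
    simplification of how a real cloud cost forecast would work.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def should_alert(spend_to_date: float, today: date, budget: float,
                 tolerance: float = 0.10) -> bool:
    """Return True when projected spend exceeds the budget by more than `tolerance`."""
    projected = forecast_month_end_cost(spend_to_date, today)
    return projected > budget * (1 + tolerance)
```

For example, with 600 spent by September 15 and a monthly budget of 1000, the projection is 1200, which is above the 1100 alert line, so `should_alert` returns `True`.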

Kubernetes Management and Security

We took proactive steps to improve the security and management of Kubernetes environments, addressing pipeline issues, configuring secret management tools, and enhancing CI/CD processes. Our focus on improving security measures has significantly reduced the risk of vulnerabilities in both dev and production environments.


  • DMVP-4651: Move API dev to GitLab/EKS/secret manager.
  • DMVP-5110: Check Kubernetes Ingress-Nginx Vulnerability.
  • DMVP-5239: Check prod & dev load balancers.
  • DMVP-5163: Enable security features across environments.

Alarm and Dashboard Integration with DataDog

To enhance monitoring capabilities, we integrated multiple services with DataDog, improving our clients' ability to monitor metrics and logs. This included setting up APM tracing for services like Celery, and establishing Grafana dashboards for clearer insights into service performance.


  • DMVP-4932: Push steam node (ECS) APM/Traces to DataDog.
  • DMVP-4943: Set up an alarm on 10s latency.
  • DMVP-4984: Monitor Weaviate resources (memory).
  • DMVP-5396: Link current DataDog alarms with Opsgenie & Slack.
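
As a rough illustration of what a latency alarm like DMVP-4943 evaluates, the Python sketch below checks whether a chosen percentile of recent request latencies exceeds a threshold. We assume here that "10s" is the alert threshold; the use of the 95th percentile, the nearest-rank method, and all function names are illustrative assumptions, not the actual DataDog monitor definition.

```python
def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile of `samples` for an integer `pct` in 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_alarm(samples_s: list[float], threshold_s: float = 10.0,
                  pct: int = 95) -> bool:
    """Return True when the `pct`-th percentile latency exceeds `threshold_s` seconds."""
    return percentile(samples_s, pct) > threshold_s
```

Alerting on a percentile rather than on a single slow request keeps one outlier from paging the on-call engineer, while still catching sustained slowdowns.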



Advanced Support for Key Services

Our advanced support team worked on resolving complex service issues and optimizing critical infrastructure components. We investigated and addressed challenges with recommendation engines, performance testing, and resource management, all while ensuring that services like Redis and MySQL operated smoothly.

  • DMVP-5152: Investigate recommendation engine issues.
  • DMVP-5337: Load test.
  • DMVP-5153: Investigate price-live issues.
  • DMVP-5407: Understand 500s when ProxySQL version is on.
  • DMVP-5434: Check Redis settings.



Optimizing CI/CD Pipelines and Infrastructure Efficiency

Efficiency was a key priority in September as we worked to reduce costs and optimize pipeline performance. This included reducing CI/CD pipeline instance sizes, stabilizing environments, and ensuring smoother deployments for a variety of services across dev and production environments.

  • DMVP-5168: Switch CI/CD pipeline instances to smaller types to reduce costs.
  • DMVP-5326: Check pipeline failure.
  • DMVP-4980: Set up an exporter for Redis.
  • DMVP-5252: Configure HPA on website.
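
For readers unfamiliar with the Horizontal Pod Autoscaler (HPA) mentioned in DMVP-5252, the Python sketch below reproduces the scaling rule described in the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of the observed metric to its target, rounded up. The function name, the replica bounds, and the sample values are hypothetical and do not reflect the settings used for the website.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Compute the replica count an HPA would aim for.

    Follows desired = ceil(current_replicas * current_metric / target_metric),
    skips scaling when the ratio is within `tolerance` of 1.0, and clamps
    the result to [min_replicas, max_replicas].
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

With 4 replicas averaging 90% CPU against a 60% target, the function returns 6. At 62% it stays at 4, because the ratio falls within the tolerance band.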


September was a month of significant improvements for Das Meta, as we continued to refine our clients' cloud infrastructures, making them more robust, scalable, and secure. Stay tuned for the upcoming innovations in October!
