March 2024 Highlights: External Health Checks for ClickHouse Using Prometheus, Switching to Spot Instances & More
April 5, 2024
(DMVP-1192): Transitioning to spot instances to leverage cost savings in cloud computing resources.
(DMVP-3526): Reviewing CloudWatch metrics, logs, and alarms and deleting unused resources.
(DMVP-3534): Identifying and removing unused Elastic Block Store (EBS) volumes to free up resources and minimize expenses.
(DMVP-3678): Investigating cost-saving options provided by AWS to optimize spending on cloud services.
(DMVP-2987): Finalizing the Prefect deployment.
(DMVP-3601): Moving an older MySQL database to a new Azure-based setup.
(DMVP-3602): Transferring Elasticsearch/OpenSearch databases from an AWS-based setup to a new Azure Kubernetes environment.
(DMVP-3603): Moving data from MongoDB to OpenSearch as part of a migration from an old OTC-Nomad setup to Azure AKS.
(DMVP-3652): Upgrading the MongoDB cluster.
(DMVP-3671): Integrating MongoDB Atlas with monitoring tools.
(DMVP-3680): Updating the version of the Relational Database Service (RDS).
(DMVP-3547): Fine-tuning alarms to better monitor latency and traffic, improving system health monitoring.
(DMVP-3595): Implementing external health checks to monitor the system's status and ensure its reliability.
(DMVP-3642): Using Prometheus to monitor ClickHouse metrics, providing detailed insights into database performance and health.
(DMVP-3649): Troubleshooting and fixing the external health checks and monitoring systems to ensure they function correctly.
(DMVP-3653): Examining the process for manually uploading training data to identify and resolve potential issues, contributing to overall system health.
(DMVP-3708): Establishing alerts for ClickHouse read-only mode incidents to proactively manage and resolve database health issues.
(DMVP-3712): Communicating with the BI/Data Science team to gain insights into the setup, aiding in better system health and performance monitoring.
(DMVP-3774): Regular infrastructure support activities to maintain system health and address any arising issues promptly.
(DMVP-3668): Deploying Kubernetes Certificate Authority to production EKS, enhancing deployment security and configuration.
(DMVP-3655): Centralizing the Docker container registry for more streamlined and efficient deployment processes.
(DMVP-3656): Transitioning from GitLab to Kaniko for container builds, optimizing the deployment pipeline.
(DMVP-3705): Implementing Horizontal Pod Autoscaling for workers to improve resource management during deployments.
(DMVP-3704): Deploying ClickHouse in a new cloud account, part of setting up and optimizing new environments.
(DMVP-3797): Deploying new services, demonstrating the ongoing efforts to enhance deployment practices and configuration management.
(DMVP-3639): Renewing and managing licenses for open-source projects to ensure compliance.
(DMVP-3650): Discussing and implementing improvements in MongoDB indexes to enhance security and performance.
(DMVP-3652): Upgrading MongoDB clusters for better security and efficiency.
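
To illustrate the ClickHouse monitoring items above (external health checks, Prometheus metrics, and read-only alerts), here is a minimal Python sketch. It is not the implementation used in these tickets. It assumes ClickHouse's HTTP interface is reachable and that its built-in Prometheus endpoint is enabled, in which case the count of read-only replicated tables is expected to appear as ClickHouseMetrics_ReadonlyReplica. The function names and the sample payload are hypothetical.

```python
import urllib.request
from typing import Dict


def ping_ok(base_url: str, timeout: float = 5.0) -> bool:
    """External health check: True if GET <base_url>/ping answers "Ok."."""
    try:
        with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
            return resp.status == 200 and resp.read().decode().strip() == "Ok."
    except OSError:
        # Covers connection errors, timeouts, and HTTP error responses.
        return False


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse label-free samples from Prometheus text exposition format."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) < 2 or "{" in parts[0]:
            continue  # skip malformed or labelled samples in this sketch
        try:
            samples[parts[0]] = float(parts[1])
        except ValueError:
            continue
    return samples


def is_read_only(samples: Dict[str, float],
                 metric: str = "ClickHouseMetrics_ReadonlyReplica") -> bool:
    """True if at least one replicated table is reported as read-only."""
    # The metric name is an assumption; verify it against your /metrics output.
    return samples.get(metric, 0.0) > 0


# Illustrative payload only, not captured from a real server.
SAMPLE = """\
# TYPE ClickHouseMetrics_ReadonlyReplica gauge
ClickHouseMetrics_ReadonlyReplica 2
ClickHouseMetrics_Query 1
"""

if __name__ == "__main__":
    print("read-only:", is_read_only(parse_metrics(SAMPLE)))
```

In a real setup the read-only condition would more likely be expressed as a Prometheus alerting rule on the same metric; the script form is shown only to make the check concrete.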