Description and Requirements
We are seeking a skilled Cloud Observability Engineer to design and implement comprehensive monitoring and observability solutions for our cloud infrastructure. In this role, you will build scalable monitoring systems that provide real-time visibility into system health and performance across Linux, OpenStack, and Kubernetes environments.
Key Responsibilities:
1. Monitoring System Design & Implementation:
Design and deploy end-to-end monitoring solutions for cloud infrastructure and core services.
Implement monitoring pipelines using Prometheus for metrics collection and Zabbix for alerting.
Ensure comprehensive coverage of system health and performance metrics.
2. Architecture Optimization & Troubleshooting:
Optimize monitoring architectures for scalability and low-latency data processing.
Troubleshoot complex monitoring issues, including metric collection failures and performance bottlenecks.
Implement high-availability monitoring solutions with efficient resource utilization.
3. Automation & Tool Development:
Develop automation tools for monitoring workflows using Golang and Python.
Build dynamic alert generation and anomaly detection systems.
Integrate monitoring solutions with CI/CD pipelines and cloud-native workflows.
4. Collaboration & Integration:
Work closely with infrastructure and DevOps teams to align observability strategies with product requirements.
Integrate monitoring systems with AI-driven analytics and cloud platform services.
Provide monitoring insights to support performance optimization and capacity planning.
Qualifications:
Additional Locations:
Developer • Petaling Jaya, Selangor, Malaysia