Job Description Summary :
You own the stability, scalability, and performance of critical infrastructure running complex data pipelines at enterprise scale. This is a hands-on leadership role for a seasoned engineer who drives engineering rigor, automation, and reliability across global teams. You bring deep technical mastery in big data systems, automation, and coding, combined with proven success leading projects or teams. Your work prevents outages, accelerates deployments, and raises operational standards.
Responsibilities :
- Lead end-to-end reliability engineering for large-scale data ingestion and processing platforms.
- Architect, build, and automate infrastructure using Ansible, Terraform, Kubernetes, and OpenShift.
- Develop and enforce coding and testing standards, ensuring clean, maintainable, production-grade code in Java / Python.
- Drive CI / CD pipelines with Jenkins, Maven, Git, Docker — delivering frequent, stable releases.
- Mentor engineers, enforce accountability, and raise team maturity in reliability best practices.
- Use telemetry, monitoring, and metrics to proactively identify risk and prevent incidents.
- Collaborate closely with internal customers and global teams to solve complex problems quickly and effectively.
Requirements :
- 12+ years of hands-on systems, DevOps, or SRE experience in large, international enterprises.
- 2+ years leading teams or complex projects with clear ownership of outcomes.
- Expert-level experience with Elastic Stack, Kafka, Logstash, and Kibana for data ingestion.
- Mastery of Infrastructure as Code tools : Ansible, Terraform, Kubernetes, OpenShift.
- Strong programming skills in Java or Python, with a deep understanding of OOP and software engineering principles.
- Experience with microservices, REST / SOAP APIs, and NoSQL databases.
- Familiarity with ITIL processes and Agile delivery models.
- The ability to operate effectively across cultures and time zones, driving alignment and delivery.