Ampstek Federal Territory of Kuala Lumpur, Malaysia
Site Reliability Engineer
Position Summary: We are looking for a skilled Site Reliability Engineer (SRE) to join our technology operations team. The ideal candidate will be responsible for building scalable, reliable, and high-performance systems while ensuring continuous uptime and operational excellence. The SRE will work closely with development, DevOps, and infrastructure teams to automate processes, enhance observability, and improve system resilience.
Key Responsibilities
- Design, build, and maintain highly available and scalable infrastructure across cloud and on-premise environments.
- Implement monitoring, alerting, and incident response systems using tools such as Prometheus, Grafana, ELK, or Splunk.
- Automate deployment, scaling, and operations using Infrastructure-as-Code (IaC) tools like Terraform, Ansible, or CloudFormation.
- Drive CI/CD pipeline enhancements and ensure seamless integration and deployment workflows (e.g., Jenkins, GitLab CI, or Azure DevOps).
- Collaborate with development teams to improve system reliability, observability, and performance.
- Troubleshoot production issues, perform root cause analysis (RCA), and implement long-term fixes.
- Manage incident response and postmortems, reducing Mean Time To Recovery (MTTR).
- Work with Kubernetes/Docker environments to support microservices and containerized deployments.
- Ensure robust disaster recovery and backup strategies, along with adherence to security and compliance requirements.
Must-Have Skills
- Strong experience as an SRE, DevOps Engineer, or Cloud Infrastructure Engineer in large-scale production environments.
- Proficiency in Linux/Unix system administration and shell scripting.
- Hands-on experience with cloud platforms (AWS, Azure, or GCP).
- Expertise in containerization and orchestration tools such as Docker and Kubernetes.
- Experience with CI/CD tools (Jenkins, GitLab CI, or Azure DevOps).
- Knowledge of Infrastructure-as-Code tools (Terraform, Ansible, or CloudFormation).
- Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK Stack, Splunk, Datadog, or New Relic).
- Experience in automating repetitive tasks using Python, Bash, or Go.
Seniority level: Mid-Senior level
Employment type: Contract
Job function: Information Technology
Industries: IT Services and IT Consulting