About the Role
We are seeking a Site Reliability Engineer (SRE) to join our client's team in Malaysia. You will be responsible for maintaining the stability, scalability, and reliability of critical applications and infrastructure, while driving automation and performance optimization.
This position requires Mandarin fluency, as you will collaborate with Mandarin-speaking stakeholders. The role is open exclusively to Malaysian Citizens or Permanent Residents (PR).
It's a great opportunity for those with a strong background in Python (preferred), or in Java/Golang with Linux scripting, to advance their SRE career.
Key Responsibilities
system performance, reliability, and uptime
scalable, resilient system architectures
automation tools/scripts to reduce manual work.
SLOs and SLIs to measure system reliability.
databases, networks, and deployments (incl. Kubernetes)
incident post-mortems and drive continuous improvement.
best practices
on-call rotations and respond to critical issues.
What We're Looking For
Minimum 1.5 years of relevant experience
Fluent in English & Mandarin (mandatory)
Strong skills in Python (preferred); if not, Java or Golang + Linux scripting & Bash
Hands-on experience with cloud platforms (AWS, Azure, or GCP)
Proficiency in Linux administration and troubleshooting
Familiarity with SRE concepts (SLIs, SLOs, toil reduction, incident management)
Comfortable working on rotational shifts
Nice to Have
Kubernetes, containers, CI/CD, Infrastructure as Code
monitoring tools and performance optimization