Design Failover Systems: Design and maintain scalable failover systems, backup strategies, and redundancy mechanisms across cloud and on-prem environments.
Develop DR Documentation: Create and update disaster recovery documentation, runbooks, and recovery playbooks for infrastructure and application layers.
Business Continuity Testing: Plan, coordinate, and execute tabletop exercises, DR drills, and failover simulations; analyze and report outcomes, identify gaps, and lead remediation initiatives.
Incident Response & Crisis Management: Develop incident response procedures, escalation paths, and communication frameworks for major outages; act as a key responder and facilitator during critical incidents to ensure swift coordination across teams.
Data Backup & Recovery Strategy: Implement and manage cloud-based and on-premises backup solutions aligned with defined recovery time objectives (RTO) and recovery point objectives (RPO); regularly test and validate data restoration processes.
24/7/365 Coverage: Participate in a rotating on-call schedule to ensure continuous coverage; operate within a 3-shift structure of 9-hour shifts, with an overlapping hour for smooth transitions.
Collaboration with Tier 1 and Tier 2 Support: Work closely with the Tier 1 and Tier 2 teams as the first point of contact for incidents and service requests; provide expertise and escalation support to ensure efficient resolution and seamless communication.
Qualifications
Bachelor’s degree in Computer Science, Engineering, or a related field.
3+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles.
Proven experience with DR planning, testing, and recovery operations.
Proficiency in AWS, focusing on services that support infrastructure and application layers.
Hands‑on experience with backup solutions such as Veeam, Rubrik, AWS Backup, and Azure Site Recovery.
Strong understanding of high availability, system redundancy, and incident management frameworks (ITIL, NIST).
Familiarity with monitoring and alerting tools (Prometheus, Grafana, Splunk, PagerDuty).
Strong spoken and written English communication skills.
Preferred Skills
Certifications in cloud platforms (e.g., AWS Solutions Architect, Azure Administrator).
Experience with chaos engineering or reliability testing tools (Gremlin, Chaos Monkey).