Talent.com
Site Reliability Engineer (SRE)

Refine Group
Kuala Lumpur, Kuala Lumpur, Malaysia
13 hours ago
Job description

Responsibilities

  • Design Failover Systems: Design and maintain scalable failover systems, backup strategies, and redundancy mechanisms across cloud and on-premise environments.
  • Develop Disaster Recovery (DR) Documentation: Create and update DR documentation, runbooks, and recovery playbooks for the infrastructure and application layers.
  • Business Continuity Testing: Plan, coordinate, and execute tabletop exercises, DR drills, and failover simulations; analyze and report outcomes, identify gaps, and lead remediation initiatives.
  • Incident Response & Crisis Management: Develop incident response procedures, escalation paths, and communication frameworks for major outages; act as a key responder and facilitator during critical incidents to ensure swift coordination across teams.
  • Data Backup & Recovery Strategy: Implement and manage cloud-based and on-premise backup solutions aligned with the defined recovery time objective (RTO) and recovery point objective (RPO); regularly test and validate data restoration processes.
  • 24/7/365 Coverage: Participate in a rotating on-call schedule to ensure continuous coverage; work within a three-shift structure of 9-hour shifts, with an overlapping hour for smooth handovers.
  • Collaboration with Tier 1 and Tier 2 Support: Work closely with the Tier 1 and Tier 2 teams as the first point of contact for incidents and service requests; provide expertise and escalation support to ensure efficient resolution and seamless communication.
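The 24/7/365 coverage item implies some simple arithmetic. Three 9-hour shifts add up to 27 staffed hours per day. If a new shift starts every 8 hours, each handover gets one hour of overlap. The sketch below checks that arithmetic. It assumes the first shift starts at midnight, since the posting does not give actual start times. The function names are hypothetical and only illustrate the idea.

```python
"""Sketch: a 3-shift, 9-hour rotation covering 24/7 with a 1-hour handover overlap.

Assumption (not stated in the posting): shifts start at evenly spaced times,
with the first shift beginning at 00:00.
"""

DAY_HOURS = 24


def build_rotation(shifts=3, shift_hours=9, first_start=0):
    """Return (start, end) hours on a 24-hour clock, one pair per shift."""
    if DAY_HOURS % shifts != 0:
        raise ValueError("shift count must divide 24 evenly")
    stride = DAY_HOURS // shifts  # hours between consecutive shift starts (8 here)
    if shift_hours < stride:
        raise ValueError("shifts are too short to cover the whole day")
    return [
        ((first_start + i * stride) % DAY_HOURS,
         (first_start + i * stride + shift_hours) % DAY_HOURS)
        for i in range(shifts)
    ]


def on_shift(hour, start, end):
    """True if `hour` falls in [start, end), handling shifts that cross midnight."""
    if start < end:
        return start <= hour < end
    return hour >= start or hour < end


def staffing_by_hour(rotation):
    """Number of shifts on duty during each hour of the day."""
    return [sum(on_shift(h, s, e) for s, e in rotation) for h in range(DAY_HOURS)]


if __name__ == "__main__":
    rotation = build_rotation()
    for s, e in rotation:
        print(f"{s:02d}:00 -> {e:02d}:00")
    staffing = staffing_by_hour(rotation)
    print("uncovered hours:", [h for h, n in enumerate(staffing) if n == 0])
    print("handover hours:", [h for h, n in enumerate(staffing) if n > 1])
```

Under these assumptions the shifts run 00:00 to 09:00, 08:00 to 17:00, and 16:00 to 01:00. No hour is left uncovered, and two shifts are on duty during the handover hours starting at 00:00, 08:00, and 16:00.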

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • 3+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles.
  • Proven experience with DR planning, testing, and recovery operations.
  • Proficiency in AWS, focusing on services that support infrastructure and application layers.
  • Hands‑on experience with backup solutions such as Veeam, Rubrik, AWS Backup, and Azure Site Recovery.
  • Strong understanding of high availability, system redundancy, and incident management frameworks (ITIL, NIST).
  • Familiarity with monitoring and alerting tools (Prometheus, Grafana, Splunk, PagerDuty).
  • Strong spoken and written English communication skills.

Preferred Skills

  • Certifications in cloud platforms (e.g., AWS Solutions Architect, Azure Administrator).
  • Experience with chaos engineering or reliability testing tools (Gremlin, Chaos Monkey).