Talent.com
Site Reliability Engineer (SRE)

FPT Software
Kuala Lumpur, Kuala Lumpur, Malaysia
13 hours ago
Job description

1. Disaster Recovery Planning :

Design and maintain scalable failover systems, backup strategies, and redundancy mechanisms across cloud and on-prem environments.

Develop and update DR documentation, runbooks, and recovery playbooks for infrastructure and application layers.

2. Business Continuity Testing :

Plan, coordinate, and execute tabletop exercises, DR drills, and failover simulations.

Analyze and report outcomes of BC/DR tests; identify gaps and lead remediation initiatives.

3. Incident Response & Crisis Management :

Develop and refine incident response procedures, escalation paths, and communication frameworks for major outages.

Act as a key responder and facilitator during critical incidents, ensuring swift coordination across teams.

4. Data Backup & Recovery Strategy :

Implement and manage cloud-based and on-premises backup solutions aligned with the defined RTO (Recovery Time Objective) and RPO (Recovery Point Objective).

Regularly test and validate data restoration processes to ensure system recoverability.
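
As a rough illustration of what validating a restore against RTO and RPO can look like, the sketch below compares the measured data loss and downtime of one restore drill with both targets. The class, function, field names, and sample figures are hypothetical and are not taken from this posting.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RestoreDrill:
    """Timestamps recorded during one restore test (all values hypothetical)."""
    last_backup_at: datetime       # when the restored backup was taken
    failure_at: datetime           # simulated moment of data loss
    service_restored_at: datetime  # when the service was usable again


def meets_objectives(drill: RestoreDrill, rpo: timedelta, rto: timedelta) -> dict:
    """Compare measured data loss and downtime with the RPO and RTO targets."""
    data_loss = drill.failure_at - drill.last_backup_at
    downtime = drill.service_restored_at - drill.failure_at
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


# Sample drill: backup 45 minutes before the failure, service back after 90 minutes.
drill = RestoreDrill(
    last_backup_at=datetime(2024, 1, 1, 0, 0),
    failure_at=datetime(2024, 1, 1, 0, 45),
    service_restored_at=datetime(2024, 1, 1, 2, 15),
)
result = meets_objectives(drill, rpo=timedelta(hours=1), rto=timedelta(hours=1))
print(result["rpo_met"], result["rto_met"])
```

With these sample figures the drill stays within the one-hour RPO but misses the one-hour RTO, which is the kind of gap a restoration test is meant to surface.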

5. 24/7/365 Coverage :

Participate in a rotating on-call schedule to ensure continuous coverage.

Daily operations run in three 9-hour shifts with one member per shift. Consecutive shifts overlap by one hour to allow a smooth handover.
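
The shift arithmetic can be checked with a short sketch: three 9-hour shifts with a one-hour overlap start 8 hours apart, so together they cover a full 24 hours. The 00:00 start time and all names below are assumptions made for illustration; the posting does not state actual shift times.

```python
from datetime import datetime, timedelta

SHIFT_COUNT = 3
SHIFT_LENGTH = timedelta(hours=9)
OVERLAP = timedelta(hours=1)


def shift_windows(first_start: datetime) -> list[tuple[datetime, datetime]]:
    """Return the (start, end) window of each shift for one day."""
    step = SHIFT_LENGTH - OVERLAP  # 8 hours between consecutive shift starts
    return [
        (first_start + i * step, first_start + i * step + SHIFT_LENGTH)
        for i in range(SHIFT_COUNT)
    ]


# The 00:00 start is an assumption; the posting does not give shift times.
windows = shift_windows(datetime(2024, 1, 1, 0, 0))
for start, end in windows:
    print(f"{start:%H:%M} - {end:%H:%M}")

# Three 8-hour steps span exactly one day, so the last shift overlaps
# the first shift of the following day by one hour as well.
assert SHIFT_COUNT * (SHIFT_LENGTH - OVERLAP) == timedelta(hours=24)
```

Under the assumed start time, the windows come out as 00:00 to 09:00, 08:00 to 17:00, and 16:00 to 01:00 the next day.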

6. Collaboration with Tier 1 and Tier 2 Support :

  • Work closely with Tier 1 and Tier 2 teams who will serve as the first point of contact for incidents and service requests.
  • Provide expertise and escalation support as needed, ensuring efficient resolution of issues and seamless communication between teams.

Qualifications :

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • 3+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles.
  • Proven experience with DR planning, testing, and recovery operations.
  • Proficiency in AWS, with a focus on relevant services that support infrastructure and application layers.
  • Hands‑on experience with backup solutions (e.g., Veeam, Rubrik, AWS Backup, Azure Site Recovery).
  • Strong understanding of high availability, system redundancy, and incident management frameworks (e.g., ITIL, NIST).
  • Familiarity with monitoring and alerting tools (e.g., Prometheus, Grafana, Splunk, PagerDuty).
  • Strong spoken and written English communication skills, essential for effective collaboration with global teams.

Preferred Skills :

  • Certifications in cloud platforms (e.g., AWS Solutions Architect, Azure Administrator).
  • Experience with chaos engineering or reliability testing tools (e.g., Gremlin, Chaos Monkey).