Site Reliability Engineer (SRE)

FPT Software

Kuala Lumpur, Kuala Lumpur, Malaysia

13 hours ago

Job description

Design and maintain scalable failover systems, backup strategies, and redundancy mechanisms across cloud and on-prem environments.

Develop and update DR documentation, runbooks, and recovery playbooks for infrastructure and application layers.

2. Business Continuity Testing:

Plan, coordinate, and execute tabletop exercises, DR drills, and failover simulations.

Analyze and report the outcomes of BC/DR tests; identify gaps and lead remediation initiatives.

3. Incident Response & Crisis Management:

Develop and refine incident response procedures, escalation paths, and communication frameworks for major outages.

Act as a key responder and facilitator during critical incidents, ensuring swift coordination across teams.

4. Data Backup & Recovery Strategy:

Implement and manage cloud-based and on-premises backup solutions aligned with the defined RTO (Recovery Time Objective) and RPO (Recovery Point Objective).

Regularly test and validate data restoration processes to ensure system recoverability.

5. 24/7/365 Coverage:

Participate in a rotating on-call schedule to ensure continuous coverage.

Daily operations run in three 9-hour shifts with one team member per shift; consecutive shifts overlap by one hour to allow a smooth handover.

6. Collaboration with Tier 1 and Tier 2 Support:

Work closely with the Tier 1 and Tier 2 teams, who serve as the first point of contact for incidents and service requests.

Provide expertise and escalation support as needed to ensure efficient resolution of issues and clear communication between teams.

Qualifications:

Bachelor’s degree in Computer Science, Engineering, or a related field.

3+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles.

Proven experience with DR planning, testing, and recovery operations.

Proficiency in AWS, particularly the services that support the infrastructure and application layers.

Hands-on experience with backup solutions (e.g., Veeam, Rubrik, AWS Backup, Azure Site Recovery).

Strong understanding of high availability, system redundancy, and incident management frameworks (e.g., ITIL, NIST).

Familiarity with monitoring and alerting tools (e.g., Prometheus, Grafana, Splunk, PagerDuty).

Strong spoken and written English communication skills, essential for effective collaboration with global teams.

Preferred Skills:

Certifications in cloud platforms (e.g., AWS Solutions Architect, Azure Administrator).

Experience with chaos engineering or reliability testing tools (e.g., Gremlin, Chaos Monkey).
