Site Reliability Engineer

FINEXUS Group, Kuala Lumpur, Malaysia

1 day ago

Job description

System Reliability & Operations

Ensure high availability and reliability of IT systems, applications, and PCI DSS‑certified data centres, supporting both internal operations and client‑facing platforms.
Perform system administration (Linux and Windows servers), including installation, configuration, patching, monitoring, and performance tuning.
Manage data storage, backup, and disaster recovery planning (DRP) to ensure data integrity, resilience, and compliance with industry standards.
Conduct capacity planning and lifecycle management of infrastructure resources, ensuring optimal performance and scalability.
Define and monitor Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets to measure and improve reliability.
Implement chaos testing and fault‑injection practices to proactively identify weaknesses and improve system resilience.
Optimize observability and alerting systems (e.g., Prometheus, Grafana, ELK, Nagios or equivalent) to ensure actionable insights and minimal alert fatigue.

Security & Compliance

Implement and maintain system and network security controls, including firewall management, VPN, identity and access management, and endpoint security.

Ensure compliance with BNM RMiT, PCI DSS, and ISO 27001 standards, supporting internal and external audits.

Manage system logs and integrate with SIEM platforms to strengthen monitoring and incident response capabilities.

Support vulnerability management programs by coordinating with Security Operations teams for timely patching and remediation.

Participate in risk assessment and security architecture reviews, ensuring SRE practices align with compliance requirements.

Cloud, Containerization & Automation

Support and optimize hybrid cloud environments (AWS, Azure, GCP) to align with Finexus’ cloud strategy and cost efficiency.

Deploy, configure, and maintain Kubernetes clusters (SUSE Rancher Prime) and containerized workloads to improve scalability and reliability.

Build and maintain CI/CD pipelines for automated deployment, testing, and operational efficiency.

Automate configuration and patch management using tools such as Ansible, Puppet, or equivalent.

Implement Infrastructure as Code (IaC) using Terraform or equivalent for consistent and auditable environment provisioning.

Develop auto‑healing and self‑recovery automation scripts to reduce manual interventions and mean time to recovery (MTTR).

Implement cost optimization and performance monitoring for cloud and container workloads.

Networking & Core Services

Administer and troubleshoot DNS, DHCP, VPN, load balancers, and core network services to ensure smooth operations.

Support virtualization platforms (Proxmox or equivalent) and physical server infrastructure within Finexus data centres.

Integrate network observability tools for real‑time visibility into latency, bandwidth, and routing anomalies.

Collaborate on zero‑trust network segmentation and service mesh integration for improved security and reliability.

Monitoring & Support

Provide on‑call support on a rotational basis for production issues and incidents, ensuring rapid resolution and minimal downtime.

Collaborate with application, database, and security teams to deliver reliable, compliant, and high-performance services for clients.

Lead post‑incident reviews (PIRs) and blameless retrospectives to identify root causes and preventive actions.

Maintain runbooks and operational documentation to streamline response and improve knowledge transfer.

Leverage AIOps or event‑correlation tools to enhance proactive incident detection and reduce false positives.

Job Requirements

Bachelor’s or Master’s Degree in Computer Science, Information Technology, Engineering, or related field.

4+ years of experience in Site Reliability Engineering, System Administration, or IT Infrastructure.

Proven experience in Linux and Windows system administration.

Hands‑on experience with cloud operations (AWS, Azure, GCP) and container orchestration (Kubernetes, Rancher).

Strong knowledge of networking, firewalls, DNS, DHCP, VPN, and enterprise security best practices.

Experience in database management (MySQL, PostgreSQL, or equivalent), including backup, tuning, and recovery.

Knowledge of compliance frameworks (PCI DSS, ISO 27001, BNM RMiT) is highly desirable.

Strong problem‑solving and troubleshooting skills in mission‑critical environments.

Excellent communication skills in English and Malay (spoken and written).

Ability to work independently and collaboratively in a fast‑paced, regulated technology environment.

Experience with SRE toolchains: Prometheus, Grafana, ELK, Terraform, Ansible, Jenkins, GitLab CI/CD, or equivalent.

Relevant certifications, such as AWS Certified SysOps Administrator, RHCE, Certified Kubernetes Administrator (CKA), or ISO 27001 Implementer, will be considered an added advantage.

Seniority level

Associate

Employment type

Full‑time

Job function

Engineering, Administrative, and Information Technology

Industries

Technology, Information and Media
