Talent.com
Site Reliability Engineer

FINEXUS Group
Kuala Lumpur, Kuala Lumpur, Malaysia
1 day ago
Job description

System Reliability & Operations

  • Ensure high availability and reliability of IT systems, applications, and PCI DSS‑certified data centres, supporting both internal operations and client‑facing platforms.
  • Perform system administration (Linux and Windows servers), including installation, configuration, patching, monitoring, and performance tuning.
  • Manage data storage, backup, and disaster recovery planning (DRP) to ensure data integrity, resilience, and compliance with industry standards.
  • Conduct capacity planning and lifecycle management of infrastructure resources, ensuring optimal performance and scalability.
  • Define and monitor Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets to measure and improve reliability.
  • Implement chaos testing and fault‑injection practices to proactively identify weaknesses and improve system resilience.
  • Optimize observability and alerting systems (e.g., Prometheus, Grafana, ELK, Nagios or equivalent) to ensure actionable insights and minimal alert fatigue.

Security & Compliance

  • Implement and maintain system and network security controls, including firewall management, VPN, identity and access management, and endpoint security.
  • Ensure compliance with BNM RMiT, PCI DSS, and ISO 27001 standards, supporting internal and external audits.
  • Manage system logs and integrate with SIEM platforms to strengthen monitoring and incident response capabilities.
  • Support vulnerability management programs by coordinating with Security Operations teams for timely patching and remediation.
  • Participate in risk assessment and security architecture reviews, ensuring SRE practices align with compliance requirements.

Cloud, Containerization & Automation

  • Support and optimize hybrid cloud environments (AWS, Azure, GCP) to align with Finexus’ cloud strategy and cost efficiency.
  • Deploy, configure, and maintain Kubernetes clusters (SUSE Rancher Prime) and containerized workloads to improve scalability and reliability.
  • Build and maintain CI/CD pipelines for automated deployment, testing, and operational efficiency.
  • Automate configuration and patch management using tools such as Ansible, Puppet, or equivalent.
  • Implement Infrastructure as Code (IaC) using Terraform or equivalent for consistent and auditable environment provisioning.
  • Develop auto‑healing and self‑recovery automation scripts to reduce manual interventions and mean time to recovery (MTTR).
  • Implement cost optimization and performance monitoring for cloud and container workloads.

Networking & Core Services

  • Administer and troubleshoot DNS, DHCP, VPN, load balancers, and core network services to ensure smooth operations.
  • Support virtualization platforms (e.g., Proxmox) and physical server infrastructure within Finexus data centres.
  • Integrate network observability tools for real‑time visibility into latency, bandwidth, and routing anomalies.
  • Collaborate on zero‑trust network segmentation and service mesh integration for improved security and reliability.

Monitoring & Support

  • Provide on‑call support on a rotational basis for production issues and incidents, ensuring rapid resolution and minimal downtime.
  • Collaborate with application, database, and security teams to deliver reliable, compliant, and high-performance services for clients.
  • Lead post‑incident reviews (PIRs) and blameless retrospectives to identify root causes and preventive actions.
  • Maintain runbooks and operational documentation to streamline response and improve knowledge transfer.
  • Leverage AIOps or event‑correlation tools to enhance proactive incident detection and reduce false positives.

Job Requirements

  • Bachelor’s or Master’s Degree in Computer Science, Information Technology, Engineering, or related field.
  • 4+ years of experience in Site Reliability Engineering, System Administration, or IT Infrastructure.
  • Proven experience in Linux and Windows system administration.
  • Hands‑on experience with cloud operations (AWS, Azure, GCP) and container orchestration (Kubernetes, Rancher).
  • Strong knowledge of networking, firewalls, DNS, DHCP, VPN, and enterprise security best practices.
  • Experience in database management (MySQL, PostgreSQL, or equivalent), including backup, tuning, and recovery.
  • Knowledge of compliance frameworks (PCI DSS, ISO 27001, BNM RMiT) is highly desirable.
  • Strong problem‑solving and troubleshooting skills in mission‑critical environments.
  • Excellent communication skills in English and Malay (spoken and written).
  • Ability to work independently and collaboratively in a fast‑paced, regulated technology environment.
  • Experience with SRE toolchains: Prometheus, Grafana, ELK, Terraform, Ansible, Jenkins, GitLab CI/CD, or equivalent.
  • Possession of relevant certifications, including AWS Certified SysOps Administrator, RHCE, Kubernetes Administrator (CKA), or ISO 27001 Implementer, will be considered an added advantage.

Seniority level: Associate
Employment type: Full‑time
Job function: Engineering, Administrative, and Information Technology
Industries: Technology, Information and Media