Site Reliability Engineer (L2 Support)

CareCone Group
Kuala Lumpur, Kuala Lumpur, Malaysia
Posted 2 days ago
Job description

Overview

This is a Site Reliability Engineer (L2 Support) role at CareCone Group in Kuala Lumpur, Malaysia. The role covers end-to-end application support, production incident handling, platform monitoring, and coordination with the L1, L3, and Infrastructure teams to ensure performance, availability, and operational continuity across the UAT, Production, and DR environments.

Key Responsibilities

  • Provide L2 support for the application stack in production and non-production environments.
  • Monitor application health using Dynatrace, EFK (Elasticsearch, Fluent Bit, Kibana), and other monitoring tools.
  • Perform log triage, issue reproduction, and root cause analysis, escalating to L3 when required.
  • Execute SOPs, Runbooks, and Incident Playbooks for common issues and ensure SLA compliance.
  • Perform deployments and environment validations using Ansible and Terraform.
  • Manage and audit user access via ForgeRock IAM.
  • Handle incident tickets via ITSM tools.
  • Analyze and troubleshoot issues related to application middleware, database (MongoDB, Oracle), and messaging systems (Kafka).
  • Participate in incident war rooms, status calls, and RCA reviews, and provide post-incident reports.
  • Validate monitoring alerts, fine-tune thresholds, and reduce non-actionable noise.
  • Document known issues and workarounds; update the knowledge base regularly.

Must-Have Skills

  • Strong hands-on experience with application support in production environments.
  • Knowledge of ForgeRock Identity Platform, MongoDB, Kafka, ZooKeeper, and Oracle.
  • Working experience in Linux (RHEL 8.x) environments and secure shell scripting.
  • Harbor setup and installation of Docker / Podman.
  • Familiarity with DevOps tools: Ansible, Terraform, Helm charts, Harbor, Azure DevOps, Pipelines, Jenkins, Git.
  • Exposure to Dynatrace, the EFK stack (Elastic Server), Rancher, and Harbor.
  • Sound understanding of ITIL processes, including incident, problem, and change management.
  • Familiarity with TLS encryption, secret management (Vault), and basic security posture.
  • Good communication skills (English mandatory).
  • Proactive, solution-oriented attitude with high ownership.
  • Ability to work under pressure and handle critical incidents independently.

Job Details

  • Seniority level: Associate
  • Employment type: Contract
  • Job function: Information Technology
  • Industries: Technology, Information and Media