Overview
Site Reliability Engineer (L2 Support) role at CareCone Group in Kuala Lumpur, Malaysia. Responsible for end-to-end application support, production incident handling, platform monitoring, and coordination with L1, L3, and Infrastructure teams to ensure performance, availability, and operational continuity across UAT, Production, and DR environments.
Key Responsibilities
- Provide L2 support for the application stack in production and non-production environments.
- Monitor application health using Dynatrace, the EFK stack (Elasticsearch, Fluent Bit, Kibana), and other monitoring tools.
- Perform log triage, issue reproduction, and root cause analysis; escalate to L3 when required.
- Execute SOPs, Runbooks, and Incident Playbooks for common issues and ensure SLA compliance.
- Perform deployments and environment validations using Ansible and Terraform.
- Manage and audit user access via ForgeRock IAM.
- Handle incident tickets via ITSM tools.
- Analyze and troubleshoot issues related to application middleware, databases (MongoDB, Oracle), and messaging systems (Kafka).
- Participate in incident war rooms, status calls, RCA reviews, and provide post-incident reports.
- Validate monitoring alerts, fine-tune thresholds, and reduce non-actionable noise.
- Document known issues and workarounds; update knowledge base regularly.
Must-Have Skills
- Strong hands-on experience with application support in production environments.
- Knowledge of ForgeRock Identity Platform, MongoDB, Kafka, ZooKeeper, and Oracle.
- Working experience in Linux (RHEL 8.x) environments and secure shell scripting.
- Harbor setup and installation of Docker / Podman.
- Familiarity with DevOps tools: Ansible, Terraform, Helm charts, Harbor, Azure DevOps, pipelines, Jenkins, and Git.
- Exposure to Dynatrace, the EFK stack (Elasticsearch server), Rancher, and Harbor.
- Sound understanding of ITIL processes, including incident, problem, and change management.
- Familiarity with TLS encryption, secret management (Vault), and basic security posture.
- Good communication skills (English mandatory).
- Proactive, solution-oriented attitude with high ownership.
- Ability to work under pressure and handle critical incidents independently.
Seniority level
Associate
Employment type
Contract
Job function
Information Technology
Industries
Technology, Information and Media