Senior Digital Platform Ops Specialist
Location : CelcomDigi Tower, Petaling Jaya, Malaysia
Employment Type : Permanent
We are looking for a Senior Digital Platform Ops Specialist to lead and manage a squad responsible for the end‑to‑end stability, performance, and reliability of CelcomDigi’s digital platforms that support our consumer and enterprise services. This role encompasses cloud / application support, including troubleshooting, monitoring and issue resolution, as well as vendor management and SRE automation practices. The ideal candidate will drive operational excellence across the squad and build reliability into systems to ensure seamless service continuity across our digital ecosystem.
Responsibilities
- Lead and provide application support, including monitoring system health to detect issues proactively, supporting application release activities, ensuring timely resolution of incidents with root‑cause analysis, and maintaining service compliance with defined SLAs.
- Manage and optimize cloud infrastructures to ensure scalability, reliability, and cost efficiency, including capacity planning and resource utilization optimization.
- Drive the implementation of SRE principles such as error budgets and service level objectives (SLOs), automating repetitive activities to reduce toil and enable higher‑value engineering initiatives.
- Manage third‑party vendors to ensure delivery of their responsibilities and compliance with agreed support scope, quality standards, and SLAs, handling vendor escalations and performance issues.
- Drive continuous improvement by identifying operational gaps, proposing enhancements, mentoring junior team members and developing a comprehensive knowledge base of best practices, SOPs, and playbooks.
Requirements
- Bachelor's degree in Computer Science, Software Engineering, or related field.
- 5+ years of experience in digital platform operations, large‑scale IT systems support or SRE‑related roles.
- Strong understanding of mobile / web application architecture, APIs, and middleware.
- Proficiency with tools such as Dynatrace, Firebase, AWS CloudWatch, Datadog, PRTG, Sentry or similar platforms for monitoring and incident management.
- Proficiency in automation scripting (Python, Bash, or equivalent).
- Strong vendor management experience with ability to enforce governance.
- Excellent communication and collaboration skills, especially during incident management.
Business Unit : Consumer Business
CelcomDigi is an equal opportunity employer and is committed to promoting employment practices that are transparent, objective and fair.