EPAM Systems Kuala Lumpur, Federal Territory of Kuala Lumpur, Malaysia
System Engineer, EPAM Systems
Join EPAM Malaysia as an Application Support Engineer, where you will play a key role in sustaining the performance and reliability of mission‑critical platforms.
You’ll manage containerised workloads, optimise database operations, and troubleshoot complex issues across production environments. Partner closely with engineering, operations, and product teams to ensure seamless service delivery and drive continuous improvement across our application ecosystem. This role is ideal for mid‑level engineers seeking to deepen their expertise within a modern, cloud‑native stack while contributing to critical production systems.
Responsibilities
- Manage and resolve issues detected by monitoring systems or reported by internal / external users, ensuring timely and high‑quality responses
- Monitor application performance, identify degradation patterns and proactively implement corrective actions
- Automate operational workflows and routine tasks through scripting and tooling
- Lead or contribute to root cause analysis (RCA) and post‑mortems and ensure corrective measures are implemented
- Maintain and enhance project documentation, runbooks and operational guidelines to support platform reliability
- Work closely with development, product, and operations teams to implement solutions and improve platform stability
- Contribute to process improvements that enhance platform scalability, reliability and operational efficiency
Requirements
- 3–5 years of hands‑on experience in software application support, platform operations, or production engineering, with demonstrated ability to troubleshoot and optimise production systems
- Proven experience operating PostgreSQL in production environments, including installation, configuration, performance tuning and troubleshooting complex SQL workloads
- Operational experience with Docker and Kubernetes, including workload management, deployment troubleshooting and containerised application optimisation
- Solid understanding of Linux administration, including logs, system services, permissions and performance diagnostics
- Practical experience with monitoring and log analysis using tools such as Prometheus, Grafana, ELK, Loki, or similar
- Ability to lead troubleshooting, contribute to RCA and implement durable solutions in high‑availability environments
- Strong communication and collaboration skills with the ability to articulate technical details clearly in English (spoken and written)
Nice to have
- Experience with scripting (Shell, Python, or similar) for automation and operational efficiency
- Exposure to cloud‑native ecosystems and basic DevOps practices
- Familiarity with CI / CD pipelines, automation tools and infrastructure‑as‑code frameworks
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Employee financial programmes
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award‑winning culture recognised by Glassdoor, Newsweek and LinkedIn