Overview
Our client is a well-established brand in the IT industry that is now looking for a passionate and driven Site Reliability Engineer. This is an exciting opportunity to expand your skill set while achieving job satisfaction and work-life balance.

Responsibilities
Contribute to system design and deployment phases with a focus on scalability, reliability, and operability.
Ensure that production readiness is considered at every stage of the software lifecycle.
Develop automation scripts, infrastructure as code, and tooling using industry best practices to improve system reliability, reduce manual effort, and enable self-service.
Review system architectures, deployment strategies, observability setups, and operational documentation to ensure reliability and operational excellence.
Analyze production issues, identify root causes, and implement long-term reliability improvements through automation, monitoring, and architectural enhancements.
Work collaboratively with other team members and provide guidance to more junior team members.
Organize efficient handovers through high-quality documentation and training.
Automate the deployment and operation of multi-tenant infrastructure, handling tasks that ensure system resilience and availability.
Develop and maintain monitoring tools, dashboards, and self-healing mechanisms.
Participate in on-call rotations, conduct blameless postmortems, and drive continuous learning.
Work closely with developers, product teams, and engineering stakeholders to troubleshoot issues, improve systems, and integrate reliability improvements.

Requirements
Bachelor's degree in Computer Science or a related field.
Minimum 6 years of experience in Site Reliability Engineering or software development within an international company.
Hands-on experience with CI/CD and deployment tools such as Ansible, Jenkins, Maven, Nexus, Git, and Docker.
Proficiency in Linux OS.
Proficiency in scripting and automation (e.g. Python, PowerShell, YAML) with the ability to develop tools and infrastructure as code.
Familiarity with Java-based systems, with the ability to understand code for root cause analysis.
Understanding of distributed systems and microservices architectures, including REST and SOAP APIs.
Experience with databases, including NoSQL platforms.
Familiarity with performance and reliability testing tools such as JMeter or Postman.
Exposure to observability and analytics technologies; experience with Elasticsearch or reporting tools like Power BI is a plus.
Practical experience working in Agile-driven teams.
Strong interpersonal and communication skills, with a customer-centric mindset and the ability to work effectively across cultures.
Demonstrated ability to collaborate with distributed teams across multiple time zones.

What’s on offer
You will be remunerated with an excellent base salary and entitled to attractive company benefits. You will also enjoy a fun and collaborative work environment, along with strong career progression.

How to apply
To submit your application, please apply online or email your updated CV in Microsoft Word format. Your interest will be treated with strict confidentiality.

Privacy Statement
Data collected will be used for recruitment purposes only. Personal data provided will be used strictly in accordance with the relevant data protection law and the Avensys privacy policy.
Reliability Engineer • Kuala Lumpur, Malaysia