Site Reliability Engineer – Johor Bahru
Join to apply for the Site Reliability Engineer – Johor Bahru role at Arvion Services.
As a Site Reliability Engineer (SRE), you will play a key role in maintaining the reliability and performance of critical services. Your expertise will help bridge the gap between development and operations, ensuring robust, scalable, and responsive infrastructure. You will collaborate closely with diverse teams to drive reliability improvements and foster a culture of continuous learning and accountability.
What Makes You a Good Fit
- Strong experience with Linux systems and distributed computing fundamentals.
- Proven experience in troubleshooting application issues with a focus on performance and connectivity.
- Familiarity with networking concepts and effective troubleshooting techniques.
- Experience in Bash/shell scripting or automation for system administration tasks.
- Experience in programming languages such as Python, Golang, Java, or similar, focusing on operational efficiency.
- Demonstrated experience in system architecture and design, prioritizing reliability and scalability.
- Understanding of SRE principles, including SLOs, SLIs, toil reduction, and incident post‑mortems.
- Hands‑on experience with cloud environments (e.g., AWS, Azure, Google Cloud) and their operational management.
- Excellent problem-solving abilities and a proactive approach to operational challenges.
- Ability to work independently while effectively collaborating within a team environment.
- Open to working in rotational shifts.
- Able to communicate in Mandarin.
What You Will Do
- Monitor and maintain system performance to ensure the stability and reliability of applications and infrastructure.
- Design and implement resilient system architectures that support high availability and scalability.
- Develop automation tools and scripts to enhance operational efficiency and reduce manual effort.
- Define, track, and analyze SLOs and SLIs to ensure reliability and performance meet business needs.
- Conduct thorough post‑mortem analyses following incidents and drive continuous improvement.
- Collaborate with development and operations teams to establish best practices in system reliability and incident management.
- Troubleshoot and resolve issues related to database performance, network connectivity, and deployment failures.
- Ensure that issues are resolved within the stipulated SLAs, maintaining high standards of service delivery.
- Identify and troubleshoot performance bottlenecks in applications and infrastructure, providing actionable recommendations.
- Maintain detailed documentation of processes and incident responses.
- Improve monitoring solutions to proactively identify and mitigate issues before they impact services.
- Assist in the deployment and configuration of new applications and services.
- Participate in on‑call rotations and respond to critical incidents as they arise.
- Analyze system logs and metrics to identify trends and potential areas for improvement.
What We Offer
- Competitive salary and commission.
- Collaborative working environment with multilingual teams.
- Full training provided.
- Other benefits are shared during the interview.
Seniority Level
Associate
Employment Type
Full‑time
Job Function
Engineering and Information Technology
Industries
Outsourcing and Offshoring Consulting