Responsibilities:
- Ensure high availability and performance of systems
- Analyze performance metrics and resolve incidents (P0-P3)
- Participate in system design and set reliability goals
- Continuously optimize and innovate for better user experience
- Improve and maintain the full lifecycle of services, from development to deployment
- Provide observability, monitoring, and troubleshooting for distributed cloud systems
- Debug and automate tasks across OS, networking, databases, and applications
Requirements:
- Programming in Java, Python, or Go
- Scripting with Shell, Terraform, Ansible, Chef, or Puppet
- Strong understanding of Linux/Unix, containers, VMs, and cloud platforms
- Experience with DevOps processes and automation using SaltStack, Spinnaker, or StackStorm
- Experience with big data, chaos engineering, and auto-scaling container platforms
- Background in data science or cybersecurity (SIEM, threat modeling)
- Performance tuning for cloud networks, middleware, RDBMS, NoSQL, etc.
- Bachelor's degree or higher in Computer Science or Electronics & Communication
- Strong analytical and communication skills
- Quick adaptability and problem-solving abilities
- Passion for continuous learning and staying updated with tech trends
Notes:
- Malaysia roles: 1-10 years of relevant experience
- India roles: 5+ years of relevant experience, WFH, EU shift