We are seeking an experienced Cloud Operations Engineer. You'll play a key role in keeping our client-facing applications, APIs and cloud infrastructure running smoothly, ensuring uptime, performance and reliability across multiple environments.
This role suits someone who loves problem-solving, enjoys working with Linux and modern cloud tools, and wants to grow in DevOps / Site Reliability Engineering.
Key Responsibilities:
- Monitor & Maintain the performance and availability of our cloud-hosted applications and infrastructure.
- Deploy & Configure new services (on Kubernetes, virtual machines or cloud instances) following best practices.
- Troubleshoot issues across servers, databases, networks and deployment pipelines; identify root causes and resolve them quickly.
- Automate routine checks and maintenance tasks using Bash or other scripting tools.
- Collaborate with developers, DevOps engineers and data teams to ensure smooth releases and stable environments.
- Continuously improve our monitoring systems, alerting processes and incident response playbooks.
- Participate in the on-call rotation and respond to incidents according to defined SLAs.
Qualifications:
- Solid Linux system administration skills (Ubuntu, CentOS or similar).
- Experience troubleshooting application and infrastructure issues: network connectivity, database performance, deployments.
- Familiar with Kubernetes, Docker or other container platforms.
- Understanding of databases (MySQL, PostgreSQL, etc.) and performance tuning.
- Ability to script / automate in Bash, Shell, or Python.
- Comfortable working in a rotational shift environment.
- Good communication skills; Mandarin ability is a plus.