CloudMile Federal Territory of Kuala Lumpur, Malaysia
Overview
CloudMile is a leading AI and cloud technology company in Asia that focuses on digital transformation and growth for its corporate clients. We are the winner of the 2023 Google Cloud Sales Partner of the Year award for the Greater China region, recognized for our innovative thinking, outstanding customer service, and best-in-class use of Google Cloud products and services. As a member of the “CloudMiler” team, you will be at the forefront of helping companies in Asia with their digital transformation by leveraging cloud technology, data, and AI. We value collaboration and shared goals over the notion of a lone “superstar.”
As a Tier 2 Cloud, Data & AI Operations Engineer, you will be the second line of defense for our customers, responsible for resolving complex technical issues escalated from our Tier 1 team. You will proactively manage key customer environments, acting as an extension of their internal operations team. This role requires a deep understanding of cloud infrastructure, with an added focus on data pipelines and AI workloads. If you are passionate about solving complex problems, proposing innovative solutions, and building strong relationships with clients, we are interested in talking to you!
“CloudMiler” is a group of smart people, passionate about cloud computing, data, and AI. We believe that world-class support is critical to customer success.
Key Job Responsibilities
- Handle Escalations: Serve as the primary escalation point for complex technical issues that are beyond the capabilities of the Tier 1 team.
- Proactive Management: Proactively monitor, manage, and optimize the cloud environments for a portfolio of managed service customers.
- Improvement Proposals: Continuously identify opportunities to enhance customer environments by proposing improvements related to cost optimization, security hardening, and performance tuning.
- Customer Stakeholder Management: Act as a key technical contact for customer stakeholders, participating in regular reviews to discuss operational performance, upcoming changes, and new initiatives.
- Troubleshooting: Apply advanced troubleshooting techniques to diagnose and resolve issues across cloud infrastructure, networking, security, and especially data and AI services.
- Data & AI Operations: Provide operational support for data pipelines, ETL/ELT jobs, machine learning model deployments, and AI APIs, ensuring their stability and performance.
- Automation: Develop and maintain scripts and automation to streamline operations, reduce manual tasks, and improve overall efficiency.
- Mentorship: Coach and mentor Tier 1 CloudOps Engineers, sharing knowledge and providing guidance on complex issues.
- Documentation: Create and maintain detailed technical documentation, including runbooks, standard operating procedures, and knowledge base articles.
A Day in the Life
- Deep-dive into a complex networking issue escalated from Tier 1, collaborating with both the customer and the cloud provider to find a solution.
- Proactively review a customer's environment, identifying a few idle resources that could be shut down to reduce costs and drafting a proposal to present to the customer.
- Attend a virtual meeting with a managed service customer to provide an update on their operational health and discuss a planned change to their data pipeline.
- Investigate an alert on a failed machine learning model deployment, troubleshooting the underlying issue and working with the data science team to get it back online.
- Write a Python script to automate a common data transfer task, then add the script to the team's shared repository.
- Work with leadership to define and implement new processes to improve the efficiency of the operations team.
We promote advancement opportunities horizontally and vertically across the organization to help you meet your career goals. We offer programs to help you acquire certification and develop the skills required to be successful in your role.
Basic Qualifications
- Bachelor's degree in computer science, information technology, or a related field.
- At least 4 years of hands-on experience with any one of the major CSPs (Google Cloud, AWS, Azure, Alibaba Cloud).
- Professional-level certification in at least one of the major CSPs (e.g., Google Cloud Professional Cloud Architect / DevOps Engineer, AWS Professional, Azure Professional).
- Strong understanding of core cloud computing concepts, including networking, security, compute, and storage.
- Proven ability to troubleshoot and resolve complex technical issues independently.
- Hands-on experience with Infrastructure as Code (IaC) tools such as Terraform or Ansible.
- Foundational knowledge of data and AI concepts, including data pipelines, ETL/ELT processes, and machine learning model deployment.
- Proficiency in one or more scripting languages (e.g., Python, Go, Bash).
- Exceptional communication skills and proven customer-facing experience, with the ability to manage technical stakeholders.
- Excellent written and verbal communication skills in English. Multilingual ability (English, Mandarin, Malay/Indonesian, other Southeast Asian languages) is an added advantage.
Preferred Qualifications
- Experience in a managed services or consulting role.
- Proven experience with a variety of data services (e.g., BigQuery, Dataproc, Vertex AI, SageMaker, EMR).
- Experience with logging and monitoring platforms (e.g., Grafana, Cloud Logging, Datadog).
- Experience with CI/CD tools and concepts.
- Knowledge of or experience with Site Reliability Engineering (SRE) principles.
Seniority level
Mid-Senior level
Employment type
Full-time
Job function
Information Technology
Industries
IT Services and IT Consulting