Talent.com
X9546VV3 | [Mandarin-speaking Role] Senior Operations Engineer (SRE / AI Platform)

TTUKoffer | Kuala Lumpur, Malaysia
23 days ago
Job description
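  • Location: Kuala Lumpur (KL)
  • Salary range: RM14,700 - RM17,700
  • Work visa: Not provided
  • Job Highlights

    • Join the international team of a world-leading AI infrastructure service provider and help build and operate cutting-edge AI platforms.
    • Independently own the production environment serving global users, with direct impact on the reliability and performance of core services.
    • Work in depth with multi-cloud architecture, GPU computing, and automated operations, building high-value technical experience.
    • Collaborate across cultures, working closely with engineering teams in China and the US and strengthening bilingual (Chinese and English) technical communication skills.
  • Key Responsibilities

    • End-to-end operational ownership: take full responsibility for the availability, latency, performance, and efficiency of the AI infrastructure products (Model-API, Serverless, GPU instances).
    • Incident response and management: act as first responder for production incidents, investigate root causes in depth (RCA), implement preventive measures, and take part in the on-call rotation.
    • Automation and tooling: design and maintain automation scripts and tools that turn operational tasks, deployments, and failure recovery into repeatable processes.
    • Monitoring and alerting: build and tune monitoring and alerting systems (e.g., Prometheus / Grafana) so that problems are detected proactively.
    • Infrastructure as Code (IaC): manage cloud infrastructure with tools such as Terraform / Ansible to keep environments consistent and reproducible.
    • Performance and cost optimization: continuously analyze system performance and resource usage, identify bottlenecks, and optimize cloud platform (AWS / GCP / Azure) costs.
    • Cross-functional collaboration: work closely with the China-based engineering team to understand new features, provide operational feedback, and ensure new services reach production readiness.
  • Must-Have Requirements

    • 5+ years of DevOps / SRE / cloud operations experience; a background at a technology or cloud services company is preferred.
    • Expertise in at least one major cloud platform (AWS / GCP / Azure); hands-on experience with containerization and orchestration (Docker / Kubernetes required).
    • Proficiency in at least one scripting language (e.g., Python / Go / Shell); command of IaC tools such as Terraform / Ansible.
    • Hands-on experience with monitoring and observability tools (e.g., Prometheus / Grafana / ELK).
    • Systematic troubleshooting skills; able to handle complex distributed-system problems calmly under pressure.
    • Fluent in both Chinese and English (written and spoken); able to handle cross-team technical communication.
    • Strong sense of responsibility and self-motivation; comfortable working independently in a remote / distributed team.
    • Nice to have: experience with GPU-accelerated computing environments; familiarity with MLOps tools (e.g., Kubeflow / MLflow); knowledge of serverless technologies and CI/CD pipelines.
  • How to Apply?

      Click 'Apply' or send your resume to [apply@ttukoffer.co.uk] with the email subject [申请 WBX9546VV3] (i.e., "Apply WBX9546VV3"). Referral bonus: a successful referral earns a referral reward. Details: https://ttukoffer.co.uk/refer-a-friend-bonus/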

  [Mandarin-speaking Role] Senior Operations Engineer (SRE / AI Platform)

  • Location: Kuala Lumpur
  • Compensation: RM10,000 - RM15,000
  • Visa Sponsorship: Not Available
  • Job Highlights

    • Join the international team of a leading global AI infrastructure service provider to build and operate cutting-edge AI platforms.
    • Take end-to-end ownership of production environments for global users, directly impacting core service reliability and performance.
    • Gain deep exposure to multi-cloud architecture, GPU computing, and automated operations in a high-impact role.
    • Collaborate in a multicultural environment with engineering teams across China and North America, enhancing bilingual technical communication skills.
  • Key Responsibilities

    • End-to-End Service Ownership: Assume primary responsibility for the availability, latency, performance, and efficiency of AI infrastructure products (Model-API, Serverless, GPU Instances).
    • Incident Management & Response: Act as the first responder for production incidents, perform root cause analysis (RCA), and implement preventive measures. Participate in an on-call rotation.
    • Automation & Tooling: Design, build, and maintain automation scripts and tools to streamline operational tasks, deployments, and failure recovery.
    • Monitoring & Alerting: Develop and refine monitoring and alerting systems (e.g., Prometheus / Grafana) to enable proactive issue detection.
    • Infrastructure as Code (IaC): Manage and provision cloud infrastructure using IaC tools (e.g., Terraform, Ansible) to ensure consistency and repeatability.
    • Performance & Cost Optimization: Continuously analyze system performance and resource utilization to identify bottlenecks and optimize cloud platform (AWS / GCP / Azure) costs.
    • Cross-Functional Collaboration: Work closely with engineering teams in China to understand new features, provide operational feedback, and ensure production readiness of new services.
  • Must-Have Requirements

    • 5+ years of hands-on experience in DevOps, SRE, or cloud operations, preferably in a tech or cloud service company.
    • Expertise in at least one major cloud provider (AWS / GCP / Azure); practical experience with containerization and orchestration technologies (Docker / Kubernetes required).
    • Proficiency in at least one scripting language (e.g., Python, Go, Shell); solid understanding of IaC tools like Terraform / Ansible.
    • Hands-on experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK Stack).
    • Systematic problem-solving skills with the ability to troubleshoot complex distributed systems under pressure.
    • Professional fluency in both English and Mandarin (written and spoken) for effective cross-regional collaboration.
    • Strong sense of ownership and self-drive, with the ability to work independently in a remote / distributed team setting.
    • Nice to Have: Experience with GPU-accelerated computing; knowledge of MLOps tools (e.g., Kubeflow, MLflow); familiarity with serverless technologies and CI/CD pipelines.
  • How to Apply?

      Click 'Apply' or send your resume to [apply@ttukoffer.co.uk] with the subject line [Apply to WBX9546VV3]. Refer a friend for this role and earn referral bonuses! See details: https://ttukoffer.co.uk/refer-a-friend-bonus/

      By applying, you acknowledge that TT UKoffer Ltd may process your personal data for recruitment purposes under the lawful basis of legitimate interest. This includes sharing your CV with potential employers. We comply with UK GDPR regulations, and you may request data removal at any time by contacting apply@ttukoffer.co.uk.
