AI Infrastructure Engineer at YTL AI Labs
At YTL AI Labs, we build sovereign AI models that perform on par with the world’s best—while staying grounded in local needs, values, and context. Our flagship model, Ilmu, is designed to be culturally aware, contextually intelligent, and fluent in Bahasa Melayu, delivering cutting-edge solutions that empower Malaysian businesses with intelligence that truly understands the market and the people they serve.
As pioneers of sovereign AI, we believe every nation should have the power to shape its own intelligence—guided by its people, priorities, and principles.
About the Role
As an AI Infrastructure Engineer, you’ll build and maintain the systems that power our AI applications and model training. Your work will ensure our teams can run AI workloads reliably, securely, and at scale, across both cloud and on-prem environments.
This role combines DevOps, ML systems, and infrastructure engineering to support everything from LLMs to training pipelines.
What you’ll do
Build and maintain infrastructure for model hosting, prompt execution, and training workflows
Operate and optimize model-serving systems like vLLM, Triton, or OpenAI proxies
Implement secure API gateways and manage token usage, routing, and fallback
Develop and support training pipelines, GPU scheduling, and experiment tracking
Maintain CI/CD systems, observability tools, and infrastructure documentation
What We’re Looking For
6+ years' experience in infrastructure engineering, DevOps, or ML systems
Strong command of Kubernetes, Terraform, and cloud-native architecture (AWS, Azure, GCP)
Experience with containerization, CI/CD, and API security practices
Prior exposure to model hosting or ML pipeline orchestration
Understanding of networking concepts including VPNs, VNets, and hybrid connectivity
Familiarity with security best practices for cross-platform infrastructure
Experience with on-prem infrastructure, including networking and storage hardware
Bonus points if you have
Experience with GPU resource orchestration or Kubeflow
Familiarity with inference servers like vLLM, Triton, TGI, or TorchServe
Understanding of cost telemetry and resource budgeting for model traffic
Security mindset and experience with IAM, logging, and compliance
Familiarity with compliance frameworks (SOC 2, GDPR, HIPAA) and implementing controls
Background in database management across different platforms
If you’re looking to do meaningful work with people who care about how we get there — we’d love to meet you. Apply now!
Infrastructure Engineer • Kuala Lumpur, Malaysia