Senior DevOps & Site Reliability Engineering

NEXALINK MARKETING SDN. BHD. Kuala Lumpur, Kuala Lumpur, Malaysia

21 hours ago

Job description

We are looking for a highly skilled Cloud Platform & DevOps Engineer to ensure the reliability, scalability, and security of our global cloud infrastructure. In this role, you will design, build, and optimize mission‑critical systems while driving automation, observability, and DevSecOps best practices. This position requires strong technical leadership, hands‑on experience in large‑scale production environments, and a proactive approach to system resilience and performance.

Key Responsibilities

Cloud Infrastructure & Operations

Manage and optimize high‑availability production environments across GCP and AWS, including operating systems (Windows / Linux), middleware, and distributed systems.

System Reliability

Lead capacity planning, performance tuning, root‑cause analysis, and preventive maintenance to ensure optimal uptime and stability.

Security & Compliance

Oversee data protection, disaster recovery, log auditing, and compliance (ISO27001, PCI-DSS), while implementing enterprise‑grade security frameworks.

Automation & Tooling

Develop and enhance CI/CD pipelines, Infrastructure‑as‑Code (IaC) frameworks, and monitoring tools using Terraform, Ansible, Puppet, or similar technologies.

SRE Practices

Design and implement multi‑cluster operations, service mesh (Istio), and observability solutions (Prometheus, Grafana) to improve fault detection, response, and recovery.

DevOps Culture

Champion DevOps / DevSecOps principles to streamline delivery processes, reduce manual workloads, and increase engineering efficiency.

Establish robust monitoring, alerting, and escalation workflows to maintain 24/7 system availability and minimize downtime.

Partner with cross‑functional teams and mentor engineers to drive operational excellence, collaboration, and continuous improvement.

Qualifications

Bachelor’s degree or above in Computer Science, Information Security, or related fields.

8+ years of experience in system operations, SRE, or infrastructure engineering.

Proven expertise in multi‑cloud and hybrid‑cloud architectures (GCP, AWS), including deployment and migration.

Strong command of containers and Kubernetes, with hands‑on experience in service mesh (Istio), multi‑cluster management, and autoscaling.

Proficient in CI/CD, IaC, and automation scripting (Shell, Python, Golang).

Deep understanding of networking, storage, compute, and security architectures.

Strong knowledge of TCP/IP, routing, and network security, with experience designing and troubleshooting complex environments.

Excellent problem‑solving, communication, and crisis management skills, with the ability to perform under pressure.

Fluent in Mandarin and English, and able to collaborate in a multicultural, cross‑border setting.

Relevant certifications (e.g., CNCF, CKA/CKS) are a strong plus.

Why Join Us

Global Exposure – Collaborate with international teams and lead large‑scale infrastructure projects across multiple regions, gaining global technical experience and perspective.

Career Development – Expand your expertise and leadership capabilities in a fast‑paced, innovation‑driven environment with structured growth opportunities.

Attractive Compensation – Enjoy a competitive salary with performance‑based quarterly bonuses, comprehensive benefits, and additional perks upon confirmation.

Professional Culture – Thrive in a structured, supportive, and growth‑oriented workplace that values technical excellence, collaboration, and continuous learning.
