Platform Reliability Engineer

POWER IT SERVICES
Kuala Lumpur, Kuala Lumpur, Malaysia
12 hours ago
Job description

Job Purpose

The Platform Reliability Engineer (PRE) is responsible for engineering, operating, and maintaining the internal container platform and its supporting infrastructure, with a strong focus on reliability, resiliency, and security. As a Senior PRE within the Infrastructure team, you will play a pivotal role in designing, building, and operating distributed container hosting solutions.

The Job

  • As a Senior Platform Reliability Engineer, you will play a key role in maintaining the stability, reliability, and efficiency of the internal container platform and its supporting infrastructure.
  • Your responsibilities will include core operational tasks such as resource provisioning and management, responding to platform and application outages, capacity planning, monitoring, and driving reliability enhancements.
  • You will continuously evaluate the platform’s technical architecture to ensure it scales effectively with evolving application demands.
  • This includes proactively identifying and resolving reliability issues, analyzing product dependencies, pinpointing performance bottlenecks, and implementing optimization strategies to enhance platform availability and cost efficiency.
  • In this role, you will participate in a 24/7 on‑call rotation, promptly addressing alerts from the global monitoring team and resolving production incidents to maintain platform and application uptime.
  • You will regularly review team workflows to identify manual processes and implement automation solutions that reduce effort and minimize human error.
  • Regularly review the security advisories issued by Broadcom for the Tanzu suite of products and deploy product updates as required to keep the platform free of known vulnerabilities.
  • Work with open‑source technologies, CI/CD, and SCM tools such as Bitbucket, and implement container technologies (e.g., Docker and Kubernetes) across the organization. Stay current with industry trends and propose new ways for our business to improve.
  • Take accountability for business and regulatory compliance risks and take appropriate steps to mitigate them.
  • Maintain awareness of industry trends in regulatory compliance, emerging threats, and technologies to safeguard the company.
  • Highlight any potential concerns or risks and proactively share best practices.

Seniority Level

Mid‑Senior level

Employment Type

Full‑time

Job Function

Information Technology

Industries

IT Services and IT Consulting
