Job Purpose
The Platform Reliability Engineer (PRE) is responsible for engineering, operating, and maintaining the internal container platform and its supporting infrastructure, with a strong focus on reliability, resiliency, and security. As a Senior PRE within the Infrastructure team, you will play a pivotal role in designing, building, and operating distributed container hosting solutions.
The Job
- As a Senior Platform Reliability Engineer, you will play a key role in maintaining the stability, reliability, and efficiency of the internal container platform and its supporting infrastructure.
 - Your responsibilities will include core operational tasks such as resource provisioning and management, responding to platform and application outages, capacity planning, monitoring, and driving reliability enhancements.
 - You will continuously evaluate the platform’s technical architecture to ensure it scales effectively with evolving application demands.
 - This includes proactively identifying and resolving reliability issues, analyzing product dependencies, pinpointing performance bottlenecks, and implementing optimization strategies to enhance platform availability and cost efficiency.
 - In this role, you will participate in a 24/7 on‑call rotation, promptly addressing alerts from the global monitoring team and resolving production incidents to maintain platform and application uptime.
 - You will regularly review team workflows to identify manual processes and implement automation solutions that reduce effort and minimize human error.
 - Regularly review the security advisories issued by Broadcom for the Tanzu suite of products and deploy product updates as required to keep the platform free of known vulnerabilities.
 - Work with open‑source technologies, CI/CD, and SCM tools such as Bitbucket, and with the organization's container technologies (e.g., Docker and Kubernetes). Stay current with industry trends and propose new ways for our business to improve.
 - Take accountability for business and regulatory compliance risks and take appropriate steps to mitigate them.
 - Maintain awareness of industry trends on regulatory compliance, emerging threats, and technologies to safeguard the company.
 - Highlight any potential concerns or risks and proactively share best practices.
 
Seniority Level
Mid‑Senior level
Employment Type
Full‑time
Job Function
Information Technology
Industries
IT Services and IT Consulting