Talent.com
Site Reliability Engineer - Incident Commander
Siemens Digital Industries Software • Bayan Lepas, Penang, Malaysia
Posted 30+ days ago
Job description

Siemens Digital Industries Software is a leading provider of solutions for the design, simulation, and manufacture of products across many industries, enabling innovation for everything from Formula 1 cars to space exploration vehicles.

In this crucial role, you will develop cutting‑edge automated solutions to support our best‑in‑class cloud infrastructure, particularly for the Siemens Xcelerator platform. When incidents arise, you will coordinate major incident response, ensuring rapid resolution and seamless communication with partners during service‑impacting events, while upholding strict SLAs.

Key Responsibilities

  • Incident Management: Act as the primary point of contact and leader during major incidents, coordinating response, communication and resolution across all involved teams.
  • Incident Response: Quickly assess severity, determine impact and drive the appropriate response to restore services as quickly as possible.
  • Communication: Ensure clear, concise and timely communication with stakeholders throughout the incident lifecycle.
  • Post‑Incident Analysis: Lead reviews to identify root causes, drive improvements and implement preventive measures.
  • Collaboration: Work closely with SRE, DevOps, Development and other teams to continuously improve incident management processes.
  • Training & Preparedness: Conduct regular incident response drills and train teams to handle high‑severity incidents.
  • Documentation: Maintain and update incident management documentation.
  • Monitoring & Alerts: Define and refine alerting criteria to detect and escalate incidents promptly.
  • Continuous Improvement: Find opportunities to improve system reliability, scalability and performance based on lessons learned from incidents.
  • 24x7 On‑call Rotation: Participate in the 24x7 on‑call rotation.

Qualifications

  • Familiar with cloud infrastructure (AWS, GCP, Azure), containerization (Docker, Kubernetes) and automation scripting (Python, Bash).
  • Experience with incident management platforms (Jira Service Management, ServiceNow), monitoring tools (Datadog, Grafana) and on‑call systems (PagerDuty).
  • Proven ability to rapidly assess, troubleshoot and resolve complex incidents in distributed enterprise IT environments.
  • Demonstrated leadership in incident response, managing cross‑functional teams and aligning with business stakeholders.
  • Outstanding English communication skills, both verbal and written.
  • Skilled in defining, tracking and utilizing incident metrics (MTTR, MTTD) to drive accountability and continuous improvement.
  • Excellent troubleshooting and problem‑solving skills, with the ability to quickly analyze complex systems.
  • Highly motivated to continuously learn new technologies and adapt to evolving trends, with availability to work required core hours.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, sex, gender expression, sexual orientation, age, marital status, veteran status, or disability status.

We offer a comprehensive reward package which includes a competitive basic salary, bonus scheme, generous holiday allowance, pension and private healthcare.
