Talent.com
Site Reliability Engineer, Principal

AIA Malaysia, Kuala Lumpur, MY
30+ days ago
Job description

At AIA, we’ve started an exciting movement to create a healthier, more sustainable future for everyone.

If you believe in developing a better tomorrow, read on.

About the Role

The Site Reliability Engineer (SRE) is responsible for ensuring that our cloud application systems are reliable and available to users. The SRE will oversee application systems, establish automated detection, carry out root cause analysis, and formulate preventive actions. They will gather and analyze metrics from operating systems and applications to support performance tuning and fault finding, and will partner with development teams to improve services.

Functional Duties:

Set up and maintain monitoring of infrastructure and applications

Build alerts and auto recovery for various operational issues

Capture and analyze metrics from operating systems and applications

Advise on performance tuning and fault finding

Partner with development teams to improve services

Assist in formulating preventive actions where possible, lead studies of potential failure scenarios, and formulate automated recovery methods

Comfortable working with new tools, e.g., Azure DevOps, Grafana, ELK, and Dynatrace

People Management Duties:

Train and mentor other consultants or teammates on your specialties

Act as an advisor to application teams and help them establish recovery processes

Requirements:

Tertiary qualification in Computer Science or another relevant field

Programming Languages: Java 8 or above (must have)

Experience in developing and optimizing stored procedures for MySQL and MSSQL databases

OS: Linux (RHEL or SUSE) or Windows Server

Scripting (must have at least one): Shell, Bash, or PowerShell

Knowledge of Git, the open-source distributed version control system

Sound knowledge of how REST APIs work

Experience with Atlassian tools (e.g., Jira, Bitbucket, Confluence)

Familiarity with Azure Cloud services

Working experience with ITIL in an Agile environment

Good to have:

Experience with Python programming language

Experience with containerization (Docker, AKS, ACR, EKS, ECS)

Experience in CI/CD with Azure DevOps

Experience in dashboard development with Grafana, Azure Monitor, or Dynatrace

Experience in infrastructure management with Terraform or Ansible

An Azure or AWS cloud certification would be an added advantage