Overview
Cloud / DevOps Engineer with scripting proficiency (e.g., Python, Bash, or PowerShell); Go / Rust is a plus. Strong expertise in Terraform, Terragrunt, Helm, Kubernetes, and Docker.
About Company
Groundup.ai is a Singapore-based AI startup that helps companies reduce unplanned downtime of industrial assets without needing a huge learning curve or high-risk deployments on the ground.
Responsibilities
Architect and manage scalable, secure infrastructure on GCP, Azure, and occasionally OCI / AWS.
Implement and manage Infrastructure as Code (IaC) primarily using Terraform and occasionally with Terragrunt and Helm.
Design and optimize CI / CD workflows using GitHub Actions, Jenkins, and GitHub Enterprise (reusable workflows, OIDC federation).
Ensure seamless deployment pipelines from code commit to production for microservices and AI workloads.
Manage Docker containers and images using tools such as Portainer; support canary releases, blue-green deployments, and auto-scaling strategies.
Implement and manage serverless deployments on Google Cloud Platform (Cloud Functions, Cloud Run).
Plan resources and estimate hardware for both on-premises and cloud environments (based on sensors and storage needs).
Ensure robust backup strategies and data redundancy; audit on-cloud and on-premises resources.
Security & compliance: enforce cloud security best practices (image hardening, secret management, IAM least privilege, SBOMs, vulnerability scanning) and collaborate on SOC 2 / ISO 27001 requirements; respond to audits and incidents proactively.
Configure and manage Cloudflare for security and performance; build and maintain observability stacks (Grafana, Prometheus, Loki, Tempo, Datadog, OpenTelemetry, Sentry).
Diagnose and resolve performance bottlenecks across compute, storage, and networking layers; monitor and optimize cloud spending for cost-efficiency.
Develop and implement disaster recovery plans with regular drills to ensure business continuity.
Partner with engineers to embed DevOps best practices and establish documentation standards for infrastructure, processes, and troubleshooting guides.
Use Plane for sprint planning, incident tracking, and delivery visibility.
Engineer • Ipoh, Malaysia