Cloud / DevOps Engineering
- Proficient in scripting (e.g., Python, Bash, or PowerShell); Go / Rust is a plus.
- Strong expertise in Terraform, Terragrunt, Helm, Kubernetes, and Docker.
About Company
Groundup.ai is a Singapore-based AI startup that helps companies reduce unplanned downtime of industrial assets without a steep learning curve or high-risk on-site deployments.
Job Description
- Architect and manage scalable, secure infrastructure on GCP, Azure, and occasionally OCI / AWS.
- Implement and manage Infrastructure as Code (IaC), primarily with Terraform and occasionally with Terragrunt and Helm.
- Design and optimize CI / CD workflows using GitHub Actions, Jenkins, and GitHub Enterprise (reusable workflows, OIDC federation).
- Ensure seamless deployment pipelines from code commit to production for microservices and AI workloads.
- Manage Docker containers and images using tools such as Portainer.
- Support canary releases, blue-green deployments, and auto-scaling strategies.
- Implement and manage serverless deployments on Google Cloud Platform (Cloud Functions, Cloud Run).
Resource Planning & Hardware Estimation
- Assist in hardware estimation for both on-premises and cloud environments, based on resource requirements such as the number of sensors and storage needs.
- Ensure robust backup strategies and data redundancy for all infrastructure.
- Assist the team in auditing cloud and on-premises resources.
Security & Compliance
- Enforce cloud security best practices: image hardening, secret management, IAM least privilege, SBOMs, and vulnerability scanning.
- Collaborate on compliance requirements (SOC 2, ISO 27001), and respond to audits and incidents proactively.
- Configure and manage Cloudflare for enhanced security and performance.
- Build and maintain observability stacks using Grafana, Prometheus, Loki, Tempo, Datadog, OpenTelemetry, and Sentry.
- Diagnose and resolve performance bottlenecks across compute, storage, and networking layers.
- Monitor and optimize cloud spending to ensure cost-efficiency.
- Develop and implement disaster recovery plans, conducting regular drills to ensure business continuity.
- Partner with engineers to embed DevOps best practices.
- Establish and enforce documentation standards for infrastructure, processes, and troubleshooting guides.
- Use Plane for sprint planning, incident tracking, and delivery visibility.