Overview
At N1.Healthcare, we are transforming how individuals understand and manage their health. We are looking for a skilled AI Engineer to join our fast-growing team and help drive the development of intelligent, healthcare-focused systems—from advanced diagnostic insights to real-time health analytics.
Responsibilities
- Architect High-Availability Infrastructure: Design, implement, and automate a sophisticated multi-cloud infrastructure across Amazon Web Services (AWS) and Google Cloud Platform (GCP). You will leverage services such as AWS EC2, S3, and RDS, and GCP Compute Engine, GKE, and Cloud SQL, to build a foundation that delivers our AI-powered healthcare solutions with maximum speed, reliability, and security.
- Implement Proactive Monitoring and Security: Establish and manage a comprehensive monitoring and observability strategy using tools like Datadog, Prometheus, and Grafana. Your objective is to predict, detect, and neutralize system issues and security threats. You will implement security best practices and tooling (e.g., Falco, Trivy, AWS GuardDuty) to safeguard sensitive patient data and maintain clinical-grade security at every layer of the technology stack.
- Drive End-to-End Automation: Engineer and maintain robust CI/CD pipelines and Infrastructure-as-Code (IaC) frameworks. You will use leading technologies such as Terraform, Kubernetes (K8s), and GitHub Actions to empower our development teams to innovate and deploy rapidly without compromising system stability or security.
- Champion Operational Excellence: Lead incident response and resolution efforts using platforms like PagerDuty and Datadog. Conduct rigorous, blameless postmortems to identify root causes and drive preventive measures. You will collaborate closely with development teams through platforms like GitHub, ClickUp, and n8n to establish a feedback loop that fosters self-healing systems and continuous improvement.
- Engineer for Scalability and Performance: Design, test, and refine systems that support thousands of concurrent users and services. This includes implementing auto-scaling strategies and conducting performance analysis with load-testing tools such as k6 to ensure strong performance, data integrity, and clinical safety at scale.
- Pioneer Advanced MLOps and AI Infrastructure: Go beyond basic model deployment. Engineer a sophisticated MLOps ecosystem to manage the complete lifecycle of our machine learning models. This includes building CI/CD pipelines for model training and validation, implementing advanced monitoring to detect data and concept drift, and developing frameworks for A/B testing and canary rollouts of new model versions. You will architect secure, high-throughput gateways for interacting with third-party LLM APIs from OpenAI, Anthropic, and Google Cloud's Vertex AI, focusing on cost optimization, caching strategies, and mitigating risks like prompt injection.
- Lead Technical Innovation and Optimization: Continuously evaluate and integrate emerging technologies to enhance our platform. Your responsibilities will range from hardening Linux operating systems to deploying advanced containerization solutions like Docker and Podman, and exploring service mesh technologies such as Istio to optimize our service-oriented architecture.
Core Qualifications and Experience
- Cloud Platforms: Demonstrated expertise in AWS and GCP.
- AI/ML Operations (MLOps): Proven experience in the operational management of the full machine learning lifecycle. This includes building ML-specific CI/CD pipelines and managing APIs from providers like OpenAI, Anthropic, and Vertex AI, along with a deep understanding of tools and techniques for monitoring model performance, drift, and cost. Knowledge of the unique security and compliance challenges in production LLM systems is a significant plus.
- Containerization & Orchestration: Deep, hands-on experience with Docker and Kubernetes for deploying and managing scalable, containerized applications. Proficiency with Kubernetes ecosystem tools such as Helm for package management, and with service mesh technologies like Istio, is highly desirable.
- Infrastructure as Code (IaC): Strong background in IaC principles, using Terraform for infrastructure provisioning.
- CI/CD: Proficiency in building and maintaining CI/CD pipelines; specific experience with GitHub Actions is highly desirable.
- Monitoring & Security: Advanced proficiency with monitoring and observability platforms, particularly Datadog. Solid understanding of network security principles, including the implementation of firewalls and VPNs.
- Linux & Scripting: Advanced knowledge of Linux administration and strong scripting skills in Bash, Python, or other relevant languages.
- Networking & Databases: A deep understanding of TCP/IP networking concepts. Experience with relational database administration, preferably with PostgreSQL.
- Version Control: High proficiency with Git and modern branching strategies.
Soft Skills
- Strong analytical and critical thinking abilities.
- Excellent cross-functional communication and collaboration.
- Demonstrated ability to deliver under tight deadlines and troubleshoot high-pressure incidents.
Preferred Qualifications
- Degree in Computer Science, Information Systems, or a related field.
- Experience in site reliability engineering (SRE), high-availability architectures, or healthcare infrastructure.
- Familiarity with healthcare data regulations (HIPAA, GDPR) and maintaining compliance in cloud environments.
- Experience documenting processes and security/operational incidents.
- Exposure to application and infrastructure security standards and best practices.
Why Join N1.Healthcare?
- Help shape the future of AI in healthcare with real-world impact.
- Collaborate with a mission-driven, innovative team.
- Competitive salary and career growth opportunities.
- Work with cutting-edge technologies in a fast-scaling health tech environment.
- Join a team that's redefining how people access, understand, and improve their health through the power of AI.