Staff - Lead Site Reliability Engineer job at HeartFlow in San Francisco, CA

Work from home Full-time role Hiring

Staff/Lead Site Reliability Engineer (SRE)

San Francisco, California

Heartflow is a medical technology company advancing the diagnosis and management of coronary artery disease, the #1 cause of death worldwide, using cutting-edge technology. The flagship product, the Heartflow FFRCT Analysis, is an AI-driven, non-invasive cardiac test supported by the ACC/AHA Chest Pain Guidelines. It provides a color-coded 3D model of a patient’s coronary arteries indicating the impact blockages have on blood flow to the heart. Heartflow is the first AI-driven non-invasive integrated heart care solution across the CCTA pathway that helps clinicians identify stenoses in the coronary arteries (RoadMap™ Analysis), assess coronary blood flow (FFRCT Analysis), and characterize and quantify coronary atherosclerosis (Plaque Analysis). Our pipeline of products is growing and so is our team; join us in helping to revolutionize precision heartcare.

Heartflow is a publicly traded company (HTFL) that has received international recognition for exceptional strides in healthcare innovation, is supported by medical societies around the world, is cleared for use in the US, UK, Europe, Japan and Canada, and has been used for more than 500,000 patients worldwide.

HeartFlow is transforming cardiovascular care with cutting-edge, non-invasive technology. We are launching a massive Platform Modernization initiative to power the next generation of our life-saving medical products. We're looking for an experienced Site Reliability Engineer (SRE) to join our cloud-native infrastructure team. You will work closely with our Platform engineers and development teams to ensure our critical systems are highly available, scalable, observable, and performant. If you thrive on eliminating toil, automating complex operations, and defining the standards for production excellence, we want to talk to you.

Job Responsibilities

As our Staff SRE, you'll be the primary expert responsible for our entire compute ecosystem.
As a Staff SRE, you'll operate at the highest level of technical expertise and influence. You won't just solve problems; you'll prevent them at a fundamental level across organizational boundaries.

Your key responsibilities will include:

Lead the design, implementation, and operation of reliable, scalable cloud infrastructure.
Define and begin the rollout of SLI/SLO standards across microservices.
Develop self-service instrumentation tooling that enables engineering teams to own observability.
Establish observability and monitoring using an OSS toolchain.
Serve as a technical escalation point for critical incidents, perform deep-dive root cause analyses (RCAs), and implement robust corrective measures to prevent recurrence.
Enhance our monitoring, logging, and tracing systems to provide comprehensive visibility into system health.
Set the technical direction and best practices for the entire SRE and engineering organization.
Mentor mid-level and senior engineers on design patterns, operational rigor, and reliability principles.

We're looking for a leader and a deep technical expert with a proven track record of solving the hardest scaling and reliability challenges.

Required Qualifications

8+ years of progressive experience in Site Reliability Engineering, Production Engineering, or a closely related role.
Deep expertise with:
  AWS
  Kubernetes, Helm
  Observability stack (Prometheus, Grafana, Mimir, Loki, Pixie, Tempo)
  CI/CD systems (ArgoCD, Harness)
Fluency in at least one major scripting/programming language for automation and tooling (e.g., Python, Go, or Java).
Hands-on engineering mindset: able to instrument services directly, not just configure tooling.
Track record of building or significantly improving incident detection and response systems.
Deep technical familiarity with Kubernetes ecosystems, containerization technologies, and modern IaC tooling (e.g., Terraform, Crossplane, or Operators), so you can effectively guide the team's technical decisions.
Exceptional communication skills, with the ability to explain complex technical issues to both technical and non-technical audiences.

Nice-to-Have

Experience implementing Service Mesh technologies (e.g., Istio, Linkerd).
A strong understanding of security principles and practices in a cloud environment.
Certifications such as CKA (Certified Kubernetes Administrator) or CKAD (Certified Kubernetes Application Developer).

A reasonable estimate of the base salary range is $200,750 to $250,922, plus cash bonus and equity.

#LI-IB1 #LI-Hybrid
