Software Engineer (AI/ML, Infrastructure & Platform)
- We are seeking a Software Engineer, AI/ML (Infrastructure & Platform) to build the foundational systems that power our next generation of AI applications
- This is a systems-focused role. You will design and build the platforms, abstractions, and infrastructure that enable teams to reliably develop, deploy, and scale AI systems — including agentic workflows, retrieval pipelines, and model integrations
- You will operate at the intersection of AI systems and distributed infrastructure, focusing on the “how” behind production AI: how models are orchestrated, how tools/skills are exposed and executed, and how systems are evaluated, monitored, and scaled in real-world environments
- Your work will directly enable product teams to move faster while ensuring our AI systems are reliable, observable, secure, and cost-efficient
- Build core AI infrastructure
  - Design and implement platforms for LLM orchestration, tool execution, and agent workflows
  - Develop shared services and abstractions used across multiple AI applications
- Build AI capability layers (tools / skills)
  - Design and implement tools (“skills”) that agents and applications rely on, including APIs, workflows, and integrations
  - Define clear interfaces for capabilities such as data retrieval, calculations, document processing, and external system actions
  - Build reusable, composable abstractions that enable safe and scalable tool usage across systems
  - Ensure tools are reliable, observable, and secure, especially when interacting with sensitive data
- Enable agentic systems at scale
  - Build infrastructure to support multi-step agents (state management, tool routing, retries, failure handling)
  - Design systems where agents reason over and invoke tools/skills reliably
  - Create reusable orchestration patterns between models and capabilities
- Develop evaluation and observability systems
  - Build frameworks for offline and online evaluation of AI systems
  - Implement logging, tracing, and monitoring for model behavior and system performance
- Own reliability and performance
  - Design systems for high availability, fault tolerance, and graceful degradation
  - Optimize for latency, throughput, and cost across AI workloads
- Build data and retrieval infrastructure
  - Develop scalable RAG pipelines, indexing systems, and data processing workflows
  - Own infrastructure for handling large-scale structured and unstructured data
- Create internal platforms and developer tooling
  - Build tools, SDKs, and internal platforms that enable engineers to integrate AI capabilities quickly and safely
  - Standardize best practices across teams (prompting, evaluation, deployment)
- Work closely with product and AI teams
  - Partner with AI Applications engineers to support production use cases
  - Translate product needs into scalable infrastructure solutions
Benefits
- Company equity managed through Carta
- Unlimited PTO in an environment where taking time off to relax or recharge is supported and encouraged
- Full medical and dental benefits
- 401(k) with match and 100% vesting upon hire
- Fully remote, flexible work environment (we do, however, meet in person several times a year)
- Paid parental leave