[Remote] Senior Platform Engineer
Note: This is a remote position open to candidates in the USA. The company is an award-winning, AI-first digital engineering and consulting firm focused on delivering high-impact services and solutions that help organizations solve what truly matters. The Senior Platform Engineer will design, optimize, and scale infrastructure for GenAI and LLM workloads, collaborating closely with data science, MLOps, and application teams to deliver cutting-edge AI solutions.
Responsibilities
- Design and implement scalable infrastructure for LLM and GenAI workloads across multi-GPU environments
- Perform GPU profiling, benchmarking, and performance optimization for distributed training workloads
- Manage and schedule compute-intensive jobs using Slurm-based clusters and OpenShift/Kubernetes environments
- Deploy and optimize the NVIDIA GPU stack (CUDA, cuDNN, NCCL, Triton, RAPIDS, etc.)
- Collaborate with cross-functional teams to deploy models in research and production environments
- Build and support GenAI pipelines (fine-tuning, RAG, multi-modal inferencing, LLMOps)
- Build reusable infrastructure templates using tools like Terraform and Ansible
- Contribute to internal innovation (PoCs, workshops) and support client-facing delivery engagements
Skills
- Strong experience with Slurm and distributed training environments
- Hands-on expertise with Red Hat OpenShift and/or Kubernetes
- Deep knowledge of the NVIDIA GPU ecosystem (CUDA, cuDNN, NCCL, Nsight, Triton/TensorRT)
- Strong background in Linux systems, performance tuning, and multi-GPU optimization
- Experience deploying GenAI workloads (LLM fine-tuning, RAG pipelines, multi-modal systems)
- Familiarity with Infrastructure-as-Code tools (Terraform, Ansible)
- Experience with cloud GPU environments (GCP, Azure, AWS, OCI) and/or on-prem GPU clusters
- Experience with NVIDIA NIMs, DGX systems, or GPU-accelerated containers
- Knowledge of LLMOps frameworks and MLOps integration
- Familiarity with vector databases and retrieval systems for RAG architectures
- Comfortable working in client-facing environments and collaborating with AI solution teams
- Experience working with FHIR R4, HL7 v2, or SMART on FHIR
- Integration with EHR systems (e.g., Epic)
- Understanding of HIPAA compliance and healthcare data privacy
- Exposure to clinical workflows, CDS Hooks, or patient-facing applications
- Experience building clinical decision support systems or healthcare interoperability solutions
Benefits
- Make an impact at one of the world’s fastest-growing AI-first digital engineering companies.
- Upskill and discover your potential as you solve complex challenges in cutting-edge areas of technology alongside passionate, talented colleagues.
- Work where innovation happens - work with disruptive innovators in a research-focused organization with 60+ patents filed across various disciplines.
- Stay ahead of the curve: immerse yourself in breakthrough AI, ML, data, and cloud technologies, and gain hands-on exposure.