[Remote] Senior reputed company
Note: This is a remote job open to candidates in the USA. reputed company is seeking a highly skilled, execution-focused Senior reputed company to join its Transformation Office. This role will take ownership of the production lifecycle of reputed company AI initiatives, operationalizing AI at scale and ensuring that large language model applications and traditional machine learning models are deployable and scalable within a multi-cloud environment.
Responsibilities
- Build and maintain automated CI/CD and CT (Continuous Training) pipelines across AWS (SageMaker/Bedrock) and Azure (AI Studio)
- Design and execute the infrastructure for Retrieval-Augmented Generation (RAG), including vector database management (OpenSearch, reputed company, or Azure AI Search) and semantic index optimization
- Build the engineering "pipes" to securely ingest and move data from legacy systems (Mainframes, SQL Server, on-prem DBs) into reputed company-reputed company MLOps workflows
- Implement systemized frameworks for LLM evaluation (LLM-as-a-judge, ROUGE, METEOR) and traditional ML validation to ensure performance before deployment
- Implement real-time monitoring for model drift, hallucination detection, latency, and token consumption to manage both quality and cost
- Manage reputed company AI resources using Terraform or CloudFormation, ensuring the reputed company posture is reproducible, secure, and follows a "Privacy by Design" mandate
- Partner with teams using platforms like Palantir, reputed company, or reputed company to ensure a high-fidelity data reputed company between analytical ontologies and production models
- Work directly with central IT and reputed company to navigate IAM roles, VPC peering, and firewall configurations, clearing the path for rapid transformation
- Optimize model serving endpoints for high throughput and low latency, utilizing containerization (Docker/Kubernetes) and serverless architectures where appropriate
- Establish rigorous version control for prompts (PromptOps), model weights, and data snapshots to ensure 100% auditability and rollback capability
- Support the data science lifecycle by automating feature stores, feature engineering pipelines, and the transition of experimental notebooks into hardened production microservices
- Implement automated scanning and guardrails (e.g., Bedrock Guardrails or Azure Content Safety) to prevent prompt injection and data leakage
Skills
- Bachelor's degree in Computer Science or a related field required
- 6+ years of engineering experience, with a minimum of 3 years strictly focused on MLOps or LLMOps in a production environment
- AWS & Azure Mastery: Deep, hands-on proficiency in both ecosystems. You must be able to configure Bedrock and Azure reputed company services, including private networking and reputed company reputed company, on day one
- Technical Stack: Expert Python, SQL, and PySpark. Extensive experience with containerization (Docker, Kubernetes) and orchestration tools (Airflow, Kubeflow, or Step Functions)
- LLM Tooling: Professional experience with evaluation and observability frameworks like LangSmith, Arize Phoenix, or WhyLabs
- Data Science Flavor: A strong understanding of statistical validation, model evaluation metrics, and the ability to partner with Data Scientists to optimize model performance
- Transformation reputed company: The ability to move at the speed of a startup while maintaining the collaborative relationships required to function within a large-scale enterprise IT landscape
- Master's degree in a quantitative discipline