[Remote] Senior AI Engineer with Databricks
Note: This is a remote position open to candidates in the USA. EPAM Systems is looking for a Senior AI Engineer with expertise in Databricks to design, deploy, and maintain scalable machine learning pipelines. The role involves delivering production-ready ML pipelines, monitoring dashboards, and CI/CD pipelines for ML systems.
Responsibilities
- Design, implement and maintain end-to-end ML pipelines on Databricks
- Build workflows for data ingestion, preprocessing, feature engineering, training and inference
- Leverage PySpark, Spark ML and Databricks notebooks/jobs
- Manage model versioning, experiment tracking and reproducibility using MLflow
- Package and deploy models for batch and real-time inference
- Monitor model performance, drift and retraining cycles
- Develop scalable ETL/ELT pipelines using Databricks Delta Lake
- Optimize data storage and access patterns through partitioning, Z-ordering and caching
- Integrate with data sources such as Azure Data Lake, S3, APIs and databases
- Implement CI/CD pipelines for ML workflows using Azure DevOps, GitHub Actions, Databricks Repos and the Jobs API
- Configure clusters and autoscaling with cost optimization in mind, and apply Infrastructure as Code with Terraform, ARM templates and Bicep
- Implement logging, alerting and observability to ensure high availability and fault tolerance of ML systems
Skills
- 3+ years of experience in machine learning engineering or related roles
- Expertise in the Databricks platform including workspaces, jobs and clusters
- Proficiency in Apache Spark, PySpark and Python with pandas and scikit-learn
- Skills in MLflow for tracking, registry and deployment
- Competency in CI/CD pipelines, Docker containerization and REST APIs for model serving
- Familiarity with version control using Git
- Background in Azure including Azure Databricks, ADLS, ACR and AML
- Knowledge of data preprocessing, feature engineering and model training and evaluation
- Understanding of libraries such as XGBoost, LightGBM and CatBoost
- English proficiency at B2 level or higher
- Familiarity with AWS including S3, EMR and SageMaker
- Skills in streaming pipelines with Spark Structured Streaming and Databricks Feature Store
- Knowledge of Kubernetes
- Competency in monitoring tools such as Prometheus and Grafana
- Experience with large-scale production systems
Benefits
- Diversity of assignments and projects
- Personal development plan
- Mentoring programs and leadership development
- Certification and professional development support
- Access to learning platforms including more than 2,500 internal courses and the LinkedIn Learning library with 20,000+ courses
- English courses taught by certified teachers
- Extra leave days
- Referral bonuses
- Private health insurance
- Recreation office zones with tea, coffee and snacks
- Sports and game consoles
- IT equipment and Microsoft's Software Assurance Home Use Program (HUP)