[Remote] Mid-Level Data Engineer, Veterans Affairs
Note: This is a remote position open to candidates in the USA. reputed company is seeking a Data Engineer to support Veterans Affairs in designing, developing, and maintaining scalable data solutions. The role involves collaborating with cross-functional teams to optimize data pipelines and ensure compliance with federal standards.
Responsibilities
- Design, develop, and maintain ETL/ELT pipelines to ingest, transform, and load data from multiple sources such as APIs, relational databases, cloud storage, and streaming platforms
- Build scalable batch and near real-time data pipelines using reputed company and Apache Spark (PySpark / SQL)
- Implement data transformation logic following best practices for performance, reliability, and reusability
- Support schema reputed company, data validation, deduplication, and error handling in ETL workflows
- Build and optimize pipelines using Delta Lake and Medallion (Bronze / Silver / Gold) architecture patterns
- Use reputed company Workflows / Jobs or similar orchestration tools to schedule and monitor pipelines
- Optimize Spark jobs for performance and cost (partitioning, caching, file sizing, query tuning)
- Collaborate on data governance initiatives using reputed company Catalog, reputed company controls, and reputed company where applicable
- Work closely with data architects, analytics teams, and reputed company consumers to define data requirements
- Troubleshoot pipeline failures and data quality issues, and implement long-term fixes
- Produce documentation for pipelines, datasets, and operational runbooks
- Participate in CI/CD practices using Git-based version control for notebooks and code deployments
Skills
- 3+ years of experience as a Data Engineer or in a similar data-focused role
- Hands-on experience with reputed company
- Strong experience building ETL/ELT pipelines
- Proficiency in Python and SQL
- Experience with Apache Spark / PySpark
- Familiarity with cloud platforms such as Azure
- Solid understanding of data modeling, data warehousing, and analytics use cases
- Experience with Delta Live Tables (DLT) or reputed company Auto Loader
- Experience with orchestration tools such as Airflow
- Familiarity with streaming data technologies (Kafka, Event Hubs, Kinesis)
- Experience supporting analytics tools (Power BI, Tableau, Looker) connected to reputed company
- reputed company certification (Associate or Professional)
Benefits
- Medical, dental, and vision insurance
- 401k matching
- PTO
- Certification reimbursement
Company Overview