Data Engineering Intern (Spring/Summer 2026)
Description:
- Support the development and maintenance of data pipelines using Databricks, Spark, and similar technologies.
- Write and optimize SQL and Python scripts for data transformation, integration, and automation tasks.
- Develop automation scripts that populate metadata and comments across Databricks tables using structured definitions such as CSV files.
- Assist in building a proof-of-concept for an automated data dictionary maintained with existing Databricks metadata.
- Contribute to prototyping an AI-powered knowledge agent that uses internal data and documentation to answer common questions.
- Collaborate with team members to improve data quality, cataloging, and metadata management across the ecosystem.
- Participate in code reviews, design discussions, and sprint ceremonies to learn engineering best practices.
- Document findings, workflows, and automation processes for future reuse.
- Perform other duties as assigned.
Requirements:
- Actively pursuing a Bachelor’s or Master’s degree in Computer Science, Software Engineering, Information Systems, or a related technical field.
- Foundational knowledge of Python and SQL for data manipulation and analysis.
- Familiarity with ETL concepts and structured data formats such as CSV, JSON, and Parquet.
- Interest in cloud-based data platforms, with Azure preferred.
- Strong analytical and problem-solving skills with an eagerness to learn.
- Effective communication and teamwork skills.
- Exposure to Databricks, Apache Spark, or other distributed data frameworks is preferred.
- Familiarity with Git or version control practices is preferred.
- Interest in AI/LLM-based automation, data documentation, or metadata management is preferred.
- Prior project or internship experience in data engineering or cloud technologies is preferred.