[Remote] Senior Data Engineers
Note: This is a remote position open to candidates in the USA. CCC Intelligent Solutions Inc. is a leading cloud platform for the insurance economy, empowering over 35,000 businesses with innovative technology. They are seeking a Senior Data Engineer to develop large-scale, end-to-end data pipeline applications, automate processes, and mentor junior engineers while using advanced technologies in data processing and cloud environments.
Responsibilities
- Develop large-scale, end-to-end data pipeline applications covering multiple data sources spread across the data center and AWS cloud
- Use developed software applications to locate and analyze source data; create data flows to extract, profile, and store ingested data; define and build data cleansing and imputation; map to a common data model; transform to satisfy business rules and statistical computations; and validate data content
- Produce software data building blocks, data models, and data flows, such as dimensional data, data feeds, dashboard reporting, and data science research and exploration
- Produce automated software tests for data flow components and data content quality
- Automate orchestration and error handling for use by production operation teams
- Provide technical expertise to diagnose errors reported by production support teams
- Guide junior team members in performance tuning applications in distributed computing environments
- Perform root cause analysis on all data and processes and identify opportunities for improvement
- Develop metadata-driven and fully parameterized data processing tools
- Mentor junior engineers
Skills
- Master's degree in Computer Science, Computer Engineering, Management Information Systems, or a related field, plus 2 years of experience in software development/data processing or analysis (required)
- Hands-on experience with:
  - Programming using Python and PySpark
  - Hadoop, HDFS, MapReduce, YARN, AWS EMR, Redshift, Terraform
  - Hive, HBase, Parquet, ORC, Spark SQL, Sqoop, Apache Hudi
  - Orchestrating ETL pipelines involving data sourcing, transformations, and publishing using Apache Airflow
  - Performance tuning applications in distributed computing environments
  - Designing and developing data pipeline applications with Apache Kafka
  - Advanced SQL for data profiling and data validation
  - Unix commands and scripting
  - Performing root cause analysis on internal and external data and processes to identify opportunities for improvement
  - JIRA, GitLab, Subversion
  - Development of metadata-driven and fully parameterized data processing tools
  - AWS
Benefits
- 401K Match
- Paid time off
- Annual Incentive Plan Performance Bonus
- Comprehensive health insurance
- Adoption Assistance
- Tuition Reimbursement
- Wellness Programs
- Stock Purchase Plan options
- Employee Resource Groups