Pravin Jawade

Aurangabad

Summary

Data Engineer with 6+ years of experience in building scalable data pipelines, Big Data frameworks, and cloud-based data solutions. Skilled in PySpark, Databricks, SQL, Hadoop ecosystem, and data governance. Proven expertise in ETL design, migrations, data quality, compliance (GDPR, CCPA, HIPAA), and dashboard creation for analytics monitoring.

Overview

7 years of professional experience

Work History

Senior Software Engineer

Invent Health Inc.
08.2022 - Current
  • Migrated analytics workflows from R to PySpark, developing object-oriented modules with logging to both text files and log tables.
  • Developed a MySQL-to-Python migration pipeline for analytics, loading data into RDS, running analytics in PyCharm, and storing results back into RDS.
  • Built a robust PySpark-to-Databricks migration pipeline with error handling, data validation, logging, and email notifications for failed jobs.
  • Designed and implemented ETL pipelines in PySpark for automated ingestion and triggered S3 uploads, ensuring smooth data flow.
  • Maintained and optimized Delta tables with PySpark and SQL, improving data lookup performance.
  • Created Databricks dashboards for real-time analytics status updates, enhancing monitoring and team visibility.
  • Worked with Unity Catalog and lineage tracking tools for data governance, and managed Hive Metastore for schema and metadata handling.
  • Implemented data security & compliance policies (GDPR, CCPA, HIPAA) to protect sensitive data.
  • Designed pipelines for incremental loading and implemented CDC (Change Data Capture) logic for batch and real-time updates.
  • Applied data cleaning and quality checks during ingestion, ensuring reliable datasets for downstream analytics.

Hadoop Developer

Codevizor Technologies Pvt. Ltd.
01.2020 - 08.2022
  • Built and optimized data ingestion pipelines using Hive, Pig, and Sqoop.
  • Developed MapReduce jobs for large-scale batch data processing.
  • Processed structured and semi-structured data (JSON, XML) for analytics.
  • Scheduled and monitored workflows using Oozie.

Data Engineer

EC-Mobility Pvt. Ltd.
10.2018 - 01.2020
  • Migrated mainframe system data into a Hadoop-based data lake.
  • Created Hive tables with optimized partitioning and bucketing strategies.
  • Implemented Sqoop jobs for ingestion from MySQL into Hive.
  • Supported requirement gathering and end-to-end pipeline design.

Education

Bachelor of Engineering - Electronics & Telecommunication

Deogiri Institute of Engineering & Management Science
Aurangabad, MH

Skills

  • PySpark development
  • Data pipeline design
  • SQL optimization
  • Big data processing
  • Data governance practices
  • Analytics workflow management
  • Problem solving
  • ETL development
  • SQL expertise
  • Data analysis
  • Big data technologies
  • Analytical skills

Personal Information

  • Date of Birth: 10/12/94
  • Nationality: Indian

Core Skills

Python, PySpark, SQL, R (basic), Hadoop (HDFS, YARN, MapReduce), Hive, Pig, Sqoop, Oozie, Databricks, Delta Lake, Unity Catalog, AWS CodeCommit, S3, RDS, MySQL, GDPR, CCPA, HIPAA, Metadata & Lineage Tracking, Incremental Loads, CDC, Data Cleaning, Data Validation, Linux (CentOS, Ubuntu), Windows, PyCharm, VS Code

Languages

Hindi, English
First Language
