Shubham Wagh

DATA ENGINEER

Pune

Summary

Results-driven Data Engineer with 4.2 years of experience designing and implementing scalable ETL/ELT pipelines using Azure Databricks, PySpark, Spark SQL, Delta Lake, and ADLS Gen2. Proficient in data transformation, migration, and performance optimization across Azure and AWS platforms. Experienced in using AWS Glue, AWS DMS, Amazon S3, and Amazon Redshift to deliver analytics-ready data that supports informed decision-making. Committed to continuous improvement in data engineering practices aligned with organizational goals.

Overview

4.2 years of professional experience

Work History

Azure Data Engineer

Unobridge Solution Pvt. Ltd.

03.2023 - Current

Built end-to-end ETL/ELT data pipelines using Azure Databricks, PySpark, and Spark SQL for data ingestion, transformation, and loading into Delta Lake.
Implemented data transformation logic including cleansing, enrichment, and aggregation using PySpark and Spark SQL.
Designed and implemented Medallion Architecture (Bronze, Silver, Gold) to organize data into scalable and structured layers.
Created and managed Delta Lake tables for reliable and efficient data processing.
Optimized Spark jobs using partitioning, caching, and query tuning to improve performance and reduce execution time.
Automated workflow scheduling and monitoring using Databricks Workflows.
Implemented CI/CD pipelines using Azure DevOps and Git for version control and deployment of Databricks artifacts.
Collaborated with business stakeholders, analysts, and reporting teams to deliver accurate and business-ready datasets.

AWS Data Engineer

Unobridge Solution Pvt. Ltd.

03.2022 - 02.2023

Developed and maintained ETL pipelines that load retail data into Amazon Redshift from databases, APIs, and flat files, improving data quality and load performance to support business analysis.
Migrated on-premises MySQL databases to Amazon S3 using AWS DMS for cloud-based data storage and processing.
Developed AWS Glue PySpark scripts to validate, clean, and transform data for analytics and reporting.
Designed, developed, and monitored AWS Glue ETL jobs to efficiently load processed data into Amazon Redshift.
Worked with Parquet file format and applied compression techniques to optimize storage and improve query performance.
Created Redshift tables by generating DDL scripts from existing MySQL database schemas.
Performed unit testing to ensure data accuracy, reliability, and performance of ETL pipelines.
Maintained detailed technical documentation for ETL workflows, deployment processes, and job scheduling.
Environment: AWS DMS, S3, AWS Glue, Redshift, PySpark, MySQL, Python, Apache Airflow (via AWS MWAA).

Education

Bachelor of Technology (B.Tech) - Mechanical Engineering

CSMSS College of Engineering

Chh. Sambhaji Nagar, Maharashtra

08.2022

Skills

Cloud Platforms: Azure Databricks, Azure Data Factory (ADF), ADLS Gen2, AWS Glue, AWS DMS, Amazon S3, Amazon Redshift, Amazon Athena

Data Engineering: ETL/ELT, Data Migration, Data Transformation, Data Warehousing, Batch Processing, Data Quality Validation

Big Data Technologies: Apache Spark, PySpark, Spark SQL, Delta Lake, Apache Hive

Programming Languages: Python, SQL, Shell Scripting

Databases: MySQL, PostgreSQL, SQL Server
