Pragat Tiwari

Senior Data Engineer
Mumbai

Summary

Experienced Multi-Cloud Data Engineer

Proven expertise in Python, PySpark, Databricks, Azure Data Factory (ADF), SQL, Apache Airflow, Azure Storage, AWS EMR, AWS Glue, S3, Redshift, IAM, and Linux. Adept at self-managing independent projects and excelling in collaborative team environments. Skilled in managing high-pressure situations and client interactions. Demonstrates a strong ability to adapt swiftly to evolving project requirements.

Overview

5 years of professional experience
12 Certifications

Work History

Senior Data Engineer

Globant
12.2023 - Current

ETL development and architecture optimization

  • Onboarded diverse data sources, including SQL databases, SFTP servers, APIs, applications, and external vendor data sources.
  • Led the migration of the existing framework from Kettle to a metadata-driven data loader built on AWS Glue, enhancing scalability and independence.
  • Built ETL pipelines for newly onboarded data sources.
  • Designed and implemented an automated monitoring framework to eliminate manual job tracking and streamline report generation.
  • Engineered a complex ETL pipeline using PySpark to process large XML files and efficiently load data into Redshift.
  • Participated in the modeling of newly integrated data sources and the creation of data products.
  • Transformed the data lake on S3 into a Delta Lake, enabling lakehouse capabilities within the existing framework.

Module Lead Data Engineer

Impetus Technologies
09.2022 - 12.2023

Databricks Proof of Concept (POC)

  • Implemented Databricks to demonstrate its advantages over AWS Glue.
  • Successfully migrated Glue jobs to Databricks.
  • Authored a comprehensive whitepaper outlining the benefits and drawbacks of Databricks, supporting organizational migration decisions.

Teradata to Redshift migration and query translation

  • Developed a tool for migrating Teradata SQL code and queries to Redshift using Python and regex, creating a metadata-driven utility.
  • Optimized queries during the migration process to ensure performance on Redshift.
  • Utilized Apache Airflow to orchestrate the migration process, automating pipeline triggers for new batches of queries.

Redshift and RDS automation

  • Created an automated solution for extracting metric data and relevant information from RDS and Redshift to support dashboards and reports.
  • Utilized AWS Glue and Boto3 extensively in the solution.
  • Automated deployment and upgrades using AWS CloudFormation.
  • Implemented Delta Lake on AWS Glue as part of a proof of concept (POC).
  • Developed a Python-based query scheduler to record results for reports and dashboards.

Data Engineer

Tata Consultancy Services
07.2019 - 09.2022

Informatica & Velocity to Azure Databricks Migration

  • Extracted model definitions from Informatica XML to generate data models for source files.
  • Developed a Python utility to dynamically generate DDL scripts for Spark external tables and views based on data model configuration.
  • Created a metadata-driven data loader to populate tables regardless of the source type.
  • Designed a comprehensive metadata-driven solution for creating and loading external tables in Databricks Delta.
  • Built parameterized Azure Data Factory (ADF) pipelines using multiple activities.
  • Developed a script to dynamically load metadata from a central metadata sheet into Delta tables, supporting further development.
  • Wrote SQL code for data transformations and business rules to achieve the desired results from source data.
  • Constructed and orchestrated parameterized Databricks jobs with defined linear dependencies between tasks.
  • Developed and managed data pipelines using Apache Airflow, orchestrating Databricks jobs with Airflow’s Databricks operators.
  • Created a Python script to build multiple DAGs with a single utility in a metadata-driven approach.
  • Installed and configured Airflow using Azure DevOps, working with various Airflow operators such as Bash, Python, Databricks, and SSH.

Data Shield

  • Developed a tool for customized SQL-based data quality checks.
  • Optimized the solution for quick issue identification and resolution.
  • Designed the quality check interface in Excel for ease of use.
  • Created unit tests to ensure code reliability and accuracy.

SAS to Databricks

  • Translated SAS procedures into Databricks using Python and PySpark.
  • Analyzed SAS code requirements and functionality to design and implement optimal solutions in Databricks.
  • Conducted code optimization for Python and PySpark implementations to enhance performance.

Migration to Cloud

  • Managed the replatforming of applications from on-premises environments to the cloud.
  • Oversaw end-to-end migration of applications, software, and tools.
  • Ensured infrastructure readiness and synchronization for hosting migrated applications.
  • Maintained cloud standardization during application and database migration.
  • Migrated both Linux and Windows-based applications.
  • Automated deployment processes using Python scripts.

Education

Bachelor of Computer Applications - Computer Science

Tilak Maharashtra Vidyapeeth
Pune
04.2001 -

Skills

Databricks

Azure Data Factory

Apache Spark

Python

SQL

Azure DevOps

PySpark

ADLS Gen2

Azure Synapse

Apache Airflow

AWS Redshift

AWS Glue

AWS EMR

AWS CloudFormation

AWS RDS

Data Modelling

Delta Lake

Machine Learning

Certification

Databricks Certified: Professional

Timeline

Senior Data Engineer

Globant
12.2023 - Current

Databricks Accredited: Generative AI Fundamentals

07-2023

Databricks Certified: Professional

02-2023

Microsoft Certified: Azure Enterprise Data Analyst Associate

02-2023

AWS Partner: Accreditation (Technical)

02-2023

Microsoft Certified: Azure Data Scientist Associate

01-2023

Microsoft Certified: Azure AI Fundamentals

12-2022

Module Lead Data Engineer

Impetus Technologies
09.2022 - 12.2023

Databricks Certified: Associate Developer for Apache Spark

08-2022

Microsoft Certified: Azure Data Engineer Associate

08-2022

Microsoft Certified: Azure Fundamentals

08-2022

HackerRank Certified: SQL (Advanced)

08-2022

Databricks Certified: Data Engineer Associate

07-2022

Microsoft Certified: Azure Data Fundamentals

06-2022

Data Engineer

Tata Consultancy Services
07.2019 - 09.2022

Bachelor of Computer Applications - Computer Science

Tilak Maharashtra Vidyapeeth
04.2001 -