ASHWIN KADAM

Thane

Summary

Data Engineer with 4+ years of experience in data modeling, data governance, testing and optimizing pipelines, and translating business needs into robust data pipelines, supporting applications in the Pharma, Retail, and Fintech domains.

Overview

7 years of professional experience

Work History

Manager - Data Quality

TransUnion
04.2023 - Current
  • Initially employed by Teleperformance on a project for the client TUCIBIL; after demonstrating strong skills and contributions, TUCIBIL offered a full-time role, continuing the work as a Data Engineer.
  • Built ETL pipelines to extract data from S3, transform it using PySpark and Hive, and load it
    into AWS Redshift for downstream reporting and analytics.
  • Leveraged AWS Glue and Shell Scripts to automate workflows, reducing manual intervention
    and increasing data processing efficiency.
  • Ensured data quality with validation frameworks, addressing issues mentioned by RBI
    Circulars.
  • Designed ELT pipelines to ingest raw data into AWS S3 and HDFS, enabling flexible, on-demand transformations directly on the target system.
  • Utilized Spark and Athena for distributed and in-place transformations, supporting large-scale
    ad-hoc queries and real-time analytics.
  • Optimized data processing workflows to reduce processing time by 20%, ensuring timely
    availability of datasets for analysis teams.
  • Automated repetitive tasks and improved operational efficiency by integrating Shell Scripts and VBA Macros into data workflows.
  • Developed VBA Macros to automate recurring data validation, report generation, and tracking
    tasks.
  • Reduced manual efforts by 50% through customized macros, enabling faster turnaround for business-critical reports.
  • Collaborated with cross-functional teams to define data standards and resolve discrepancies, ensuring compliance with regulatory and business requirements.
  • Increased operational efficiency through automation and workflow optimizations.
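The proprietary S3 → PySpark → Redshift flow above cannot be reproduced here, but its extract/transform/load shape can be sketched in stdlib Python, with sqlite3 standing in for Redshift (the table, columns, and sample data are invented for illustration):

```python
import csv
import io
import sqlite3

# Raw extract, as it might arrive from S3 (illustrative sample data).
RAW_CSV = """account_id,balance,status
A001,1200.50,active
A002,,closed
A003,310.00,active
"""

def extract(raw: str) -> list[dict]:
    """Parse the raw CSV extract into rows."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple]:
    """Drop rows with missing balances and normalise types."""
    out = []
    for row in rows:
        if not row["balance"]:
            continue  # incomplete record: excluded by the validation rule
        out.append((row["account_id"], float(row["balance"]), row["status"]))
    return out

def load(rows: list[tuple], conn: sqlite3.Connection) -> int:
    """Load transformed rows into the warehouse table and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts (account_id TEXT, balance REAL, status TEXT)"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]

conn = sqlite3.connect(":memory:")
loaded = load(transform(extract(RAW_CSV)), conn)
print(loaded)  # 2 — the incomplete record is filtered out
```

In the real pipeline the transform step ran in PySpark/Hive and the load targeted Redshift; the separation of the three stages is the point of the sketch.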

Data Analyst

Teleperformance
06.2022 - 04.2023
  • Developed an understanding of how financial institutions report data to Credit Bureaus.
  • Extracted data using Big Data technologies; analyzed and identified data quality issues by applying the data quality rules of completeness, freshness, accuracy, consistency, and uniqueness.
  • Developed a mechanism to analyze the daily data flow into the repository using a PySpark-based framework.
  • Shared insights and trends, and monitored key business metrics around consumer lending.
  • Applied Big Data tools (Spark, PySpark, SQL, Unix scripting) to prepare meaningful stories around quality and risk parameters.
  • Built automation jobs for BAU activities and developed data pipelines implementing efficient workflows.
  • Identified inefficiencies and implemented automated solutions, resulting in significant time and resource savings.
  • Conducted in-depth analysis for ad-hoc requests and provided valuable insights.
  • Responsible for Hadoop cluster and shared-drive health; managed and optimized resources.
  • Cleaned up storage space and ensured system health; identified optimization opportunities in existing code and queries.
  • Created separate modules for data transformation based on business requirements.
  • Implemented data validation and quality checks.
  • Created PySpark pipelines for importing data into HBase and Hive, with data transformation and aggregation.

Senior Data Analyst

Eclerx
12.2020 - 06.2022
  • Curated, modeled, and transformed large datasets to align with business requirements, ensuring data accuracy and consistency for analytical and reporting needs.
  • Enhanced existing datasets by integrating new fields through efficient data pipelines from diverse source data, optimizing for both performance and scalability.
  • Conducted data quality assessments to identify and resolve discrepancies and anomalies, improving data reliability for downstream analysis.
  • Utilized Databricks for data processing and management, leveraging PySpark and SQL for high-performance data manipulation in a distributed environment.
  • Leveraged AWS services for scalable data storage, retrieval, and processing solutions to support data engineering workflows in the cloud.
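The enrichment work described above — integrating new fields into an existing dataset from a second source — reduces to a keyed left join. A minimal pure-Python sketch (the Databricks version would use a PySpark `join`; all field names here are invented):

```python
def enrich(base_rows, lookup_rows, key, new_fields):
    """Left-join new_fields from lookup_rows onto base_rows by key."""
    index = {r[key]: r for r in lookup_rows}
    enriched = []
    for row in base_rows:
        match = index.get(row[key], {})  # empty dict when no match: fields become None
        enriched.append({**row, **{f: match.get(f) for f in new_fields}})
    return enriched

orders = [{"order_id": 1, "cust_id": "C1"}, {"order_id": 2, "cust_id": "C2"}]
customers = [{"cust_id": "C1", "segment": "retail", "region": "west"}]

result = enrich(orders, customers, "cust_id", ["segment"])
print(result[0]["segment"])  # retail; order 2 gets None (no matching customer)
```

Building the lookup index once keeps the join linear in the size of the inputs, which is the same reason a broadcast join is preferred for small lookup tables in Spark.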

Automation Engineer

Bowman & Archer
12.2017 - 04.2019
  • Built ETL pipelines using SSIS to ingest machine-generated CSV data and load it into SQL Server.
  • Created batch reports for pharmaceutical machinery using MS SQL and VBScript.
  • Created graphics and development scripts in SCADA (WinCC flexible, Advanced RT); monitored and controlled the plant through SCADA.
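The SSIS/SQL Server batch reporting above is Windows-specific, but the underlying per-batch aggregation pattern can be shown with stdlib sqlite3 (the machine-log schema and readings are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE machine_log (batch_id TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO machine_log VALUES (?, ?)",
    [("B1", 21.0), ("B1", 23.0), ("B2", 25.0)],
)

# Per-batch summary: reading count and mean temperature, as a batch report
# would roll up raw machine readings.
report = conn.execute(
    "SELECT batch_id, COUNT(*), AVG(temp_c) FROM machine_log "
    "GROUP BY batch_id ORDER BY batch_id"
).fetchall()
print(report)  # [('B1', 2, 22.0), ('B2', 1, 25.0)]
```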

Education

B.Tech - Instrumentation

Konkan Gyanpeeth College of Engineering
Karjat
06-2017

PGP-DSE - Big Data Analytics, Data Science & Analytics

Great Lakes Institute of Management
Mumbai
06-2020

Skills

    PySpark

    SQL

    Data Modelling

    Spark

    AWS (S3, Glue, Athena, Redshift, EMR)

    Hadoop

    Python (Pandas)
