Priyanshu Seth

Mumbai

Summary

Data science professional with experience at Mahindra.AI, where I optimized ETL pipelines and improved model precision by 15%. Proficient in SQL and data visualization tools such as Apache Superset, I turn complex datasets into actionable insights and collaborate effectively across teams.

Overview

1 year of professional experience

Work History

Data Science Intern

Mahindra.AI
Mumbai
12.2024 - Current
  • Built and optimized 5+ ETL pipelines and performed EDA on datasets with 10M+ records to uncover insights and improve data quality. Handled data cleaning, preprocessing, and feature engineering to keep data consistent for downstream machine learning workflows across multiple projects.
  • Evaluated and optimized Random Forest hyperparameters across 100+ model iterations, resulting in a 15% increase in precision, recall, and F1 score, significantly enhancing referral likelihood prediction accuracy.
  • Wrote and validated 100+ complex SQL queries to support data integrity and analytical workflows. Contributed to the Consumption Story ETL pipeline, enabling 3-level customer segmentation for an LLM-based system, and enhanced Power Insights by retrieving and validating Next Best Action (NBA) IDs.
  • Built 5 interactive dashboards in Apache Superset for the FD Horizon project (visualizing tractor and weather data across 100K+ data points daily) and for the Diversity & Inclusion Dashboard, empowering HR to monitor 10+ key D&I metrics with a focus on data accuracy and completeness.

BI Analyst Intern

Art-Yarn Exports India
Mumbai
10.2024 - 11.2024
  • Designed and developed Power BI reports to analyze and visualize key business metrics, including sales, purchases, and agent commissions, using complex data from multi-sheet Excel files.
  • Enhanced decision-making by creating interactive dashboards and ensuring data accuracy through Power Query transformations of 100K+ records.

Education

PG-Diploma - Big Data Analytics

Centre for Development of Advanced Computing
Mumbai, MH
08.2024

Bachelor of Technology - Computer Science

Jaypee University of Engineering & Technology
Guna, M.P.
05.2023

Skills

  • Machine Learning
  • Amazon Web Services
  • Big Data
  • ETL
  • Google Cloud Platform
  • Hadoop
  • Kafka
  • Python
  • SQL
  • Statistical analysis
  • Data visualization (Power BI, Apache Superset)

Learnings And Trainings

  • GCP Cloud Architecture, https://www.credly.com/badges/bb37dad7-b6e2-481e-a47501f317f162eb/public_url
  • Implement CI/CD Pipeline on GCP, https://www.credly.com/badges/f92984e3-5594-4bc1-9f71a231dcfda2ec/public_url

Languages

Hindi
First Language
English
Proficient (C2)
Spanish
Elementary (A2)

Projects

Real Time Accident Severity Prediction (3 weeks)

  • Tech Stack: AWS (S3, EC2, IAM Roles, VPC, SageMaker, CloudWatch, EMR), Python, Spark, Machine Learning, Big Data Hadoop, Apache Kafka.
  • Designed a real-time machine learning system that predicts the severity of road accidents using historical data. The system helps emergency responders allocate resources based on the predicted severity, with the goal of reducing mortality in severe accidents.
  • The entire system forms a robust ETL (Extract, Transform, Load) pipeline. Data is extracted from the Kafka topic, transformed into a format suitable for the ML model, predictions are made, and the results are loaded back into the S3 bucket.
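
A minimal sketch of the extract, transform, predict, and load loop described above. It is illustrative only: the feature names, `toy_predict`, and the in-memory source and sink are hypothetical stand-ins for the Kafka topic, the trained model, and the S3 bucket, none of which are specified in this document.

```python
import json
from typing import Callable, Iterable

# Hypothetical feature names; the real message schema is not given here.
FEATURES = ["speed_kmh", "vehicles_involved", "is_night"]


def transform(raw: str) -> list[float]:
    """Parse one JSON message and return its features in a fixed order."""
    record = json.loads(raw)
    return [float(record[name]) for name in FEATURES]


def run_pipeline(
    messages: Iterable[str],
    predict: Callable[[list[float]], str],
    load: Callable[[dict], None],
) -> int:
    """Extract each message, transform it, predict severity, and load the result."""
    count = 0
    for raw in messages:            # extract (stand-in for a Kafka consumer)
        features = transform(raw)   # transform
        severity = predict(features)
        load({"features": features, "severity": severity})  # load (stand-in for S3)
        count += 1
    return count


def toy_predict(features: list[float]) -> str:
    """Placeholder rule used only for illustration; not a trained model."""
    speed, vehicles, night = features
    score = speed / 50 + vehicles + night
    return "severe" if score >= 4 else "minor"


if __name__ == "__main__":
    sample = [
        json.dumps({"speed_kmh": 120, "vehicles_involved": 3, "is_night": 1}),
        json.dumps({"speed_kmh": 30, "vehicles_involved": 1, "is_night": 0}),
    ]
    sink: list[dict] = []
    run_pipeline(sample, toy_predict, sink.append)
    for row in sink:
        print(row)
```

Passing the source, model, and sink in as plain callables keeps the loop testable without a running Kafka broker or AWS credentials.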

Vehicle Price Prediction (1 week)

  • Tech Stack: Machine Learning, Python, NumPy, Pandas, Scikit-learn, Jupyter Notebook.
  • Developed a machine learning model to predict vehicle prices using a dataset of 250K+ rows and 5 key features, extracted through data cleaning and feature engineering from 24 original columns. Implemented and evaluated multiple linear regression, decision tree, and random forest regressors. Multiple linear regression achieved the best accuracy at 82%, outperforming random forest (78%) and decision tree (76%).
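
A minimal sketch of how several regressors can be ranked on held-out data. This document does not say which metric "accuracy" refers to; the sketch assumes R², which is what scikit-learn regressors report from `.score()`. The model names and prediction values below are made up for illustration and are not the project's results.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def best_model(y_true, predictions):
    """Score each model's predictions and return (best_name, all_scores)."""
    scores = {name: r2_score(y_true, y_pred) for name, y_pred in predictions.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical held-out prices and per-model predictions.
    y_true = [10, 20, 30, 40]
    predictions = {
        "linear": [12, 18, 33, 39],
        "tree": [10, 10, 40, 40],
    }
    name, scores = best_model(y_true, predictions)
    print(name, scores)
```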
