Srishti Smita

Pune

Summary

Dynamic Data Engineer with a proven track record at A.P. Moller Maersk, specializing in ETL processes and real-time data streaming using Kafka. Expert in Spark and SQL, with a focus on data retrieval efficiency. Strong collaborator who drives insights for People Analytics and sales teams while ensuring data integrity and security. Experienced with both AWS and Azure cloud services.

Overview

9 years of professional experience

Work History

Data Engineer

Maersk Global Service Center
02.2022 - Current
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability
  • Gathered, defined and refined requirements, led project design and oversaw implementation
  • Designed data models for complex analysis needs
  • Developed database architectural strategies at modeling, design, and implementation stages to address business or industry requirements
  • Implemented Kafka for real-time data streaming, which increased data flow efficiency across systems
  • Utilized Spark and SQL to process large datasets, reducing data retrieval times by over 30%
  • Created reusable PySpark utilities in Databricks for recurring transformations
  • Built Dify.ai workflows to derive insights from credit approval data
  • Built anonymised datasets and took part in frequent audits with the Cyber team to ensure data safety
  • Was a member of the EACHAT (Employee Assist Chat) builder team, working with Qdrant (vector database), Llama indexing, and spaCy
  • Worked on a synthetic data use case as a proof of concept (POC)
  • Applied column hashing, data bucketing, and broadcast joins
  • Established and secured enterprise-wide data analytics structures
  • Worked closely with People Analytics and sales teams to derive insights on succession, DEI, and related topics
  • In parallel, worked with finance teams to consolidate the different invoice sources under one umbrella, saving time and simplifying work for sales teams
  • Applied optimizations such as vacuuming tables, which freed up storage space and sped up table reads and writes

Consultant

Capgemini
11.2020 - 02.2022
  • Liaised with customers and Data Science teams to better understand customer needs and recommend appropriate solutions
  • Reviewed functional specifications and analyzed requirements
  • Created notebooks for new objects with complex transformation logic
  • Transformed data using broadcasts, skew joins, and DataFrames with Spark SQL on top of Azure Databricks
  • Created pipelines in Azure Data Factory using Lookup, ForEach, Copy, Notebook, and other activities
  • As a mini project, analyzed the effect of migrating source files from CSV to Parquet; developed notebooks and pipelines and changed triggers with soft and hard dependencies
  • Performed history loads for objects with new transformations, which usually involved delta/incremental loads
  • Created managed/internal tables, external tables, stored procedures, and views, and designed cube XMLA scripts with Microsoft Visual Studio
  • Built relationships for working data models (star schema implemented as the data model)

Program Analyst

Cognizant Technology Services
12.2016 - 11.2020
  • Analyzed unstructured information to derive key insights
  • Used Hive performance tuning strategies such as partitioning and bucketing to improve query execution performance
  • Enriched data using Spark and pushed it into Elasticsearch and Hive
  • Responsible for ingesting data into the Hive data store as ORC
  • Developed end-to-end Hive queries for creating, storing, and analyzing tables
  • Automated EC2 instances using the describeInstance() function, coded in Python 3.7
  • Resolved a special-character issue in XML files using Python

Education

B.E - Electronics And Communications Engineering

Technocrats Institute Of Technology And Science
Bhopal, India
01-2016

Skills

  • Data Engineering Skills: Data integration, SQL and databases, Big data technologies
  • Big Data Skills: Spark with Scala, PySpark, Kafka, SQL (SSMS), Databricks, Data Factory
  • Cloud: ADLS/Blob, ADF, Logic Apps, EC2, S3, Azure, AWS, Azure OpenAI
  • AI: Dify.ai, TensorFlow, spaCy, PyTorch, ML (Neural Networks), FastAPI, Vector Database
  • API testing Skills: Postman/Bruno
  • Data security
  • Machine learning
  • API development
  • Risk analysis

Achievements

1) Received the Xtra Mile award at Capgemini

2) Won a hackathon at Maersk for synthetic data generation
