Srishti Smita

Pune

Summary

Dynamic Data Engineer with a proven track record at A.P. Moller Maersk, specializing in ETL processes and real-time data streaming using Kafka. Expert in Spark and SQL, enhancing data retrieval efficiency. Strong collaborator, driving insights for People Analytics and sales teams while ensuring data integrity and security. Experienced with both AWS and Azure cloud services.

Overview

years of professional experience

Work History

Data Engineer

Maersk Global Service Center

02.2022 - Current

Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability
Gathered, defined and refined requirements, led project design and oversaw implementation
Designed data models for complex analysis needs
Developed database architectural strategies at modeling, design, and implementation stages to address business or industry requirements
Implemented Kafka for real-time data streaming, which increased data flow efficiency across systems
Utilized Spark and SQL to process large datasets, improving data retrieval times by over 30%
Created reusable PySpark utilities in Databricks for recurring transformations
Built Dify.ai workflows to derive insights from Credit Approval data
Built anonymised datasets and held frequent audits with the Cyber team to ensure data safety
Was part of the EACHAT (Employee Assist Chat) builder team, working with Qdrant (vector DB), Llama indexing and spaCy
Worked on a Synthetic Data use case as a POC
Worked on column hashing, data bucketing and broadcast joins
Established and secured enterprise-wide data analytics structures
Worked closely with People Analytics and sales teams to derive insights on Secessions, DEI, etc.
In parallel, worked with finance teams to consolidate the different invoice sources under one umbrella, saving time and simplifying work for sales teams
Applied optimizations such as VACUUM on tables, which freed up storage space and made table reads and writes faster

Consultant

Capgemini

11.2020 - 02.2022

Liaising with customers and Data Science teams to better understand customer needs and recommend appropriate solutions.
· Understanding functional specifications and analyzing requirements
· Creating notebooks for new objects that implement complex transformation logic
· Transforming data using broadcast joins, skew joins and DataFrames with Spark SQL on top of Azure Databricks
· Creating pipelines using Azure Data Factory with Lookup, ForEach, Copy, Notebook and other activities
· Analyzing, as a mini project, the effect of migrating source files from CSV to Parquet; developing notebooks and pipelines and changing triggers with soft and hard dependencies
· Performing history loads for objects with new transformations, which usually involve delta/incremental loads
· Creating managed/internal tables, external tables, stored procedures and views, and designing cube XMLA scripts with Microsoft Visual Studio
· Building relationships for working data models (star schema implemented as the data model)

Program Analyst

Cognizant Technology Services

12.2016 - 11.2020

Analyzed unstructured information to derive key insights
· Used performance tuning strategies in Hive like Partitioning and Bucketing to improve the performance of the Hive query execution
· Enriched data using Spark and pushed it into Elasticsearch and Hive
· Ingested data into the Hive data store in ORC format
· Developed end-to-end Hive queries for creating, storing and analyzing tables
· Automated EC2 instances using the describeInstance() function, coded in Python 3.7
· Resolved a special-character issue in an XML file using Python

Education

B.E - Electronics And Communications Engineering

Technocrats Institute Of Technology And Science

Bhopal, India

01-2016

Skills

Data Engineering Skills: Data integration, SQL and databases, Big data technologies
Big Data Skills: Spark with Scala, PySpark, Kafka, SQL (SSMS), Databricks, Data Factory
Cloud: ADLS/Blob, ADF, Logic App, EC2, SC3, Azure, AWS, Azure OpenAI
AI: Dify.ai, TensorFlow, spaCy, PyTorch, ML (Neural Networks), FastAPI, Vector Database
API testing Skills: Postman/Bruno

Data security
Machine learning
API development
Risk analysis

Achievements

1) Awarded Xtra Mile at Capgemini

2) Won a hackathon at Maersk for Synthetic Data Generation
