Sumit Lage

Cloud Data Engineer
Pune

Summary

Accomplished Senior Data Management Lead with a proven track record at Telstra. Skilled in migrating applications to Azure, optimizing data processing with PySpark, and implementing secure solutions. Expertise in Spark, Python, Java, and Agile methodologies, with a focus on driving cost reductions and improving operational efficiency. Strong problem-solving abilities and a collaborative approach to project execution, ensuring innovation and seamless delivery.

Overview

12 years of professional experience
5 years of post-secondary education
1 language

Work History

Data Management Senior Lead

Telstra
04.2023 - Current
  • Migrated a streaming application from on-premises to Azure cloud, achieving significant reductions in processing latency, cost savings by decommissioning Cosmos DB, and efficient big data management with Delta Lake integration.
  • Designed a reusable PySpark workflow for batch data ingestion into Cosmos DB containers, reducing development overhead and ensuring seamless data access.
  • Resolved critical AWS AMP Streaming issues by implementing robust error-handling mechanisms, improving system consistency, and upgrading the Java codebase from JDK 8 to JDK 17 to eliminate vulnerabilities.
  • Developed secure solutions for customer PA data processing, including APIs and batch processing with FastAPI, while ensuring robust data encryption at rest and in transit.
  • Built a near real-time streaming solution for International Roaming notifications using Delta Lake's liquid clustering. Created Silver and Gold layers, and integrated Databricks dashboards for SLA monitoring and alerting.
  • Implemented Row-Level and Column-Level Security (RLS & CLS) for value-based access control, automating security processes with metadata-driven configurations and creating domain-specific data views.
  • Migrated Hive tables to Databricks and aligned existing pipelines with the new liquid-clustered tables. Optimized tables with daily scheduled jobs, significantly reducing execution times.
  • Transitioned Java Spring Boot Lambda code to Python with Powertools for AWS Lambda, and automated the creation of AWS resources (e.g., Kinesis, SQS, IAM, Lambda, SSM Parameters, S3) using Terraform. Streamlined deployment with AWS CLI shell scripts and piloted similar processes using AWS CDK.

Senior Data Engineer

Globant
07.2022 - Current
  • Collaborated with the core team at MetLife to develop a scalable data ingestion pipeline using Azure PySpark, facilitating the migration of data from on-premises SQL Server to Azure SQL Server.
  • Designed and implemented an optimized Azure Synapse PySpark pipeline utilizing ADLS, Synapse, ADF, SQL Server, and Delta Lake. Applied transformation logic to input data and ingested it into Cloud SQL Server, enabling OLAP functionalities.
  • Partnered with business analysts for requirement gathering, troubleshooting issues, and delivering consolidated solutions to ensure seamless pipeline operations.
  • Developed a proof of concept (POC) for ingesting data into a Cloud SQL database table, reducing ingestion time from approximately 3 hours to just 10 minutes, demonstrating significant performance improvement.
  • Gained exposure to cloud-based SCM processes and adhered to best Agile practices, ensuring efficient team collaboration and timely project delivery.

Specialist Data Engineer

L&T INFOTECH
02.2021 - 07.2022
  • Worked on the Nordea Banking project to develop Java-integrated Spark solutions aligned with MiFID II standards, enhancing transparency, accuracy, and investor protection.
  • Developed batch and streaming applications, sourcing data from multiple platforms while maintaining data state throughout the processing lifecycle to enable efficient backtracking.
  • Created Hive scripts for data analysis and used HiveQL extensively for daily operations. Leveraged Sqoop for importing data from relational databases and resolved Sqoop-related issues to ensure smooth workflows.
  • Automated data pipelines and workflows using shell scripting and scheduling engines, reducing manual intervention and significantly improving efficiency.
  • Implemented best coding practices and conducted rigorous testing to identify and minimize bugs, ensuring high-quality code delivery.
  • Refactored existing solutions to improve reusability and scalability, optimizing overall performance and resource utilization.
  • Gained expertise in Agile methodologies, peer reviews, and continuous integration practices. Skilled in debugging and troubleshooting by analyzing end-to-end data flows, ensuring seamless project execution.

Consultant

ADP
08.2015 - 02.2021
  • Contributed to the ADP iHCM 2 project, which simplifies and streamlines HR processes, including payroll, time management, and employee performance management.
  • Collaborated as a core team member to gather business requirements, design architecture, and develop end-to-end business solutions tailored to client needs.
  • Developed data pipelines and models aligned with business requirements to facilitate efficient data processing.
  • Processed large datasets using Apache Spark, leveraging Spark SQL for business operations and data analysis to derive insights.
  • Utilized Kafka-Flume for consuming and processing streaming data, and developed a proof of concept (POC) using Spark Streaming for real-time data processing.
  • Applied expertise in business, statistical, and predictive modeling to identify underlying trends through Exploratory Data Analysis (EDA).
  • Created a Linear Regression model to assess the effectiveness of mobile apps vs. websites in terms of user benefit.
  • Collaborated closely with business analysts, operations teams, and other stakeholders to provide optimal solutions based on in-depth understanding of business requirements.

Associate Engineer

Atos
02.2013 - 08.2015
  • Worked for KPN, a leading telecommunications and IT provider in the Netherlands, as an SQL Server Developer, creating solutions based on business requirements using SQL.
  • Converted over 300 BladeLogic jobs to Ansible and built a CI/CD pipeline with Jenkins, streamlining deployment and integration processes.
  • Developed and deployed microservices using AWX Tower, enhancing automation and efficiency in the deployment pipeline.
  • Managed tokenization through property dictionaries and DPM for generic deployments, ensuring secure and scalable processes.
  • Utilized Git and Artifactory for code and package management, organizing builds through tagging to streamline version control and deployment workflows.
  • Created scripts using PowerShell and Shell scripting to automate tasks, and developed reverse proxy configuration scripts for improved network management.
  • Troubleshot issues and performed SQL DBA tasks such as backup and restore to maintain system integrity.
  • Developed a POC in MSBI using SSIS and SSRS, demonstrating expertise in data integration and reporting solutions.

Education

Master of Science - Computer Science

RBNB College
Pune University
06.2011 - 08.2013

Bachelor of Science - Computer Science

RBNB College
Pune University
05.2008 - 04.2011

Skills

Spark, Python, Java, SQL

Azure (ADLS, Delta Lake, Cosmos, Databricks, ADF, Synapse, EventHub, SQL Server)

AWS (Kinesis, SQS, S3, Lambda, Layer, SSM, CloudFormation, CDK)

Scripting, Ansible, Terraform, AWS CLI, Jenkins, CICD

Hive, Sqoop, Hadoop

Flume-Kafka, Spark Streaming

Agile/Scrum process, Jira, Git

Additional Information

Apr 2020: Certified Python Data Scientist

Jun 2020: Certified in Spark with Python for Big Data

Jun 2016: Microsoft Certified SQL Developer
