
Mohammed Shafaat

Pune

Summary

With over 8 years of experience spanning databases to Site Reliability Engineering, I have built a versatile skill set. I began as an on-premise Big Data Ops engineer, then moved into managing clusters on AWS and Azure and handling cluster deployments. I subsequently moved into SRE, leading data pipeline development and designing database backup architectures. Proficient in cloud technologies, I have orchestrated upgrades, migrations, and POC programs, drawing on expertise in HDFS, SQL, Tableau, MariaDB, Control-M, Ansible, and shell scripting. With a foundation in compliance and the banking domain, I bring comprehensive expertise to data engineering work.

Overview

8 years of professional experience
1 Certification

Work History

Big Data Engineer

Deutsche Bank
10.2021 - Current

Data engineering with an emphasis on SRE, focused primarily on designing and optimizing data architectures to meet the bank's analytical and reporting needs. Analyzed user resource consumption behavior for a cluster holding over 28 PB of data. Led Cloudera cluster upgrades and designed the database backup and migration strategy for SRE.

RESPONSIBILITIES

  • Data Observability and Monitoring: Developed Ansible scripts to run time series queries that pull cluster utilization metrics and scan the HDFS NameNode fsimage to get raw cluster stats.
  • Data Pipeline Management (ETL): Leveraged shell scripts to transform unstructured data from the Cloudera cluster and HDFS and load it into MariaDB.
  • Data Analysis and Visualization: Used Tableau to create blended data sources from MariaDB tables and GCP BigQuery datasets, along with calculated fields, and published reports and dashboards. This gave clients meaningful cluster insights and a better understanding of their resource consumption.
  • Tableau Reporting: Reports covered cluster user resource allocation and utilization stats, archival data candidates, Hive table, partition, and size stats, data growth trends, etc.
  • Data Compaction Solution: Designed a HAR (Hadoop Archive) and compaction solution to tackle small files through a data archival and compaction plan, optimizing data utilization by 35%.
  • Architected Prometheus and Grafana Project: Deployed HTTPS/blackbox/HDFS exporters along with a central Prometheus time series database and integrated it with Grafana. Used Grafana to publish stats for URL response time, service uptime, and RDBMS metrics.
  • Postgres to Oracle ExaCC Migration: Led the successful migration of a PostgreSQL database from legacy Linux servers to Oracle ExaCC, including schema creation and swift data migration.
  • Cloudera Cluster Upgrade: Upgraded prod clusters from CDH5 to CDP7, with 250+ nodes, across 6 different clusters.
  • Automation with Control-M: Migrated all cluster cron jobs to Control-M to reduce cost and leverage a central scheduling solution for jobs.

Sr. Associate Consultant Big Data

Infosys
11.2019 - 06.2021

Led a team of four in the deployment of a Microsoft Azure HDInsight (compute) PaaS platform integrated with ADLS Gen2 (storage) for data analytics and modeling. Managed end-to-end administration of an HDFS-Spark cluster orchestrated with Ambari, overseeing administrative and operational activities.

RESPONSIBILITIES

  • Azure PaaS Operations: Managed access controls, IAM role assignments, Key Vault, cluster resource groups, VNets, service endpoints, ADLS Gen2 container management and RBAC controls, Azure AD DS DNS configuration, etc.
  • HDFS Cluster Deployment: End-to-end ARM template cluster deployment for a hybrid IaaS-PaaS model, including IaaS edge node integration with PaaS Microsoft HDInsight.
  • Library Management: Managed library dependencies using Conda and pip for package installation. Implemented virtual environments for Conda libraries, optimizing resource utilization for efficient library management across projects.
  • Service Configs with Grafana: Analyzed cluster performance behavior from various Grafana reports to fine-tune service configuration.
  • Security: Ranger policy management, TLS/SSL setup on IaaS machines, certificate renewal.
  • Data Pipeline: Collaborated with data engineers and analysts to design and implement data pipelines and workflows, leveraging technologies like Apache Spark, Kafka, and Sqoop.
  • Automation and Agile: Developed Ansible scripts to automate JDBC upgrades, TLS/SSL certificate renewal, Linux mountpoint/partition creation, and Hadoop prerequisite tasks.

Platform Support Engineer | Sybase DBA

Tata Consultancy Services
02.2016 - 10.2019

Oversaw platform operations for a Cloudera Hadoop cluster (CDH) and managed infrastructure for 100+ nodes across AWS Cloud and on-premise, ensuring stability and performance optimization. Integrated third-party tools such as Dataguise and Tableau with the HDFS cluster.

Platform Support Responsibilities

  • CDH: Handled operations tasks for a Cloudera-based big data cluster, covering Hadoop ecosystem components such as HDFS, YARN, Hive, Sentry, Sqoop, Oozie, etc.
  • AWS Cloud: Responsible for designing and deploying high-availability clusters on AWS IaaS cloud, comprising EC2, VPC, S3, IAM, Amazon RDS, etc.
  • Upgrades: Planned and executed updates, patching, and version upgrades for CDH in collaboration with the Unix team and Cloudera.
  • Capacity Planning: Prepared capacity planning and scalability assessment reports for management to accommodate growing data volumes and user requirements.
  • Misc tasks: Cluster scaling, balancing HDFS data, failover management. Managed security configurations, including Kerberos authentication and defined Sentry policies (RBAC), to ensure data privacy and compliance.

Sybase DBA Responsibilities

Monitored and migrated ASE servers spread across 1000 Linux server machines hosting classified customer and bank information.

  • Database server and replication server health monitoring.
  • Database server access and login management, database backups and restores.
  • Replication server deployment to replicate data between remote sites in London, Tokyo, and New York.

Education

Bachelor of Engineering - Electronics And Telecommunication

MGM Jawaharlal Nehru Engineering College
Aurangabad, India
06.2015

Skills

  • Data Engineering
  • Site Reliability Engineering (SRE)
  • Data Governance
  • Big Data Analytics
  • Data Visualization
  • Data Analysis
  • ETL
  • HDFS, Spark & Hive
  • SQL and RDBMS
  • Ansible-Shell Scripting
  • DB Backup and Restore
  • Tableau
  • AWS
  • Azure

Certification

ITIL® Foundation Certificate in IT Service Management - 2017

AWS Technical Professional (Amazon Partner Network) - 2017

Tableau Desktop Specialist - 2023

Additional Information

Date of Birth: 07/27/1993
PAN No: CMHPM0943H

Languages

English
Bilingual or Proficient (C2)
Hindi
Bilingual or Proficient (C2)
Marathi
Upper intermediate (B2)
Arabic
Upper intermediate (B2)
