JAPHAR ALI SHAIK

ENGINEERING MANAGER
Pune

Summary

Dynamic Engineering Manager with a proven track record at Citi Corp, specializing in big data solutions for AML. As a certified Quantexa Data Engineer skilled in PySpark and Scala/Spark, I drive efficiency and enhance data integration, achieving significant improvements in risk detection and team productivity. Passionate about solving complex challenges in financial crime analytics.

Overview

19 years of professional experience
1 Certification

Work History

Quantexa Engineering Manager

Citi Corp
01.2023 - Current
  • As an AML (Anti-Money Laundering) Engineering Manager and Quantexa Certified Data Engineer, I specialize in managing, developing, and optimizing big data solutions for AML. With hands-on development expertise in Spark and PySpark, along with a strong background in Scala and Hadoop, I focus on building scalable data pipelines and enhancing Entity Resolution capabilities.
  • Currently, I’m working on Entity Resolution in AML, ensuring accurate data integration and risk detection across large datasets. Passionate about leveraging data-driven technologies, I enjoy solving complex challenges and driving efficiency in financial crime analytics.
  • Enhanced overall team productivity with continuous training and mentoring of junior engineers.
  • Improved engineering processes by streamlining workflows and implementing efficient project management techniques.

MIP Predictive Analytics

Citi Corp
04.2018 - 12.2022
  • Citi AML set out to make AML monitoring more efficient and effective. This project replaces the legacy system that detects potentially suspicious activity with a new AML monitoring system, known as the Monitoring Intelligence Platform (MIP).
  • The project aims to reduce false positives in AML monitoring and improve time to market. The Monitoring Intelligence Platform introduces the next generation of AML monitoring by leveraging behavior analytics that can replace the existing rule-based monitoring system.
  • Designed technical solutions for the predictive analytics framework.
  • Developed features and trained models.
  • Developed and enhanced the framework that maintains the feature repository, model repository, typology, behavior group to feature mapping, and Type II error rates.
  • Developed more than 5,000 features in PySpark. Configured a model (XGBoost classifier) that uses the top features to produce customer scores, feature importance, and feature contribution.
  • Created a process to select the best features from the training and test datasets, using a Markov Chain Monte Carlo method with k-fold cross-validation to find the best model at a 95% recall rate.
  • Generated confusion matrices and ROC/AUC graphs using Matplotlib.
  • Applied Bayes' theorem (as used in Naive Bayes) to compute posterior probabilities, z-scores, and weights of risky attributes.
  • Developed a framework that covers these areas:
      • Data ingestion
      • Pre-processing: generate weights for each transaction based on posterior probabilities and z-scores
      • Feature computation based on weights and transactional data
      • Model scoring: XGBoost model
      • Feature contribution: XGBoost model with SHAP
      • Behavior group contribution: based on NTILE logic
      • Case decision: ATL (Above The Line) and BTL (Below The Line)
      • Case generation: alerts generation process
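
For illustration only, the pre-processing and behavior-group steps listed above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the production MIP code: the function names (`posterior`, `z_scores`, `ntile`), their inputs, and the example numbers are hypothetical, and the real pipeline ran in PySpark.

```python
from statistics import mean, pstdev


def posterior(prior, likelihood_pos, likelihood_neg):
    """Bayes' theorem: P(suspicious | attribute observed).

    prior          -- assumed base rate P(suspicious)
    likelihood_pos -- P(attribute | suspicious)
    likelihood_neg -- P(attribute | not suspicious)
    """
    evidence = likelihood_pos * prior + likelihood_neg * (1.0 - prior)
    return likelihood_pos * prior / evidence


def z_scores(values):
    """Standardize values against their own population mean and std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no value deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def ntile(values, n):
    """SQL-style NTILE: sort ascending and split into n buckets (1..n).

    When rows do not divide evenly, the earlier buckets get one extra row.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    base, extra = divmod(len(values), n)
    buckets = [0] * len(values)
    pos = 0
    for b in range(1, n + 1):
        size = base + (1 if b <= extra else 0)
        for i in order[pos:pos + size]:
            buckets[i] = b
        pos += size
    return buckets
```

A per-transaction weight could then combine a posterior for each risky attribute with the z-score of the amount; how the two were actually combined is not stated here, so that step is left out of the sketch.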

Bank of America, Spark Developer & Big Data Architect

TCS
07.2016 - 03.2018
  • Currency Transaction Reporting (CTR) is a mandatory regulatory requirement aligned with Anti-Money Laundering (AML) regulation. The legacy LCR system could not keep up with increasing regulatory requirements and standards for CTR reporting, and enhancing cash aggregation on it would have imposed a highly manual workflow review process. The bank therefore decided to rewrite the legacy LCR application, both to replace LCR and to create the Enterprise Cash View (ECV) for all cash regulatory requirements (CTR, MIS, and others) as well as AML use cases.
  • LCR (Large Currency Reporting) is the application behind this reporting. The US Financial Crimes Enforcement Network (FinCEN) requires reports on all customers whose cash transactions total more than $10,000 per day, and on MIS transactions worth more than $3,000 per day. The LCR application pulls cash transaction data from sources such as ATMs, loans, financial centers, safe boxes, and vaults, aggregates the transactions made by each customer, generates the CTR (Currency Transaction Report), and sends it to FinCEN.
  • Worked closely with clients to gather requirements and translate them into technical specifications for implementation.
  • Continuously updated skills through training courses, workshops, and self-study to stay current on industry trends and emerging technologies.
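
As a rough illustration of the daily cash aggregation described above, the core step can be sketched in plain Python. This is a hedged sketch, not the bank's implementation: `flag_reportable`, the `(customer_id, day, amount)` tuple layout, and the sample data are hypothetical, and the reporting threshold is passed in by the caller rather than hard-coded.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def flag_reportable(transactions, threshold):
    """Sum cash amounts per (customer, day) and keep totals above threshold.

    transactions -- iterable of (customer_id, day, amount) tuples, gathered
                    from any source system (hypothetical layout)
    threshold    -- Decimal reporting limit supplied by the caller
    """
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for customer_id, day, amount in transactions:
        # Decimal avoids float rounding errors on currency amounts.
        totals[(customer_id, day)] += Decimal(str(amount))
    return {key: total for key, total in totals.items() if total > threshold}


sample = [
    ("C1", date(2017, 1, 3), "6000"),   # e.g. ATM deposit
    ("C1", date(2017, 1, 3), "4500"),   # e.g. financial center deposit
    ("C2", date(2017, 1, 3), "9000"),
    ("C1", date(2017, 1, 4), "10000"),  # exactly at the limit, not above it
]
reportable = flag_reportable(sample, Decimal("10000"))
```

Aggregating before comparing is the point of the design: neither of C1's deposits on 3 January exceeds the limit alone, but their daily total does.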

HSBC, GBM Big Data Project, Technical Lead

TCS
07.2013 - 06.2016
  • HSBC is undergoing a major transformation program that brings 300+ existing projects, such as Ingestion, Trades, Credit Card Fraud Detection, and Anti-Money Laundering, under a single Big Data program. The bank targets a high ROI in the first 3 years of the Big Data initiative. As part of the initiative, Global Banking and Markets (GBM) is redefining the way data and analytics are used at HSBC GBM by building a Big Data platform.
  • GBM Big Data is designed to manage large datasets. This includes building physical metadata for the ingested sources, profiling datasets, mapping datasets to logical ontologies, applying data quality rules for accuracy and consistency, normalizing the data, applying data standards, and maintaining data lineage for all source systems.
  • Mentored junior developers through regular 1-on-1 meetings, providing guidance on best practices, coding standards, and career growth opportunities.
  • Coordinated with cross-department teams like QA, DevOps, and Support to ensure seamless end-to-end software delivery process.

Morgan Stanley, WISE Migration, Onsite Technical Lead

TCS
08.2008 - 06.2013
  • The WISE Migration project migrates the application from Sybase to the Teradata platform. This involves migrating the different application layers as well as 13 terabytes of data spread across 2000+ database instances, to ensure the scalability, maintainability, and performance of WISE. The WISE ETL infrastructure is the core ETL framework, used to customize how data is extracted, transformed, and loaded.
  • Create and review the conceptual model for the EDW (Enterprise Data Warehouse) with business users.
  • Analyze the source system (Sybase) to understand the source data and table structures along with deeper understanding of business rules and data integration checks.
  • Identified various facts and dimensions from the source system and business requirements to be used for the data warehouse.
  • Create the dimensional logical model with approximately 405 facts and 936 dimensions with 13000+ attributes using Erwin Studio.
  • Implement slowly changing dimension schemas for most of the dimensions.
  • Implement the standard naming conventions for the fact and dimension entities and attributes of the logical and physical models.
  • Review the logical model with business users, the ETL team, DBAs, and the testing team to provide information about the data model and business requirements.
  • Migrate ETL jobs from Korn shell to object-oriented Perl.
  • Creating Perl modules for application specific replicators.
  • Code review of the new modules.

Misys Health Care System Migration, Team Lead

TCS
08.2006 - 07.2008

The project involves the analysis, design, and migration of a practice management system and an electronic medical records application used by medical offices to manage the clinical and financial aspects of their practice. These products were implemented using the legacy technologies 4GL, ESQL, C, and Java on the Unix platform, with Informix as the database. As part of the project, the products were migrated from the legacy technologies to Microsoft technologies on the Windows platform. The existing 4GL, ESQL, C, and Java applications were re-engineered in C#.NET, and the database was migrated from Informix to SQL Server 2005.

Education

Master of Computer Applications

Adaikala Matha College
Thanjavur

Skills

Certified Quantexa Data Engineer

PySpark

Spark/Scala

BigData

Shell Scripting

Engineering design

Training and mentoring

Architecture

Problem-solving

Design development

Certification

Quantexa Data Engineer

Timeline

Quantexa Data Engineer

03-2025

Quantexa Engineering Manager

Citi Corp
01.2023 - Current

MIP Predictive Analytics

Citi Corp
04.2018 - 12.2022

Bank of America, Spark Developer & Big Data Architect

TCS
07.2016 - 03.2018

HSBC, GBM Big Data Project, Technical Lead

TCS
07.2013 - 06.2016

Morgan Stanley, WISE Migration, Onsite Technical Lead

TCS
08.2008 - 06.2013

Misys Health Care System Migration, Team Lead

TCS
08.2006 - 07.2008

Master of Computer Applications

Adaikala Matha College