Manoj Khilari

Pune

Summary

Data Science and AI professional with 3.1 years of core experience in predictive modeling, machine learning model development, and Generative AI. Skilled in Python, SQL, R, and Excel across data science, machine learning, and Gen-AI work, with proficiency in BI tools such as Looker and working knowledge of Snowflake and Splunk. AWS Certified Cloud Practitioner with a proven ability to build scalable, data-driven solutions and deliver actionable business insights.

Work History

Data Science Consultant

DynPro India Pvt Ltd

Pune

12.2022 - Current

Project: Conversational Query Engine (POC-MTS) July 2025 to Present

Role: AI Engineer

Domain: Sales & Inventory Management

Environment: Artificial Intelligence and Gen-AI

Responsibilities:

Natural Language Processing (NLP): Developed a conversational AI system that translated plain-language business questions into precise SQL queries, enabling non-technical users to access data insights without SQL expertise.
Retrieval-Augmented Generation (RAG): Implemented RAG with the Milvus vector database (inner-product (IP) similarity metric) for schema context retrieval, achieving 50–60% token savings and reducing operational costs.
Database Integration: Integrated with Snowflake using secure read-only connections, audit logging, client isolation, and restricted SQL operations, ensuring data security and compliance.
Interactive Visualization: Designed dynamic visualizations and automated chart-type recommendations (bar, line, pie, scatter, combo, etc.), empowering users with interactive exploration and instant insights.
Performance Optimization: Built query caching mechanisms and execution tracking for bottleneck identification, improving query response speed and retrieval accuracy.
Multi-LLM Architecture: Architected a flexible system supporting multiple LLM providers (Claude, GPT, and Gemini APIs), ensuring resilience, adaptability, and seamless platform integration.
Cost Monitoring: Developed token usage tracking and cost monitoring dashboards to optimize resources and provide transparency in operational expenses.
Scalable Deployment: Deployed the system on a Docker-based architecture, supporting scaling across multiple business domains and growing user bases.

Project: Patient 30-Day Readmission Risk Prediction (POC-DaVita) June 2025

Role: ML Engineer

Domain: Healthcare Analytics, Predictive Modeling

Environment: Predictive Analytics and Data Science

Responsibilities:

Data Integration: Built synthetic patient datasets (demographics, vitals, medications, readmission flags) and integrated Salesforce Health Cloud with Snowflake using secure ETL pipelines.
Data Processing: Designed a layered architecture (Raw, Silver, Gold) with Python ETL and Snowflake, creating a master dataset optimized for ML model training at patient-level granularity.
Feature Engineering: Created dialysis-specific features, grouped diseases, normalized and encoded variables, and derived predictive risk factors.
Modeling: Trained multiple ML models, selected XGBoost, and identified the top 15 predictors. Generated patient-level readmission probabilities.
Health Score: Developed a composite 0–100 health score using weighted feature importance (0 = high risk, 100 = low risk).
Data Storage: Stored final results as a comprehensive Snowflake table including patient data, predictions, health scores, and confidence intervals. This dataset was used for ThoughtSpot dashboards to support clinical and business insights.
Optimization: Resolved Salesforce–Snowflake integration challenges and improved pipeline reliability.

AI Engineer

DynPro India Pvt Ltd

Pune

05.2025 - Current

Project: AI-Powered Document Data Extraction & Structuring (POC-Revlon) May 2025 to June 2025

Role: AI Engineer

Domain: Document Intelligence & Generative AI

Environment: Artificial Intelligence and Gen-AI

Responsibilities:

Document Segregation & Clustering: Built a Streamlit-based application to structure multi-format documents (PDF, DOCX, images) into meaningful groups using a two-stage clustering framework:
Stage 1 – Embeddings + DBSCAN + Layout: Generated OpenAI embeddings and applied DBSCAN (scikit-learn, NumPy, Pandas) with layout-based segregation (OpenCV) for initial grouping.
Stage 2 – LLM Refinement: Applied OpenAI LLMs for semantic similarity and hierarchical clustering to refine groups into human-intuitive categories.
Document Parsing Pipeline: Sent the segregated documents to cloud LLMs (Claude, Gemini, GPT) and on-premise LLMs (LLaMA, Mistral via Ollama) to extract their content into a strict, section-wise JSON schema.
Comparison Agent: Evaluated the JSON outputs from multiple LLMs against the original documents using the OpenAI API, applying rules for content completeness, structural accuracy, data accuracy, and fabrication detection (scored as an error rate), combined through weighted scoring for a final judgment.
Model Performance Metrics: Designed a ranking engine that automatically selects the best-performing LLM for each document based on predefined importance weights.
End-to-End System Delivery: Developed backend processing and evaluation logic in Python and frontend visualization in ReactJS, integrating Claude, Gemini, GPT, and on-premise LLMs for real-world document processing.

Associate Data Science Consultant

DynPro India Pvt Ltd

Pune

06.2023 - Current

Project: AHC

Role: Associate Data Science Consultant

Domain: Healthcare Insurance Analytics

Environment: Predictive Analytics and Data Science

Responsibilities:

Data Processing: Responsible for processing data received from the data engineering team and ensuring it is ready for analysis.
Core responsibilities include analysing raw data, conducting quality checks, anomaly detection, feature creation, feature extraction, model development and validation, and dashboard creation, using strong Excel and MS Office skills for efficient data handling and analysis.
Data Pre-processing: Perform data pre-processing tasks to clean and prepare data for analysis and modeling, enhancing data quality and reliability.
Feature Binning: Bin features to calculate event rates, Weight of Evidence (WOE), and Information Value (IV), supporting statistically grounded feature analysis.
Machine Learning Modeling: Build predictive models on pre-processed data to estimate each patient's health score and medical spending over the next 12 months, using Python, Snowpark, and Snowflake.
Dashboard Creation: Create dashboards in Looker to visualize and present analysis and modeling results, providing actionable insights.
Proof of Concept (POC): Conduct POCs to explore and experiment with various machine learning and AI models, aiming to improve accuracy and reliability.
Automation: Automate processes to streamline data processing, modeling, and reporting tasks, increasing efficiency and consistency.
Code Optimization: Focus on restructuring and optimizing code to reduce loading times and improve overall performance.
Developed an LLM-powered (GPT-4) application using structured prompts to generate AI-driven patient health reports, integrating Snowflake for data retrieval and leveraging Python libraries for analysis, visualization, and PDF generation.

Project: Chatbot using Generative AI for Pandas and Snowflake June 2023 to September 2023

Role: Data Science Trainee

Domain: Predictive Segmentation

Environment: Segmentation

Responsibilities:

Designed and implemented a Generative AI-driven chatbot that enables natural-language interaction with Pandas DataFrames and Snowflake SQL, improving data accessibility and analysis.
Developed structured prompt engineering techniques to improve the chatbot's efficiency in retrieving, processing, and analyzing data.
Integrated GPT-3.5 to automate data insights generation and visualization, enabling more intuitive and actionable reporting.
Optimized data workflows, reducing costs by 40% through automated query execution and enhanced data processing capabilities.

Associate Data Science Consultant

DynPro India Pvt Ltd

Pune

01.2025 - Current

Project: DynViz (Data Visualization Platform)

Role: Associate Data Science Consultant

Domain: Business Intelligence

Environment: Product Development

Responsibilities:

Created several chart payloads in Go for rendering with Chart.js and Google Charts.
Coordinated communication and design between front-end and back-end teams.
Developed essential features for building dynamic dashboards using Go and React.js.

Data Science Intern

Morgan Stanley Capital International (MSCI)

Mumbai

03.2019 - 08.2019

Project: Segmentation based on interactive usage (UI) Mar 2019 to Aug 2019

Role: Data Science Intern

Domain: Financial Services

Environment: Data Science

Responsibilities:

Data Segmentation: Segmented highly unstructured data based on clients' interactive usage (UI) of Barra One, a web-based application platform. Used Splunk, Python, Excel, and SQL to extract insights and high-quality information from machine data generated by UI actions.
Data Analysis: Analyzed the segmented data to identify trends, patterns, and correlations with the potential to improve overall business performance.
Use Cases: Implemented use cases such as determining portfolio loading time (the Attributes column multiplied by the Total number column), identifying Akamai IP addresses, and tracking time spent on the application. These insights supported monitoring and optimization efforts.
Predictive Analysis: In a separate project, used Python and SQL to analyze monthly client portfolio usage and applied linear regression to predict the month and client with the highest portfolio load.
The role focused on using data-driven insights and machine learning to optimize business performance and improve monitoring of the company's web-based application platform, Barra One.

Education

Bachelor of Engineering - Electronics and Telecommunications

Skills

Data Science: Machine Learning, Predictive Modeling, Data Analysis, Supervised Machine Learning, Unsupervised Machine Learning, Classification, Clustering, EDA, Segmentation, KNN, Ensemble Modelling, PCA, Ridge and Lasso Regression, Elastic Net, Clustering Algorithms (K-Means, DBSCAN, Agglomerative Clustering), XGBoost
Gen-AI: Prompt Engineering (PE), RAG, RAG with Re-Ranking (RAG-RR), LLMs, Sentence Transformers (ST), Open-Source LLMs (Llama, Mistral, Gemma), Enterprise LLMs (Gemini, Claude, GPT), Vector Databases (Milvus, ChromaDB)
Python: scikit-learn, Pandas, NumPy, Matplotlib, Seaborn, Statsmodels, LightGBM, Keras
Statistics: Hypothesis Testing, Sampling, Outliers, ANOVA, Descriptive Statistics, Correlation, Regression Models, Probability, Distributions, T-tests
Database: Snowflake, MySQL, PostgreSQL
Other Tools: SQL, Looker, AWS, Snowflake, Snowpark, Generative AI, Prompt Engineering, GCP (Vertex AI and BigQuery), Docker, R, Excel

Certification

AWS Certified Cloud Practitioner (CLF-C02)
Data Science Pro-degree in collaboration with Genpact (Data Science, Machine Learning, Python, R, Base SAS) - Imarticus Learning, Mumbai, India. (2018-2019)
Python Programming - From Basics to Advanced Level (Udemy)

Timeline

AI Engineer

DynPro India Pvt Ltd

05.2025 - Current

Associate Data Science Consultant

DynPro India Pvt Ltd

01.2025 - Current

Associate Data Science Consultant

DynPro India Pvt Ltd

06.2023 - Current

Data Science Consultant

DynPro India Pvt Ltd

12.2022 - Current

Data Science Intern

Morgan Stanley Capital International (MSCI)

03.2019 - 08.2019