Hello, I'm
Boston, MA
Engineering background. ML focus. Building systems that work in production, not just notebooks.
I'm a recent Industrial Engineering graduate with a background in applied ML, looking for AI, ML, or data science roles where the work is connected to real decisions and real constraints. My path into ML didn't follow a standard CS track. I studied Chemical Engineering at NIT Bhopal, then came to the U.S. on a fully funded scholarship tied to the IE department. That structure shaped how I learned: my coursework was grounded in statistics, optimization, and systems thinking rather than pure computer science.
I built my ML skills through projects in real domains (energy, operations, and security) where model outputs directly affect costs and risk. That context changed how I think about the work. I spend as much time thinking about how a model might fail, drift, or be misused as I do about how to train it. I care about data quality, cost-aware evaluation, interpretability, and what happens after deployment, not just benchmark numbers.
Across my projects I've built end-to-end pipelines covering ingestion, feature engineering, training, evaluation, and production deployment. I've worked with both classical and modern ML tooling, and I try to make trade-offs explicit rather than defaulting to complexity. I'm most useful on teams where ML is embedded in a larger operational system and needs to be trusted and maintained over time.
Production hybrid RAG system that ingests daily arXiv cs.AI papers via an automated Airflow pipeline, indexes them with BM25 and 1024-dim HNSW vector search, fuses the two result lists with Reciprocal Rank Fusion (RRF), and serves Q&A via a streaming FastAPI backend with multi-turn Redis session memory and full Langfuse observability tracing. The hardest part was tuning the RRF weight balance: BM25 dominated on keyword-heavy queries while vector search won on conceptual ones, so neither alone was reliable. Exact-match caching on query embeddings (6h TTL) cut repeat-query latency by 145x without touching the retrieval stack.
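For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal sketch of the idea in plain Python. It is not the production code: the function name, the per-retriever weights, the document ids, and the constant k=60 (a commonly used default) are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores weight / (k + rank) per list it appears in,
    with ranks starting at 1. k=60 is an assumed, commonly used default.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrievers.
bm25_hits = ["p1", "p2", "p3"]
vector_hits = ["p3", "p1", "p4"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
vector_only = reciprocal_rank_fusion([bm25_hits, vector_hits], weights=[0.0, 1.0])
```

Changing the weights shifts the balance between keyword and semantic retrieval, which is the knob the tuning problem above refers to.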
Multi-agent supply chain decision system with guarded orchestration, domain-specialized agents, and tool-scoped execution, built with LangGraph and LangChain. The main challenge was preventing agents from calling tools outside their domain: an early routing bug let the inventory agent trigger procurement actions, which produced confident but wrong outputs. Fixing that required explicit state guards and a supervisor layer that validates transitions before any tool executes.
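The tool-scoping idea can be sketched in framework-free Python. This deliberately does not use the LangGraph or LangChain APIs, and every agent name, tool name, and helper below is hypothetical; it only illustrates checking an agent-to-tool transition before anything runs.

```python
# Hypothetical allow-list: which tools each agent may call.
ALLOWED_TOOLS = {
    "inventory": {"check_stock", "forecast_demand"},
    "procurement": {"create_purchase_order"},
}


class ToolScopeError(Exception):
    """Raised when an agent requests a tool outside its domain."""


def guarded_call(agent, tool_name, tools, **kwargs):
    """Validate the agent-to-tool transition, then execute the tool."""
    if tool_name not in ALLOWED_TOOLS.get(agent, set()):
        raise ToolScopeError(f"{agent!r} may not call {tool_name!r}")
    return tools[tool_name](**kwargs)


# Stand-in tool implementations for the example.
tools = {
    "check_stock": lambda sku: {"sku": sku, "on_hand": 12},
    "create_purchase_order": lambda sku, qty: {"sku": sku, "qty": qty},
}

result = guarded_call("inventory", "check_stock", tools, sku="A-100")
```

With this guard in place, an inventory agent asking for `create_purchase_order` raises `ToolScopeError` instead of silently executing.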
Applied the CRISP-DM framework to analyze OPC-UA industrial sensor data across 27 production shifts, engineering time-series features (Total Active Energy, Active Power L2) and implementing DTW-based Time-Series KMeans to identify anomalous operating regimes linked to energy inefficiency and quality defects.
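As background on the distance behind DTW-based clustering, here is a minimal sketch of dynamic time warping for two univariate series. It is a textbook version with an absolute-difference step cost, which is an assumption; library implementations often use a squared cost and add windowing constraints.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Classic O(len(a) * len(b)) dynamic program with an
    absolute-difference step cost and no window constraint.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Same shape, different timing: the warping path absorbs the repeat.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
offset = dtw_distance([0, 0], [1, 1])
```

Because the distance tolerates shifts in timing, shifts with the same load profile but different pacing can still land in the same cluster.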
Built end-to-end ML pipelines on 2,500+ U.S. counties and 300+ features to predict food insecurity, diabetes prevalence (regression), and obesity hotspots (classification) using Gradient Boosting and Logistic Regression. Designed robust preprocessing and validated with nested cross-validation and bootstrap uncertainty analysis.
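The bootstrap part of that validation can be illustrated with a percentile interval for a mean error. This is a generic sketch rather than the project code: the function name, the resample count, the seed, and the example errors are all assumptions.

```python
import random


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical absolute errors from held-out predictions.
abs_errors = [0.8, 1.2, 0.5, 2.1, 0.9, 1.7, 0.4, 1.1]
lower, upper = bootstrap_ci(abs_errors)
```

Reporting the interval alongside the point estimate makes it clearer how much of a score difference between models is noise.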
Developed a robust multivariate statistical process control (MSPC) framework on 552 manufacturing records with 209 variables. Used PCA and robust outlier detection to isolate a stable in-control baseline, then deployed a Phase II Hotelling's T² monitoring scheme for real-time anomaly detection.
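For context, the Hotelling's T² statistic measures how far a new observation sits from the in-control mean, scaled by the in-control covariance. The sketch below computes it in plain Python on raw variables; it omits the PCA step, the robust outlier screening, and the control limit, and all names and data are illustrative assumptions.

```python
def mean_and_cov(rows):
    """Sample mean vector and covariance matrix of in-control data."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [
        [
            sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]
    return mean, cov


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def hotelling_t2(x, mean, cov):
    """T^2 = (x - mean)^T cov^{-1} (x - mean)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return sum(di * si for di, si in zip(d, solve(cov, d)))


# Hypothetical in-control baseline with two variables.
baseline = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mean, cov = mean_and_cov(baseline)
t2_new = hotelling_t2((3.0, 1.0), mean, cov)
```

In a Phase II scheme, each incoming record's T² value would be compared against a control limit estimated from the baseline, and values above it flagged for review.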
Open to AI engineering, ML, and data science roles. I'm actively looking and happy to chat about what you're working on.