Hello, I'm
Boston, MA | Open to work
My path into AI ran through industrial engineering, not computer science. I build ML systems that have to hold up in production — not just score well on a benchmark.
I'm a recent Industrial Engineering graduate with a background in applied ML, looking for AI, ML, or data science roles where the work is connected to real decisions and real constraints. My path into ML didn't follow a standard CS track. I studied Chemical Engineering at NIT Bhopal, then came to the U.S. on a fully funded scholarship tied to the IE department. That structure shaped how I learned: my coursework was grounded in statistics, optimization, and systems thinking rather than pure computer science.
I built my ML skills through projects connected to real domains (energy, operations, and security) where model outputs directly affect costs and risk. That context changed how I think about the work. I spend as much time thinking about how a model might fail, drift, or be misused as I do about how to train it. I care about data quality, cost-aware evaluation, interpretability, and what happens after deployment, not just benchmark numbers.
Across my projects I've built end-to-end pipelines covering ingestion, feature engineering, training, evaluation, and production deployment. I've worked with both classical and modern ML tooling, and I try to make trade-offs explicit rather than defaulting to complexity. I'm most useful on teams where ML is embedded in a larger operational system and needs to be trusted and maintained over time.
These are the systems I've built to put that thinking into practice — end to end, in real domains.
Production hybrid RAG system that ingests daily arXiv CS.AI papers via an automated Airflow pipeline, indexes them with BM25 and 1024-dim HNSW vector search fused by Reciprocal Rank Fusion (RRF), and serves Q&A via a streaming FastAPI backend with multi-turn Redis session memory and full Langfuse observability tracing. The hardest part was tuning the RRF weight balance: BM25 dominated on keyword-heavy queries while vector search won on conceptual ones, so neither alone was reliable. Exact-match caching on query embeddings (6h TTL) cut repeat-query latency by 145x without touching the retrieval stack.
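For readers unfamiliar with the fusion step, weighted Reciprocal Rank Fusion can be sketched in a few lines. This is an illustrative sketch, not the project's code: the function name `weighted_rrf`, the per-retriever weights, the document ids, and the smoothing constant `k=60` (a commonly used default) are all assumptions.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists of document ids with weighted Reciprocal Rank Fusion.

    rankings: dict mapping retriever name -> list of doc ids, best first.
    weights:  dict mapping retriever name -> weight (defaults to 1.0).
    k:        smoothing constant; 60 is a common default (assumed here).
    """
    scores = defaultdict(float)
    for name, ranked_ids in rankings.items():
        w = weights.get(name, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # Each retriever contributes w / (k + rank) for every doc it returns.
            scores[doc_id] += w / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrievers.
rankings = {
    "bm25": ["p3", "p1", "p2"],
    "vector": ["p2", "p3", "p4"],
}
fused = weighted_rrf(rankings, {"bm25": 1.0, "vector": 1.0})
print(fused)
```

Raising one retriever's weight shifts the fused order toward its ranking, which is the knob the weight tuning above refers to.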
Multi-agent supply chain decision system with guarded orchestration, domain-specialized agents, and tool-scoped execution, built with LangGraph and LangChain. The main challenge was preventing agents from calling tools outside their domain: a routing bug early on let the inventory agent trigger procurement actions, which produced confident but wrong outputs. Fixing that required explicit state guards and a supervisor layer to validate transitions before any tool executed.
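The idea of tool-scoped execution can be illustrated without any framework. The sketch below is plain Python and does not use LangGraph or LangChain APIs; the agent names, tool names, `ALLOWED_TOOLS`, and `guarded_call` are hypothetical and only show the shape of a guard that runs before a tool executes.

```python
# Hypothetical allowlist: which tools each agent is permitted to call.
ALLOWED_TOOLS = {
    "inventory": {"check_stock", "forecast_demand"},
    "procurement": {"create_purchase_order", "get_supplier_quote"},
}


class ToolScopeError(Exception):
    """Raised when an agent requests a tool outside its domain."""


def guarded_call(agent, tool_name, tools, **kwargs):
    """Validate the agent/tool pairing before executing the tool."""
    if tool_name not in ALLOWED_TOOLS.get(agent, set()):
        raise ToolScopeError(f"{agent!r} may not call {tool_name!r}")
    return tools[tool_name](**kwargs)


# Stub tools standing in for real integrations.
tools = {
    "check_stock": lambda sku: {"sku": sku, "on_hand": 12},
    "create_purchase_order": lambda sku, qty: {"sku": sku, "qty": qty},
}

print(guarded_call("inventory", "check_stock", tools, sku="A1"))
try:
    guarded_call("inventory", "create_purchase_order", tools, sku="A1", qty=5)
except ToolScopeError as err:
    print("blocked:", err)
```

Failing loudly at the guard, rather than letting the call through, is what turns a silent routing bug into a visible error.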
Applied the CRISP-DM framework to analyze OPC-UA industrial sensor data across 27 production shifts, engineering time-series features (Total Active Energy, Active Power L2) and implementing DTW-based Time-Series KMeans to identify anomalous operating regimes linked to energy inefficiency and quality defects.
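As background on the distance used for clustering, here is a minimal dynamic time warping (DTW) sketch in pure Python. It is not the project's implementation: the function name `dtw_distance` and the absolute-difference local cost are assumptions, and library implementations often use a squared cost and windowing constraints.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping between two 1-D series.

    Local cost is the absolute difference (an assumption for this sketch).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, or both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Two series with the same shape but different timing align at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because DTW tolerates shifts and stretches in time, shifts with the same load profile but slightly different timing can land in the same cluster, which a pointwise Euclidean distance would separate.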
Built end-to-end ML pipelines on 2,500+ U.S. counties and 300+ features to predict food insecurity, diabetes prevalence (regression), and obesity hotspots (classification) using Gradient Boosting and Logistic Regression. Designed robust preprocessing and validated with nested cross-validation and bootstrap uncertainty analysis.
Developed a robust multivariate statistical process control (MSPC) framework on 552 manufacturing records with 209 variables. Used PCA and robust outlier detection to isolate a stable in-control baseline, then deployed a Phase II Hotelling's T² monitoring scheme for real-time anomaly detection.
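The Phase II statistic itself is compact: for a new observation x, T² = (x - mean)ᵀ S⁻¹ (x - mean), where the mean and covariance S come from the in-control baseline. The sketch below is a simplified two-variable version in pure Python with made-up data; it skips the PCA step, and the helper names and the idea of passing in a control limit chosen elsewhere are assumptions for illustration.

```python
def mean_and_cov_2d(rows):
    """Sample mean and covariance of two-variable baseline data."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def hotelling_t2(x, mean, cov):
    """T^2 = (x - mean)^T S^{-1} (x - mean), using the closed-form 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular covariance matrix")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # S^{-1} = (1 / det) * [[d, -b], [-c, a]]
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def is_out_of_control(x, mean, cov, limit):
    """Flag an observation whose T^2 exceeds a caller-supplied control limit."""
    return hotelling_t2(x, mean, cov) > limit


# Made-up in-control baseline (Phase I) and one new observation (Phase II).
baseline = [(0, 0), (2, 0), (0, 2), (2, 2)]
mean, cov = mean_and_cov_2d(baseline)
print(hotelling_t2((3, 1), mean, cov))
```

In practice the control limit is usually derived from an F distribution for the chosen false-alarm rate; here it is left as an input so the sketch stays dependency-free.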
Open to AI engineering, ML, and data science roles. I'm actively looking and happy to chat about what you're working on.