Hello, I'm

Akshay Patel


Boston, MA

Engineering background. ML focus. Building systems that work in production, not just notebooks.

72ms Hybrid Search Latency
145x Cache Speedup vs Full LLM Call
~370ms Query Embedding Round-Trip

About Me


I'm a recent Industrial Engineering graduate with a background in applied ML, looking for AI, ML, or data science roles where the work is connected to real decisions and real constraints. My path into ML didn't follow a standard CS track. I studied Chemical Engineering at NIT Bhopal, then came to the U.S. on a fully funded scholarship tied to the IE department. That structure shaped how I learned: my coursework was grounded in statistics, optimization, and systems thinking rather than pure computer science.

I built my ML skills through projects connected to real domains — energy, operations, and security — where model outputs directly affect costs and risk. That context changed how I think about the work. I spend as much time thinking about how a model might fail, drift, or be misused as I do about how to train it. I care about data quality, cost-aware evaluation, interpretability, and what happens after deployment, not just benchmark numbers.

Across my projects I've built end-to-end pipelines covering ingestion, feature engineering, training, evaluation, and production deployment. I've worked with both classical and modern ML tooling, and I try to make trade-offs explicit rather than defaulting to complexity. I'm most useful on teams where ML is embedded in a larger operational system and needs to be trusted and maintained over time.

Projects

2025

arXiv RAG Curator

Production hybrid RAG system that ingests daily arXiv CS.AI papers via an automated Airflow pipeline, indexes them with BM25 + 1024-dim HNSW vector search (Reciprocal Rank Fusion), and serves Q&A via a streaming FastAPI backend with multi-turn Redis session memory and full Langfuse observability tracing. The hardest part was tuning RRF weight balance — BM25 dominated on keyword-heavy queries while vector search won on conceptual ones, so neither alone was reliable. Exact-match caching on query embeddings (6h TTL) cut repeat-query latency by 145x without touching the retrieval stack.
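
For illustration, the fusion step can be sketched as a weighted Reciprocal Rank Fusion over two ranked lists. This is a minimal sketch, not the project's code: the function name, the `weights` argument, the document ids, and the smoothing constant `k = 60` (a commonly used default) are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse several best-first ranked lists of document ids into one ranking.

    rankings: list of lists, each ordered from best to worst hit.
    weights:  optional per-ranking weights (hypothetical tuning knob).
    k:        smoothing constant; 60 is assumed here, not taken from the project.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes weight / (k + rank) for every hit it returns.
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical hits from the keyword and vector retrievers.
bm25_hits = ["paper_a", "paper_b", "paper_c"]
vector_hits = ["paper_c", "paper_a", "paper_d"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
vector_heavy = reciprocal_rank_fusion([bm25_hits, vector_hits], weights=[0.2, 1.0])
```

With equal weights, a document ranked well by both retrievers comes out on top; lowering the BM25 weight lets the vector ranking decide, which is the kind of balance the tuning described above has to settle.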

72ms hybrid search · 145x cache speedup · ~370ms query embedding · 600-word chunks, 100-word overlap
RAG · FastAPI · OpenSearch · Airflow · Redis · Langfuse · Docker · Ollama

2025

Supply Chain Intelligence Agents

Multi-agent supply chain decision system with guarded orchestration, domain-specialized agents, and tool-scoped execution built with LangGraph and LangChain. The main challenge was preventing agents from calling tools outside their domain — a routing bug early on let the inventory agent trigger procurement actions, which produced confident but wrong outputs. Fixing that required explicit state guards and a supervisor layer to validate transitions before any tool executed.
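
The idea of a tool-scope guard can be shown without any framework. The sketch below is plain Python, not LangGraph or LangChain API, and not the project's implementation: the agent names, tool names, and return values are hypothetical.

```python
class ToolScopeError(Exception):
    """Raised when an agent requests a tool outside its allowed scope."""


# Hypothetical tools; real ones would call inventory or procurement systems.
def check_stock(sku):
    return {"sku": sku, "on_hand": 42}


def create_purchase_order(sku, quantity):
    return {"sku": sku, "quantity": quantity, "status": "created"}


TOOLS = {
    "check_stock": check_stock,
    "create_purchase_order": create_purchase_order,
}

# Hypothetical allowlist: each agent may only call tools in its own domain.
ALLOWED_TOOLS = {
    "inventory": {"check_stock"},
    "procurement": {"create_purchase_order"},
}


def guarded_call(agent, tool_name, *args):
    """Validate the agent-to-tool transition before anything executes."""
    if tool_name not in ALLOWED_TOOLS.get(agent, set()):
        raise ToolScopeError(f"{agent!r} may not call {tool_name!r}")
    return TOOLS[tool_name](*args)


stock = guarded_call("inventory", "check_stock", "SKU-1")

try:
    # The inventory agent reaching for a procurement tool is rejected up front.
    guarded_call("inventory", "create_purchase_order", "SKU-1", 10)
    blocked = False
except ToolScopeError:
    blocked = True
```

Checking the allowlist before dispatch means a routing mistake fails loudly instead of producing a confident but wrong action.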

LangGraph state machine · Domain-specialized agents · Guarded tool execution
LangChain · LangGraph · Streamlit · Python · Docker · Agents

October 2024

Time Series Clustering for Industrial Energy Optimization

Applied the CRISP-DM framework to analyze OPC-UA industrial sensor data across 27 production shifts, engineering time-series features (Total Active Energy, Active Power L2) and implementing DTW-based Time-Series KMeans to identify anomalous operating regimes linked to energy inefficiency and quality defects.
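
For readers unfamiliar with DTW, the distance that such a clustering relies on can be sketched in a few lines. This is a textbook dynamic-programming version using absolute difference as the local cost, which is an assumption; library implementations may use a different local cost or add windowing constraints. The sample series are made up.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    Uses |x - y| as the local cost (an assumption for this sketch) and
    allows the usual three moves: match, insertion, deletion.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Two made-up load profiles with the same shape but a stretched middle section.
shift_a = [1.0, 2.0, 3.0]
shift_b = [1.0, 2.0, 2.0, 3.0]
same_shape = dtw_distance(shift_a, shift_b)

# A constant offset cannot be warped away.
offset = dtw_distance([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

Because the stretched series aligns perfectly after warping, its distance is zero, while the offset series stays three units apart. That tolerance to timing differences is why DTW suits shift-level energy profiles better than pointwise Euclidean distance.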

Silhouette: 0.65–0.66 · Calinski-Harabasz: up to 82 · 27 production shifts
Time Series · DTW · KMeans · CRISP-DM · Python

November 2025

County-Level Food & Health Outcomes Modeling

Built end-to-end ML pipelines on 2,500+ U.S. counties and 300+ features to predict food insecurity, diabetes prevalence (regression), and obesity hotspots (classification) using Gradient Boosting and Logistic Regression. Designed robust preprocessing and validated with nested cross-validation and bootstrap uncertainty analysis.
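
As an illustration of the bootstrap part, here is a minimal percentile-bootstrap sketch for a regression metric. It is an assumption about how such an analysis could look, not the project's pipeline: the function names, the choice of MAE, the resample count, and the toy data are all hypothetical.

```python
import random


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric on held-out predictions."""
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample rows with replacement, keeping each (truth, prediction) pair intact.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Toy held-out values (hypothetical, e.g. prevalence in percent).
y_true = [10.0, 12.0, 14.0, 16.0, 18.0]
y_pred = [11.0, 12.0, 13.0, 17.0, 18.0]

point_estimate = mean_absolute_error(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred, mean_absolute_error)
```

Reporting the interval alongside the point estimate shows how much of a score is attributable to which counties happened to land in the evaluation split.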

2,500+ counties · 300+ features · Nested CV + Bootstrap
Gradient Boosting · Logistic Regression · Public Health · Python

November 2025

Multivariate Quality Control & Anomaly Detection

Developed a robust multivariate statistical process control (MSPC) framework on 552 manufacturing records with 209 variables. Used PCA and robust outlier detection to isolate a stable in-control baseline, then deployed a Phase II Hotelling's T² monitoring scheme for real-time anomaly detection.
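
When monitoring is done in PCA space, Hotelling's T² for one observation reduces to a sum of squared scores scaled by each component's variance. The sketch below shows only that arithmetic. The scores, the eigenvalues, and the chi-square-style control limit (exact only for two components) are illustrative assumptions; a Phase II scheme would typically derive its limit from an F distribution.

```python
import math


def hotelling_t2(scores, eigenvalues):
    """T^2 of one observation from its PCA scores.

    scores:      projection of the observation onto the retained components.
    eigenvalues: variance of each retained component in the in-control baseline.
    """
    return sum(t * t / lam for t, lam in zip(scores, eigenvalues))


def flag_anomalies(score_rows, eigenvalues, limit):
    """Return the indices of observations whose T^2 exceeds the control limit."""
    return [
        i for i, row in enumerate(score_rows)
        if hotelling_t2(row, eigenvalues) > limit
    ]


# Hypothetical two-component baseline and two new observations.
eigenvalues = [4.0, 1.0]
new_scores = [
    [2.0, 1.0],  # T^2 = 4/4 + 1/1 = 2
    [6.0, 3.0],  # T^2 = 36/4 + 9/1 = 18
]

# Illustrative limit: chi-square upper quantile with 2 degrees of freedom,
# which has the closed form -2 * ln(alpha). Assumed for this sketch only.
alpha = 0.01
limit = -2.0 * math.log(alpha)

flagged = flag_anomalies(new_scores, eigenvalues, limit)
```

Scaling by the eigenvalues is what makes the statistic sensitive to unusual combinations of variables rather than to raw magnitude along a high-variance direction.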

85% dimensionality reduction · 209 → 46 PCs · 68 anomalies (12.3%)
PCA · Hotelling's T² · SPC · Anomaly Detection · Python

Skills

LLM & RAG

RAG · Hybrid Search · BM25 · Vector Search · RRF · Ollama · Jina AI · LangChain · LangGraph · OpenSearch · Prompt Engineering

Backend & APIs

FastAPI · Python · Pydantic · Async Python · Server-Sent Events · SQLAlchemy

Infrastructure & DevOps

Docker · Docker Compose · Apache Airflow · GitHub Actions · uv · Multi-stage Builds

ML & Data Science

PyTorch · Scikit-learn · XGBoost · HuggingFace · Pandas · NumPy · Time Series · Classification · Regression · PCA · Hypothesis Testing

Observability & Quality

Langfuse · MLflow · pytest · Async Testing · Mypy · Ruff · Pre-commit Hooks

Databases & Search

PostgreSQL · OpenSearch · Redis · SQLAlchemy ORM · Alembic · Vector Stores

Experience

Aug 2025 – Dec 2025

ML Engineer — NLP & Applied AI

Data Driven WV, Morgantown, WV

  • Built an NLP-based compliance automation PoC for a Fortune 500 government services org, from requirements gathering to system design validation with the client cybersecurity team.
  • Engineered a hybrid NLP pipeline (MiniLM-L6-v2 sentence transformers + rule-based detection) aligned with NIST 800-53 and STIG standards for automated log classification.
  • Presented results to 20+ executive stakeholders, securing approval to transition the solution to in-house production.
  • Tuned precision-recall decision thresholds (0.15–0.70) to balance false positives against missed detections under regulatory compliance constraints.
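
The threshold tuning in the last bullet can be illustrated with a small sweep. This is a sketch under assumptions, not the client pipeline: the scores and labels are made up, and only the 0.15 and 0.70 endpoints come from the range quoted above.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when every score >= threshold is flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    # Conventions for empty denominators: nothing flagged means no false alarms.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up classifier scores for six log lines; 1 marks a true compliance finding.
scores = [0.10, 0.20, 0.35, 0.50, 0.65, 0.80]
labels = [0, 1, 0, 1, 1, 1]

sweep = {
    threshold: precision_recall_at(scores, labels, threshold)
    for threshold in (0.15, 0.30, 0.50, 0.70)
}
```

On this toy data the lowest threshold catches every finding at the price of a false alarm, while the highest raises nothing spurious but misses most findings. Which end to favor depends on the cost of each error type under the applicable controls.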

Jan 2024 – Dec 2025

Graduate Research Assistant (Data Scientist — Energy Analyst)

WVU IMSE Pollution Prevention Group, Morgantown, WV

  • Led energy assessments across 9 industrial facilities, conducting on-site data collection and stakeholder interviews to support data-driven efficiency recommendations.
  • Built baseline consumption models from facility-level energy data (load profiles, equipment inventories, operating schedules) and evaluated retrofit scenarios per ASHRAE standards, identifying $150K+ in annual cost-reduction opportunities.
  • Performed sensitivity analysis, ROI, and payback calculations, estimating a CO₂ reduction of 120 tons/yr and presenting investment recommendations to executive decision-makers.
  • Developed Power BI/Excel dashboards and technical reports that contributed to $1M+ in successful USDA REAP grant applications.
  • Delivered technical webinars on energy-saving strategies to 50+ non-technical stakeholders.
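
The payback and ROI arithmetic behind recommendations like these is simple enough to sketch. The figures below are invented for illustration and are not taken from any assessed facility; the function names are likewise hypothetical.

```python
def simple_payback_years(upfront_cost, annual_savings):
    """Years needed for cumulative savings to cover the upfront cost."""
    if annual_savings <= 0:
        raise ValueError("annual_savings must be positive")
    return upfront_cost / annual_savings


def simple_roi(upfront_cost, annual_savings, horizon_years):
    """Undiscounted return on investment over a fixed horizon."""
    total_savings = annual_savings * horizon_years
    return (total_savings - upfront_cost) / upfront_cost


# Hypothetical retrofit: 50,000 upfront, 12,500 saved per year.
payback = simple_payback_years(50_000, 12_500)
roi_10yr = simple_roi(50_000, 12_500, horizon_years=10)

# Sensitivity check: how payback moves if savings come in 20% lower.
payback_pessimistic = simple_payback_years(50_000, 12_500 * 0.8)
```

Both measures ignore discounting and energy-price escalation, so they work as a first screen; a fuller analysis would add net present value.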

Education

Master of Science in Industrial Engineering

West Virginia University, Morgantown, WV

January 2024 – December 2025
Relevant Coursework: Machine Learning, Design of Experiments

Bachelor of Technology in Chemical Engineering

Maulana Azad National Institute of Technology (MANIT), Bhopal, India

August 2019 – May 2023

Get in Touch

Open to AI engineering, ML, and data science roles. I'm actively looking and happy to chat about what you're working on.