Available for Senior AI/ML Roles — Open to Relocation
Focus: Multi-Agent LLM Orchestration & Production RAG

AI/ML Engineer
Shipping Production Systems

4+ years shipping production LLM systems — multi-agent orchestration, enterprise RAG, and LLM fine-tuning with responsible-AI governance on AWS. Patent-pending multimodal AI. 2 peer-reviewed publications.

Sub-8s Latency · Multi-Agent Reasoning
60% Time Reduction · LLM-Driven Automation
2,500+ Docs in Production RAG · pgvector · Claude on Bedrock
2 Publications · IEEE + Elsevier

Who I Am

Harsh Shroff — AI/ML Engineer

I'm an AI/ML Engineer who specializes in designing and deploying production-grade AI systems for regulated industries. My work spans architecting multi-agent LLM pipelines on AWS Bedrock, building neuro-symbolic reasoning systems, and shipping computer vision to edge hardware.

I hold an MS in Data Science from UMBC (3.8 GPA) and conduct active research under US Army Research Lab funding in distributed sensing and autonomous systems. I've co-authored two peer-reviewed papers (IEEE + Elsevier), hold an AWS Machine Learning Associate certification, and care deeply about AI safety, auditability, and production observability.

I believe the gap between "demo" and "deployed" is where real engineering happens. That's where I work.

MS Data Science
UMBC — GPA 3.8/4.0
AWS ML Certified
Active through Apr 2028
2 Peer-Reviewed Papers
IEEE ITU + Elsevier JAFR
Army Research Lab
Funded Research · UMBC CARDS Lab

Technical Stack

LLM Stack

Multi-Agent Systems · RAG Architectures · Neuro-Symbolic AI · LLM Fine-Tuning (LoRA/QLoRA) · LangGraph · PydanticAI · LangChain · LlamaIndex · vLLM · Prompt Engineering · LLM-as-Judge · RLHF · Claude 3.5 · Gemini 2.0 · Llama 3.1 · OpenAI API · Ollama

Core AI / ML

PyTorch · TensorFlow · Vision Transformers (ViT) · DINOv2 · YOLOv8 · Computer Vision · Transformer Architectures · OpenCV · Semantic Search · Embedding Pipelines · A/B Testing · Anomaly Detection · Time Series

Cloud & MLOps

AWS Bedrock · SageMaker · Lambda · Model Governance · EC2 · S3 · Textract · TensorRT · CUDA · MLflow · Weights & Biases · Docker · GitHub Actions · Hallucination Mitigation · Bias Detection · Production Observability

Data Engineering

pgvector · Semantic Search · Pandas · NumPy · Apache Spark · Kafka · PostgreSQL · Redis · Pinecone · ETL/ELT Pipelines · Feature Engineering · Embedding Pipelines · Data Modeling

Software & Deploy

Python · FastAPI · Flask · REST APIs · SQL · Git · Linux/Bash · Supabase · APScheduler · Streamlit · NVIDIA Jetson · Edge Computing · Raspberry Pi

More Work

AI OpenAI

Multi-Modal Recommendation Engine

Hybrid recommendation system combining structured data, computer vision, and LLM personalization. A/B tested for continuous improvement.

−20% bounce  •  +40% engagement
Python · OpenAI API · PostgreSQL · A/B Testing
Data GeoSpatial

Infrastructure Risk Assessment

Real-time geospatial risk analytics for transportation infrastructure. Multi-source data fusion, interactive Leaflet maps, PostgreSQL/PostGIS backend, color-coded risk overlays for decision support.

Flask · PostgreSQL · PostGIS · Leaflet
AI Gemini

SeatSniper

Real-time study-spot finder for UMBC commuters. Crowdsourced availability syncs live across users via Firebase, with a Gemini "Gap Optimizer" that turns your class schedule into a full study-day plan.

React · Vite · Firebase RTDB · Gemini 2.0 Flash
AI Voice

PelloHorter

AI phone agent making autonomous outbound screening calls — real-time voice synthesis and LLM dialogue management for human-like conversational flow at scale.

Voice AI · LLM · Speech Processing · Real-time
CV Edge

Gesture-Based Control System

Real-time gesture recognition for robotic system control. Deployed on NVIDIA Jetson with CUDA-optimized inference for low-latency embedded operation in field environments.

Python · OpenCV · NVIDIA Jetson · CUDA

Where I've Worked

Mar 2023 – Present (Concurrent)

Applied AI Research Engineer

UMBC Center for Real-time Distributed Sensing and Autonomy
  • Architected a capability-based multi-agent AI system that deploys quantized Llama 3.1 via Ollama, achieving sub-8s end-to-end latency for autonomous reasoning and voice-controlled execution
  • Fine-tuned open-weight LLMs (LoRA/QLoRA) for domain-specific intent classification, with LLM-as-Judge evaluation and automated quality gating to cut hallucinations before deployment
  • Invented a patent-pending multimodal AI assistant that fuses RGB, thermal, and audio sensor data under US Army Research Lab funding, improving real-time optimization by 20%
  • Engineered TensorRT-accelerated inference reaching 50+ FPS object detection with low-latency scene understanding on NVIDIA Jetson
Llama 3.1 · Ollama · TensorRT · NVIDIA Jetson · PyTorch
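
The LLM-as-Judge quality gating mentioned in this role can be illustrated with a minimal Python sketch. This is not the lab's actual pipeline: `quality_gate`, `GateResult`, `toy_judge`, and both thresholds are hypothetical names and values chosen for illustration, and the judge is a stub standing in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class GateResult:
    mean_score: float
    passed: bool
    failures: List[str] = field(default_factory=list)


def quality_gate(
    outputs: Sequence[str],
    judge: Callable[[str], float],
    min_mean: float = 0.8,   # assumed threshold on the average judge score
    min_each: float = 0.5,   # assumed floor for any single output
) -> GateResult:
    """Score each output with `judge` (0.0 to 1.0) and decide whether to ship."""
    if not outputs:
        raise ValueError("no outputs to evaluate")
    scores = [judge(text) for text in outputs]
    # Any output below the per-item floor blocks the release on its own.
    failures = [text for text, score in zip(outputs, scores) if score < min_each]
    mean_score = sum(scores) / len(scores)
    passed = mean_score >= min_mean and not failures
    return GateResult(mean_score, passed, failures)


def toy_judge(text: str) -> float:
    # Stand-in for a real LLM judge: penalise answers that cite no source.
    return 1.0 if "[source]" in text else 0.25
```

In practice the `judge` callable would wrap a prompt to a grading model, and the gate would run in CI before a fine-tuned checkpoint is promoted.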
Jun 2024 — May 2025

AI/ML Engineer — Production Systems

VITG Corp., Halethorpe MD
  • Architected and shipped a public-facing semantic RAG platform processing 2,500+ regulatory PDFs with pgvector indexing and Claude 3.5 (AWS Bedrock) for high-accuracy compliance retrieval
  • Engineered a serverless LLM automation system (Lambda + Bedrock) for internal HR workflows, reducing manual screening from 25 hrs/week → 10 hrs/week (60% reduction) while maintaining quality
  • Established enterprise responsible AI governance: SOC2-compliant audit logging, automated hallucination mitigation, bias detection pipelines, and production monitoring
  • Designed and executed offline A/B evaluation pipelines (MLflow) measuring retrieval quality and P50/P99 latency for data-driven RAG iteration
AWS Bedrock · pgvector · Lambda · EC2 · MLflow
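
For readers unfamiliar with the P50/P99 latency figures mentioned in this role, here is a small Python sketch of how such percentiles can be computed from recorded request timings. It uses the nearest-rank method, which is an assumption: the original evaluation pipeline may have used a different percentile definition. The function names `percentile` and `latency_summary` are hypothetical.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Median (P50) and tail (P99) latency for one evaluation run."""
    return {
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
    }
```

P50 describes the typical request, while P99 exposes the slow tail that averages hide, which is why the two are usually tracked together when iterating on a retrieval system.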
Aug 2023 — Dec 2023

Data Scientist Intern — Computer Vision

The Conservation Fund, Shepherdstown WV
  • Built a YOLOv8 + OpenCV pipeline achieving 92% accuracy in fish fillet quality grading, deployed on Raspberry Pi for real-time edge assessment
  • Co-authored peer-reviewed research; published in Elsevier Journal of Agriculture and Food Research (2024)
YOLOv8 · OpenCV · Raspberry Pi · Edge ML
May 2021 — Jun 2022

ML Engineer — Applied AI & Audio Analytics

WeHear Innovations Pvt. Ltd., Ahmedabad India
  • Developed Personal Hearing Intelligence (PHI) ML models — longitudinal audio analysis for personalized hearing-risk metrics on bone-conduction hardware
  • Built time-series feature engineering pipelines for high-frequency sensor data; early-risk detection through statistical monitoring
  • Optimized audio signal processing workflows for bone-conduction hardware, implementing low-latency ML inference to support real-time voice amplification features
Python · Time Series · Audio Processing · Mobile Integration

Education

Master of Science, Data Science
University of Maryland Baltimore County (UMBC)
Aug 2022 – May 2024  •  GPA: 3.8 / 4.0
ML · Deep Learning · NLP · Computer Vision · Big Data Systems · Statistical Analysis

Publications & Credentials

Peer-Reviewed Publications

Elsevier · Journal of Agriculture and Food Research · 2024

FilletCam AI: Precision color profiling of fish fillets using deep learning

Ranjan, R., Shroff, H., et al.

YOLOv8 + OpenCV pipeline for automated fish fillet quality grading deployed on edge devices, achieving 92% accuracy. Enabled commercial AI quality control systems.

DOI: 10.1016/j.jafr.2024.101461
IEEE · ITU Kaleidoscope Conference · 2021

Mosquito identification using machine learning on embedded systems

Trivedi, K., Shroff, H.

TinyML pipeline for real-time mosquito wing beat classification on ARM Cortex-M microcontrollers with sub-100ms inference latency for field deployment.

DOI: 10.23919/ITUK53220.2021.9662116

Certifications

Machine Learning Associate
Amazon Web Services · Active through Apr 2028
Transformer-Based NLP Applications
NVIDIA Deep Learning Institute
GPU-Accelerated Computing
NVIDIA Deep Learning Institute

// contact

Let's Build Something

Open to Senior AI/ML Engineer roles.
Available immediately — open to relocation.