Available for Senior AI/ML Roles — Open to Relocation

Focus: Multi-Agent LLM Orchestration & Production RAG

AI/ML Engineer
Shipping Production Systems

4+ years shipping production LLM systems — multi-agent orchestration, enterprise RAG, and LLM fine-tuning with responsible-AI governance on AWS. Patent-pending multimodal AI. 2 peer-reviewed publications.

harshrofff@gmail.com

Sub-8s Latency
Multi-Agent Reasoning

60% Time Reduction
LLM-Driven Automation

2,500+ Docs in Production RAG
pgvector · Claude on Bedrock

2 Publications
IEEE + Elsevier

// about

Who I Am

I'm an AI/ML Engineer who specializes in designing and deploying production-grade AI systems for regulated industries. My work spans architecting multi-agent LLM pipelines on AWS Bedrock, building neuro-symbolic reasoning systems, and shipping computer vision to edge hardware.

I hold an MS in Data Science from UMBC (3.8 GPA) and conduct active research under US Army Research Lab funding in distributed sensing and autonomous systems. I've co-authored two peer-reviewed papers (IEEE + Elsevier), hold an AWS Machine Learning Associate certification, and care deeply about AI safety, auditability, and production observability.

I believe the gap between "demo" and "deployed" is where real engineering happens. That's where I work.

MS Data Science

UMBC — GPA 3.8/4.0

AWS ML Certified

Active through Apr 2028

2 Peer-Reviewed Papers

IEEE ITU + Elsevier JAFR

Army Research Lab

Funded Research · UMBC CARDS Lab

// skills

Technical Stack

LLM Stack

Multi-Agent Systems · RAG Architectures · Neuro-Symbolic AI · LLM Fine-Tuning (LoRA/QLoRA) · LangGraph · PydanticAI · LangChain · LlamaIndex · vLLM · Prompt Engineering · LLM-as-Judge · RLHF · Claude 3.5 · Gemini 2.0 · Llama 3.1 · OpenAI API · Ollama

Core AI / ML

PyTorch · TensorFlow · Vision Transformers (ViT) · DINOv2 · YOLOv8 · Computer Vision · Transformer Architectures · OpenCV · Semantic Search · Embedding Pipelines · A/B Testing · Anomaly Detection · Time Series

Cloud & MLOps

AWS Bedrock · SageMaker · Lambda · Model Governance · EC2 · S3 · Textract · TensorRT · CUDA · MLflow · Weights & Biases · Docker · GitHub Actions · Hallucination Mitigation · Bias Detection · Production Observability

Data Engineering

pgvector · Semantic Search · Pandas · NumPy · Apache Spark · Kafka · PostgreSQL · Redis · Pinecone · ETL/ELT Pipelines · Feature Engineering · Embedding Pipelines · Data Modeling

Software & Deploy

Python · FastAPI · Flask · REST APIs · SQL · Git · Linux/Bash · Supabase · APScheduler · Streamlit · NVIDIA Jetson · Edge Computing · Raspberry Pi

// featured

Production AI Systems

End-to-end engineering — not tutorials.

MULTI-AGENT LLM · OPEN SOURCE

MARS — Multi-Agent Research System

Orchestrates 11 specialized LLM agents through a LangGraph state machine to produce citation-grounded research reports — with a self-correcting QC loop, parallel critic agents, and MCP-based retrieval across ArXiv, the web, and Wikipedia.

Problem: Research agents hallucinate sources and stop at the first draft, with no grounding and no self-correction.

Approach: ReAct researcher (MCP tools) → structured analyst → 3 parallel critics → synthesizer → QC gate that retries with a new search strategy when the score is too low.

Outcome: Every claim is tagged to a retrieved chunk; a Citation Verifier blocks broken references before the report ships. Measured across 15 eval runs.

Researcher (MCP) → Analyst → Parallel Critics ×3 → Synthesizer → QC Gate (retry)

87% QC first-pass · 0% citation-tag errors · ~$0.0028 / report · 11 agents

LangGraph · Gemini 2.5 Flash · PydanticAI · MCP · Streamlit
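The retry behaviour described in the Approach can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the MARS code: `research_and_synthesize`, `critic`, `run_with_qc`, the strategy names, and the 0.8 threshold are all hypothetical, and the LLM-backed agents are replaced by plain functions.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Draft:
    text: str
    strategy: str


def research_and_synthesize(question: str, strategy: str) -> Draft:
    # Hypothetical stand-in for the researcher -> analyst -> synthesizer chain.
    return Draft(text=f"[{strategy}] answer to: {question}", strategy=strategy)


def critic(draft: Draft) -> float:
    # Toy scoring rule: the "broad" strategy scores low, anything else scores high.
    return 0.4 if draft.strategy == "broad" else 0.9


def run_with_qc(
    question: str,
    strategies: List[str],
    critics: List[Callable[[Draft], float]],
    threshold: float = 0.8,
) -> Tuple[Optional[Draft], int, bool]:
    """Try each search strategy in order until the mean critic score passes."""
    draft = None
    attempts = 0
    for strategy in strategies:
        attempts += 1
        draft = research_and_synthesize(question, strategy)
        # The critics are independent, so they can score the draft in parallel.
        with ThreadPoolExecutor(max_workers=len(critics)) as pool:
            scores = list(pool.map(lambda c: c(draft), critics))
        if sum(scores) / len(scores) >= threshold:
            return draft, attempts, True
    # Every strategy was tried and none passed the gate.
    return draft, attempts, False


draft, attempts, passed = run_with_qc(
    "What is retrieval-augmented generation?",
    ["broad", "targeted"],
    [critic, critic, critic],
)
```

In this toy run the first strategy fails the gate and the second passes, so the loop stops after two attempts. In a graph framework the same decision would typically live on a conditional edge out of the QC node.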

NEURO-SYMBOLIC AI

Bio-Oracle

Neuro-symbolic agent for high-throughput drug-discovery screening. It pairs Cellpose cell segmentation with PydanticAI structured reasoning (Gemini) to produce verifiable, auditable phenotypic analysis in settings where end-to-end deep learning is hard to interpret.

Approach: Cellpose segmentation → 30+ morphological features → robust Z-score normalization → PydanticAI + Gemini reasoning agent.

Outcome: Verifiable reasoning over the BBBC021 benchmark; ~90 cells/sec on Apple Silicon (MPS), reproducible in Docker.

PydanticAI · Gemini · Cellpose · Docker · Apple MPS
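The robust Z-score step in the pipeline can be illustrated as follows. This sketch assumes the common median/MAD formulation (the page does not say which variant Bio-Oracle uses); `robust_zscores` and the sample values are hypothetical.

```python
import statistics
from typing import List


def robust_zscores(values: List[float]) -> List[float]:
    """Scale values by median and MAD instead of mean and standard deviation."""
    med = statistics.median(values)
    # Median absolute deviation: the median distance from the median.
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # No spread to normalize against; treat every value as typical.
        return [0.0] * len(values)
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]


# Hypothetical feature column (e.g. cell area) with one extreme value.
scores = robust_zscores([1.0, 2.0, 3.0, 4.0, 100.0])
```

Because the median and MAD barely move when one value is extreme, the outlier gets a very large score while the other values stay close to zero. That is the property that makes this normalization useful for per-feature screening.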

RAG · SEMANTIC SEARCH · PROD

Enterprise RAG Platform

Production-grade semantic search system processing 2,500+ regulatory documents with pgvector + Claude for compliance-critical retrieval.

Problem: Manual compliance review of 2,500+ unstructured municipal/regulatory PDFs blocked workflows.

Approach: AWS Textract → pgvector semantic indexing → Claude 3.5 (Bedrock) inference → SOC2 audit trails.

Outcome: 60% time reduction (25→10 hrs/wk), 100% audit coverage, zero hallucinations on compliance queries.

AWS Bedrock · pgvector · Semantic Search · EC2 · Streamlit
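The retrieval step can be pictured as a nearest-neighbour search over chunk embeddings. The sketch below is an in-memory stand-in, not the production system: the chunk ids, the two-dimensional embeddings, and the helper names are hypothetical, and a real deployment would run the ranking inside PostgreSQL, where pgvector exposes cosine distance as the `<=>` operator.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: Sequence[float], chunks: List[Dict], k: int = 2) -> List[Dict]:
    """Return the k chunks whose embeddings are closest to the query."""
    ranked = sorted(chunks, key=lambda c: cosine_distance(query, c["embedding"]))
    return ranked[:k]


# Hypothetical document chunks with toy 2-D embeddings.
chunks = [
    {"id": "zoning-12", "embedding": [1.0, 0.0]},
    {"id": "permits-03", "embedding": [0.0, 1.0]},
    {"id": "zoning-47", "embedding": [0.9, 0.1]},
]
hits = top_k([1.0, 0.0], chunks, k=2)
hit_ids = [c["id"] for c in hits]
```

Returning chunk ids alongside the text is what lets a downstream answer cite its sources, which matters for audit trails.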

Proprietary (VITG Corp)

CV · PUBLISHED

FilletCam AI

Problem: Manual fish fillet grading is slow and inconsistent on production lines.

Approach: YOLOv8 + OpenCV deployed on Raspberry Pi for real-time edge inference.

Outcome: 92% accuracy. Published in Elsevier JAFR 2024.

YOLOv8 · OpenCV · Raspberry Pi

Elsevier JAFR

EDGE AI · PUBLISHED

Mosquito Wing Beat Classification

Problem: Species-level mosquito ID needed in resource-constrained field environments.

Approach: TinyML audio classifier on ARM Cortex-M, running with no network and no GPU.

Outcome: <100ms inference. Published at IEEE ITU Kaleidoscope 2021.

ARM Cortex-M · TinyML · Python

IEEE ITU

AI PLATFORM · LIVE DEMO

Silicon Oracle

Full-stack AI stock analysis and paper trading platform. BYOK (bring-your-own-key) architecture: each user supplies their own API keys, so one user's usage never exhausts a rate limit shared by everyone. Real-time market data, Oracle Score™ (15-factor analysis), Gemini-powered email intelligence, and Alpaca paper trading.

Problem: Retail investors lack professional-grade AI analysis without expensive subscriptions or shared rate-limit bottlenecks.

Approach: BYOK Flask platform. Finnhub market data → Oracle Score™ (15-factor system) → Gemini 2.0 Flash AI analysis → Alpaca paper trading → automated hourly email intelligence alerts.

Outcome: Live multi-user platform with real-time analysis, hourly AI market intelligence, paper trading, and portfolio tracking, deployed free on Render.

Flask · Gemini 2.0 Flash · PostgreSQL · Alpaca API · Supabase · Tailwind CSS · APScheduler
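One way to picture the BYOK idea is a registry that tracks request budgets per user key rather than per application. The sketch below is purely illustrative and assumes a simple sliding-window budget; `ByokRegistry`, `RateLimited`, and the limits are hypothetical names and values, not taken from Silicon Oracle.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class RateLimited(Exception):
    """Raised when a user has used up their own request budget."""


class ByokRegistry:
    def __init__(self, max_calls: int, window_s: float) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self._keys: Dict[str, str] = {}
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def register(self, user_id: str, api_key: str) -> None:
        self._keys[user_id] = api_key

    def acquire(self, user_id: str, now: Optional[float] = None) -> str:
        """Return the user's own key if their personal budget allows a call."""
        if user_id not in self._keys:
            raise KeyError(f"no API key registered for {user_id}")
        now = time.monotonic() if now is None else now
        calls = self._calls[user_id]
        # Forget calls that have aged out of the sliding window.
        while calls and now - calls[0] >= self.window_s:
            calls.popleft()
        if len(calls) >= self.max_calls:
            raise RateLimited(user_id)
        calls.append(now)
        return self._keys[user_id]


registry = ByokRegistry(max_calls=2, window_s=60.0)
registry.register("alice", "key-a")
registry.register("bob", "key-b")
```

Because each budget is keyed by user, one user hitting their limit raises `RateLimited` for that user only; other users keep working.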

// projects

More Work

AI · OpenAI

Multi-Modal Recommendation Engine

Hybrid recommendation system combining structured data, computer vision, and LLM personalization. A/B tested for continuous improvement.

−20% bounce • +40% engagement

Python · OpenAI API · PostgreSQL · A/B Testing

Data · GeoSpatial

Infrastructure Risk Assessment

Real-time geospatial risk analytics for transportation infrastructure. Multi-source data fusion, interactive Leaflet maps, PostgreSQL/PostGIS backend, color-coded risk overlays for decision support.

Flask · PostgreSQL · PostGIS · Leaflet

AI · Gemini

SeatSniper

Real-time study-spot finder for UMBC commuters. Crowdsourced availability syncs live across users via Firebase, with a Gemini "Gap Optimizer" that turns your class schedule into a full study-day plan.

React · Vite · Firebase RTDB · Gemini 2.0 Flash

AI · Voice

PelloHorter

AI phone agent making autonomous outbound screening calls — real-time voice synthesis and LLM dialogue management for human-like conversational flow at scale.

Voice AI · LLM · Speech Processing · Real-time

CV · Edge

Gesture-Based Control System

Real-time gesture recognition for robotic system control. Deployed on NVIDIA Jetson with CUDA-optimized inference for low-latency embedded operation in field environments.

Python · OpenCV · NVIDIA Jetson · CUDA

// experience

Where I've Worked

Mar 2023 — Present (Concurrent) Active

Applied AI Research Engineer

UMBC Center for Real-time Distributed Sensing and Autonomy

Architected a capability-based multi-agent AI system that deploys quantized Llama 3.1 via Ollama, achieving sub-8s end-to-end latency for autonomous reasoning and voice-controlled execution
Fine-tuned open-weight LLMs (LoRA/QLoRA) for domain-specific intent classification, with LLM-as-Judge evaluation and automated quality gating to cut hallucinations before deployment
Invented a patent-pending multimodal AI assistant that fuses RGB, thermal, and audio sensor data, developed under US Army Research Lab funding, improving real-time optimization by 20%
Engineered TensorRT-accelerated inference reaching 50+ FPS object detection with low-latency scene understanding on NVIDIA Jetson

Llama 3.1 · Ollama · TensorRT · NVIDIA Jetson · PyTorch
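The LLM-as-Judge quality gate mentioned above can be sketched as a function that scores a batch of predictions and blocks deployment when the scores fall short. Everything here is a hypothetical illustration: `stub_judge` replaces a real judge model with a trivial rule, and the intent labels and thresholds are invented for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical intent labels for a voice-controlled system.
ALLOWED_INTENTS = {"move", "stop", "report"}


def stub_judge(utterance: str, predicted_intent: str) -> float:
    """Stand-in for an LLM judge: 1.0 for a known intent label, else 0.0."""
    return 1.0 if predicted_intent in ALLOWED_INTENTS else 0.0


def quality_gate(
    predictions: List[Tuple[str, str]],
    judge: Callable[[str, str], float],
    min_mean: float = 0.9,
    min_item: float = 0.5,
) -> Dict[str, object]:
    """Pass only if the average score and the worst single score both clear their bars."""
    if not predictions:
        raise ValueError("need at least one prediction to evaluate")
    scores = [judge(utterance, intent) for utterance, intent in predictions]
    mean = sum(scores) / len(scores)
    worst = min(scores)
    return {
        "mean": mean,
        "worst": worst,
        "passed": mean >= min_mean and worst >= min_item,
    }


good_batch = [("go forward", "move"), ("halt now", "stop")]
bad_batch = good_batch + [("what is that", "fly_to_moon")]
good_result = quality_gate(good_batch, stub_judge)
bad_result = quality_gate(bad_batch, stub_judge)
```

Checking the worst score as well as the mean keeps a single invented label from hiding behind an otherwise good batch.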

Jun 2024 — May 2025

AI/ML Engineer — Production Systems

VITG Corp., Halethorpe MD

Architected and shipped a public-facing semantic RAG platform processing 2,500+ regulatory PDFs with pgvector indexing and Claude 3.5 (AWS Bedrock) for high-accuracy compliance retrieval
Engineered a serverless LLM automation system (Lambda + Bedrock) for internal HR workflows, reducing manual screening from 25 hrs/week → 10 hrs/week (60% reduction) while maintaining quality
Established enterprise responsible AI governance: SOC2-compliant audit logging, automated hallucination mitigation, bias detection pipelines, and production monitoring
Designed and executed offline A/B evaluation pipelines (MLflow) measuring retrieval quality and P50/P99 latency for data-driven RAG iteration

AWS Bedrock · pgvector · Lambda · EC2 · MLflow
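For readers unfamiliar with P50/P99 latency, the sketch below computes both from a list of request timings using the nearest-rank method. It is an assumption-laden illustration: the sample latencies are made up, and the source does not state which percentile definition the evaluation pipeline used.

```python
import math
from typing import List


def percentile(samples: List[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of data at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical end-to-end latencies in milliseconds.
latencies_ms = [120.0, 80.0, 100.0, 400.0, 90.0]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```

P50 describes the typical request, while P99 exposes the slow tail that averages hide. Tracking both shows whether a retrieval change helped most users or only moved the median.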

Aug 2023 — Dec 2023

Data Scientist Intern — Computer Vision

The Conservation Fund, Shepherdstown WV

Built YOLOv8 + OpenCV pipeline achieving 92% accuracy — deployed on Raspberry Pi for real-time edge quality assessment
Co-authored peer-reviewed research; published in Elsevier Journal of Agriculture and Food Research (2024)

YOLOv8 · OpenCV · Raspberry Pi · Edge ML

May 2021 — Jun 2022

ML Engineer — Applied AI & Audio Analytics

WeHear Innovations Pvt. Ltd., Ahmedabad India

Developed Personal Hearing Intelligence (PHI) ML models — longitudinal audio analysis for personalized hearing-risk metrics on bone-conduction hardware
Built time-series feature engineering pipelines for high-frequency sensor data; early-risk detection through statistical monitoring
Optimized audio signal processing workflows for bone-conduction hardware, implementing low-latency ML inference to support real-time voice amplification features

Python · Time Series · Audio Processing · Mobile Integration
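Statistical monitoring of a sensor stream can take many forms; one simple form is a rolling Z-score that flags readings far from the recent average. The sketch below assumes that approach for illustration only (the source does not name the method), and `rolling_zscore_alerts`, the window size, and the readings are hypothetical.

```python
import statistics
from collections import deque
from typing import Deque, List


def rolling_zscore_alerts(
    readings: List[float], window: int = 5, threshold: float = 3.0
) -> List[int]:
    """Return indices of readings that deviate strongly from the preceding window."""
    history: Deque[float] = deque(maxlen=window)
    alerts: List[int] = []
    for i, value in enumerate(readings):
        # Only score a reading once a full window of history is available.
        if len(history) == window:
            mean = statistics.fmean(history)
            std = statistics.pstdev(history)
            if std > 0 and abs(value - mean) / std > threshold:
                alerts.append(i)
        history.append(value)
    return alerts


# Hypothetical sensor readings with one spike at index 6.
readings = [10.0, 11.0, 10.0, 9.0, 10.0, 10.0, 30.0, 10.0]
alerts = rolling_zscore_alerts(readings)
```

Each reading is compared against the window that precedes it, so a spike cannot mask itself. The spike does inflate the window for the next few readings, which is a known limitation of this simple scheme.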

Master of Science, Data Science

University of Maryland Baltimore County (UMBC)

Aug 2022 – May 2024 • GPA: 3.8 / 4.0

ML · Deep Learning · NLP · Computer Vision · Big Data Systems · Statistical Analysis

// research

Publications & Credentials

Peer-Reviewed Publications

Elsevier · Journal of Agriculture and Food Research · 2024

FilletCam AI: Precision color profiling of fish fillets using deep learning

Ranjan, R., Shroff, H., et al.

YOLOv8 + OpenCV pipeline for automated fish fillet quality grading deployed on edge devices, achieving 92% accuracy. Enabled commercial AI quality control systems.

DOI: 10.1016/j.jafr.2024.101461

IEEE · ITU Kaleidoscope Conference · 2021

Mosquito identification using machine learning on embedded systems

Trivedi, K., Shroff, H.

TinyML pipeline for real-time mosquito wing beat classification on ARM Cortex-M microcontrollers with sub-100ms inference latency for field deployment.

DOI: 10.23919/ITUK53220.2021.9662116

Certifications

Machine Learning Associate

Amazon Web Services · Active through Apr 2028

Transformer-Based NLP Applications

NVIDIA Deep Learning Institute

GPU-Accelerated Computing