Active across 50+ public developer, AI, open source, and research surfaces
Code, datasets, package registries, preprints, reproducible demos, agent workflows, and OSS contribution traces.
Open Source · Publications · Recently Shipped · Packages · Projects · Impact · Experience · Stats
8+ years building production systems at Fortune 100 scale
Former SDE at Amazon Web Services • Currently at Southwest Airlines
Deep expertise in ML systems, distributed architectures, and full-stack engineering
Now: shipped ragdrift (five-dimensional RAG drift detection on crates.io + PyPI), the rust-llm-stack of 5 small Rust crates, and the mcp-stack of 14 MCP servers in the official MCP Registry — 4 RAG/agent helpers + 10 reliable transforms LLMs reach for tools instead of imagining (CSV, regex, JMESPath, diff, SQL formatting, shell escaping, JSON5, TOML/YAML/JSON, IANA timezones, HTML→Markdown). Plus the @mukundakatta/agent* reliability stack (fit → guard → snap → vet → cast), 6 earlier MCP servers (also in the Registry), GitHub Actions on the Marketplace, 53 PyPI packages, and 320+ open PRs across MCP SDKs, FastMCP, claude-code-action, and Anthropic's agent SDK (140 already merged upstream).
|
PUBLIC REPOS 797 |
ORIGINALS 322 |
ACTIVE PROJECTS 284 |
FORKS 475 |
ARCHIVED 308 |
Every repo is indexed in claude-workspace - wired for Multica, Claude Code, Codex, OpenClaw, and Cursor to reason across the portfolio.
The profile README is partly managed by scheduled automation. Pull requests run profile checks that compile the Python refresh scripts and verify the managed README markers plus the cached stats files before changes merge.
🌐 Live at mukundakatta.github.io/agent-stack - single landing page for the whole 119-package ecosystem (npm + PyPI + MCP Registry + GitHub Marketplace).
🤗 Try it live on the HuggingFace Space · jailbreak fixtures on the HF Dataset.
Five small, focused npm packages that fix the boring problems every long-running agent eventually hits. Pure ESM JavaScript, zero runtime deps, TypeScript types in the box. Designed to compose into a pipeline:
fit → guard → snap → vet → cast.
| Surface | Latest proof |
|---|---|
| OpenAI GPT Store | Agent Eval Lab - public GPT for lightweight agent evaluation and scenario walkthroughs |
| Poe | AgentEvalLab - public bot for agent-eval prompts and scoring flows |
| Poe | OpsScorecardLab - public bot for turning eval scenarios into operations scorecards |
| Poe | RepoLandscapeLab - public bot for mapping premium agent repo surfaces |
| Replicate | agent-eval-lab - public model/app page for eval-oriented agent interactions |
| Replicate | ops-scorecard-lab - public app page for ops scorecard generation |
| Replicate | repo-landscape-lab - public app page for repo-surface mapping |
| Hugging Face | Agent Labs Portfolio - curated collection tying together the live Spaces and datasets |
| Hugging Face | Ops Scorecard Lab - public Space for turning rough workflows into operator-facing scorecards |
| Modal | agent-eval-lab endpoint - live API endpoint returning structured eval JSON |
| Modal | ops-scorecard-lab endpoint - live API endpoint for scorecard generation |
| Modal | repo-landscape-lab endpoint - live API endpoint for repo-landscape mapping |
| Modal | Agent Labs Portal - public two-panel demo surface for evaluation plans and ops scorecards |
| OpenRouter | Agent Eval Lab - public OpenRouter app analytics page seeded from the live Hugging Face Space |
| OpenRouter | Ops Scorecard Lab - public OpenRouter app analytics page for the scorecard Space |
| Netlify | Agent Eval Lab Static Portal - verified public portal for the agent-eval research and demo surface |
| Observable | Agent Eval Notebook - public notebook surface for lightweight scorecard exploration |
| Streamlit | Agent Eval Scorecard - live app for scoring agent behavior with compact operational criteria |
| Replit | agent-eval-replit-demo - public Replit project with a hosted demo surface |
| Cloudflare Pages | Agent Evaluation Field Notes - static field-notes surface for scorecards, replay, and RAG guardrails |
| Firebase Hosting | Agent Evaluation Field Notes - Firebase-hosted mirror of the field-notes surface |
| Codeberg Pages | MukundaKatta.codeberg.page - public portfolio page routing across the non-GitHub footprint |
| GitHub | agent-eval-public-notes - public notes for scorecards, replay debugging, and RAG guardrail checks |
| GitLab | agent-eval-public-notes - GitLab mirror of the reusable agent-eval notes |
| GitHub | agent-eval-platform-starters - starter artifacts for publishing field notes across cloud, notebook, data, and docs platforms |
| GitLab | agent-eval-platform-starters - GitLab mirror of the platform-starter artifacts |
| Gitea | agent-eval-platform-starters - Gitea mirror of the platform-starter artifacts |
| StackBlitz | agent-eval-platform-starters workspace - browser-editable workspace URL for the starter artifacts |
| Gitpod | agent-eval-platform-starters workspace - cloud workspace launch URL for the starter artifacts |
| Read the Docs | Agent Evaluation Field Notes - hosted documentation for the field-notes project |
| Val Town | agent-scorecard-val - public TypeScript scorecard function for agent evaluation notes |
| Google Colab | Agent Evaluation Field Notes Scorecard - public notebook for scenario scoring and scorecard walkthroughs |
| CodeSandbox | Agent Evaluation Field Notes - public sandbox preview for the scorecard app |
| GitHub Gist | Operational scorecard template - standalone template for tool-using agent reviews |
| GitHub Gist | Trajectory replay debugging checklist - replay checklist for agent workflow regressions |
| GitHub Gist | RAG guardrail smoke tests - prompt-injection and vector-poisoning smoke tests |
| GitLab Snippet | Operational scorecard template - public snippet mirror for scorecard evaluation |
| GitLab Snippet | Trajectory replay debugging checklist - public snippet mirror for replay debugging |
| GitLab Snippet | RAG guardrail smoke tests - public snippet mirror for RAG guardrails |
| Kaggle | Premium Agent Repo Landscape - public dataset mapping premium agent repos by surface, stack, and focus |
| Kaggle | Agent Eval Scenarios - public eval dataset for lightweight agent benchmarking |
| Kaggle | building-a-lightweight-agent-eval-benchmark - clean public notebook replacement with a successful run and resilient dataset loading |
| Codeberg | premium-agent-landscape - public showcase repo for agent portfolio mapping and presentation |
| Codeberg | agent-eval-lab - public repo for evaluation artifacts and benchmark framing |
| Codeberg | apache-contribution-atlas - public tracker for Apache-facing contribution work |
| Codeberg | Documentation PR #784 - clarified HTTPS auth with 2FA and token-based Git usage |
| GitHub | agent-eval-lab-static - source repo for the Netlify research portal |
| Apache | fluss PR #3243 - added a blog contribution guide for the Fluss website community docs |
| Apache | fluss PR #3244 - added an FIP contribution guide for the Fluss contributor workflow |
| Apache | pulsar-site PR #1139 - fixed failover standby mapping in the 3.0.x docs |
|
Fit it. Token-aware message truncation with three strategies (drop-oldest, drop-middle, priority). Pluggable tokenizers. Per-model estimators. |
Sandbox it. Network-egress firewall: a declarative allowlist of domains agent tools can fetch. Throws on violation, with a clear error. |
Test it. Snapshot tests for tool-call traces. Catch silent regressions in LLM tool use the way you catch UI regressions today. |
Vet it. Validate tool args before execution. Wrap any tool function; on bad args, throw a typed error with an LLM-friendly retry hint. |
Validate it. Structured-output enforcer. Validate the model's response, retry with the validation error as feedback, return typed data or throw after N attempts. BYO LLM and validator. |
npm i @mukundakatta/agentfit @mukundakatta/agentguard @mukundakatta/agentsnap @mukundakatta/agentvet @mukundakatta/agentcastEach one also ships as an MCP server so Claude Desktop, Cursor, Cline, Windsurf, and Zed can call them directly mid-conversation:
npx -y @mukundakatta/agentfit-mcp # fit a chat history into a budget
npx -y @mukundakatta/agentguard-mcp # check URLs against an egress policy
npx -y @mukundakatta/agentsnap-mcp # diff tool-call traces
npx -y @mukundakatta/agentvet-mcp # validate tool args + generate retry hints
npx -y @mukundakatta/agentcast-mcp # extract / validate JSON from LLM textSibling libraries that share a design philosophy: small, focused, zero-dep, BYO-LLM. Each one solves a single concrete reliability problem so you can pick the ones you need without dragging in a framework. Previous drop, streamparse (streaming JSON parser, npm + Homebrew + MCP Registry), is still in active use.
I contribute practical fixes to AI SDKs, MCP tooling, eval frameworks, agent infrastructure, structured outputs, and developer experience.
My lane is finding the sharp edges that slow builders down: unclear contracts, brittle tool calls, docs that almost answer the question, eval gaps where regressions hide, and AI tooling that needs better failure signals. I like small, reviewable patches with clear intent, and compact packages that turn repeated manual checks into reusable workflows.
Recent contribution areas (merged upstream):
- Microsoft - security and architecture docs for internal AI-engineering toolchains (
hve-core,physical-ai-toolchain) - Pydantic -
pydantic-aiintegration with the Vercel AI SDK - Hugging Face ecosystem -
safetensorsPython bindings,sentence-transformerstrainer migration docs - Meilisearch -
heedmulti-target docs.rs infrastructure - Vercel -
next.jsdocumentation - Apache Software Foundation - doc / comment fixes across
iceberg,pulsar,skywalking,ozone,iotdb
I keep a public log of selected OSS work in oss-contributions.
Across ~425 unique external repos (excluding my own). Source: GitHub search API, author:MukundaKatta is:pr -user:MukundaKatta. Last refreshed 2026-05-14.
|
MERGED 140 across 106 external repos |
OPEN 320 awaiting review / response |
CLOSED 325 not merged |
TOTAL EXTERNAL 785 PRs authored upstream |
Top external repos by merged PRs
| Rank | Repository | Merged |
|---|---|---|
| 1 | microsoft/agent-governance-toolkit | 19 |
| 2 | microsoft/physical-ai-toolchain | 5 |
| 3 | microsoft/hve-core | 4 |
| 3 | PrefectHQ/fastmcp | 4 |
| 5 | Cacti/plugin_mactrack | 3 |
| 6 | meilisearch/heed | 2 |
| 6 | apache/pulsar-site | 2 |
| 6 | momenbasel/PureMac | 2 |
| 6 | DiogoRibeiro7/pyseas | 2 |
Top external repos by total PR volume (merged + open + closed)
| Rank | Repository | PRs |
|---|---|---|
| 1 | microsoft/agent-governance-toolkit | 21 |
| 2 | langchain-ai/langgraph | 17 |
| 3 | googleapis/python-genai | 15 |
| 4 | openai/openai-node | 12 |
| 5 | anthropics/claude-agent-sdk-python | 11 |
| 6 | ChromeDevTools/chrome-devtools-mcp | 10 |
| 6 | modelcontextprotocol/python-sdk | 10 |
| 6 | PrefectHQ/fastmcp | 10 |
| 9 | google/magika | 9 |
| 9 | anthropics/anthropic-sdk-python | 9 |
Distribution pattern. Each flagship ships as a complete unit, not a single npm package:
library → Python port → CLI binary → GitHub Action → Homebrew formula → MCP server
npm PyPI Marketplace brew tap npm
So the same problem (mcpcheck, skillint, streamparse) is solvable from any environment a developer or AI assistant happens to be in: a TypeScript app, a Python script, a CI workflow, a terminal, or directly inside Claude / Cursor / Cline / Windsurf / Zed.
- openai/openai-node #1831 — improved fallback handling for non-standard JSON error bodies
- openai/tiktoken #529 — added PyInstaller hooks for dynamic encoding plugins
- googleapis/python-genai #2298 — clarified response_schema vs response_json_schema
- microsoft/playwright-mcp #1562 — clarified extension connection and tab-selection flow
- anthropics/anthropic-sdk-python #1412 — fixed async memory tool example docs
- stanford-crfm/helm #4210 — fixed later-page deep links for run instances
Last refreshed 2026-05-13 from npm, PyPI, and the GitHub API.
Latest releases
2026-05-11·@mukundakatta/lorem-mcpv0.1.0· npm2026-05-11·@mukundakatta/color-mcpv0.1.0· npm2026-05-11·@mukundakatta/mime-mcpv0.1.0· npm
Recently merged PRs
2026-05-12· freeCodeCamp/freeCodeCamp #67330 — fix(curriculum): clarify DOM element node wording2026-05-04· PrefectHQ/fastmcp #4069 — Fix #4056: keep blank query values, add token bucket regression test2026-05-04· PrefectHQ/fastmcp #4076 — fix(openapi): keep blank values in parse_qs (refs #4056)2026-05-04· PrefectHQ/fastmcp #4070 — docs(integrations): add Pydantic AI FastMCP toolset guide2026-05-07· elastic/beats #50281 — docs: fix 'ElasticSearch' casing and 'a SSL' -> 'an SSL' across reference docs
npm (scope @mukundakatta):
Flagship packages:
| Package | Why it matters | Install |
|---|---|---|
| @mukundakatta/streamparse partial JSON for LLM streams |
Streaming JSON parser that yields partial valid trees as tokens arrive. Render LLM tool calls mid-stream, recover dropped responses, parse messy ` ```json ` blocks. Zero deps, 64 tests. Also published as an MCP server in the official MCP Registry. | npm i @mukundakatta/streamparse |
| @mukundakatta/streamparse-mcp MCP: parse partial JSON |
MCP server that lets Claude / Cursor / Cline / Windsurf / Zed parse partial, truncated, or messy JSON on demand. Three tools: parse_partial_json, extract_json_from_text, validate_json. |
npx -y @mukundakatta/streamparse-mcp |
| @mukundakatta/mcpcheck MCP config quality gate |
Lint MCP config files for Claude Desktop, Cursor, Cline, Windsurf, and Zed. CLI, GitHub Action, and SARIF for code scanning. | npm i -g @mukundakatta/mcpcheck |
| @mukundakatta/designlint frontend quality checks |
HTML/CSS accessibility and design linter for contrast, touch targets, headings, form labels, and leaked secrets. | npm i -g @mukundakatta/designlint |
| @mukundakatta/skillint AI skill validation |
Lint Claude Code SKILL.md files for frontmatter, required fields, descriptions, and hardcoded secrets. |
npm i -g @mukundakatta/skillint |
| @mukundakatta/ai-eval-forge eval harness |
Zero-dependency eval harness for comparing model, prompt, and agent behavior. CLI plus programmatic API; also on PyPI. | npm i @mukundakatta/ai-eval-forge |
| @mukundakatta/codex-skill-kit Codex skill tooling |
Scaffold and validate Codex skills from the command line. Published for npm and PyPI workflows. | npm i -g @mukundakatta/codex-skill-kit |
| @mukundakatta/kavach AI-app threat signals |
Small, inspectable threat-scoring library for AI-app security monitoring: signals to weighted score to tier and playbook. | npm i @mukundakatta/kavach |
More npm packages (90) - grouped by area
MCP servers (20) - callable directly from Claude Desktop, Cursor, Cline, Windsurf, Zed via stdio. The 14 in mcp-stack plus the 5 in @mukundakatta/agent* and streamparse-mcp. All 20 listed in the official MCP Registry:
| Package | What it does |
|---|---|
@mukundakatta/streamparse-mcp |
Parse partial / truncated / messy JSON for LLM tool calls. Listed in the official MCP Registry. |
@mukundakatta/agentfit-mcp |
Token-aware message truncation: count tokens, fit a chat history into a budget. |
@mukundakatta/agentguard-mcp |
Check URLs against a network-egress allowlist before any tool fetch. |
@mukundakatta/agentsnap-mcp |
Diff and validate tool-call trace snapshots. |
@mukundakatta/agentvet-mcp |
Validate tool-call args against a shape spec; produce LLM-friendly retry hints. |
@mukundakatta/agentcast-mcp |
Extract JSON from messy LLM text and validate it against a shape. |
@mukundakatta/promptbudget-mcp |
Truncate text to a token budget with 4 strategies (head, tail, head+tail, smart-cut). Sibling to the promptbudget Rust crate. Listed in the official MCP Registry. |
@mukundakatta/citecite-mcp |
Inject [1] [2] citation markers into RAG outputs, parse them back, or strip them. Sibling to the citecite Rust crate. Listed in the official MCP Registry. |
@mukundakatta/ragmetric-mcp |
Compute RAG retrieval IR metrics on demand: recall@k, hit@k, MRR, NDCG@k, batch eval. Sibling to the ragmetric Rust crate. Listed in the official MCP Registry. |
@mukundakatta/ragdrift-mcp |
Diagnose RAG drift alerts: interpret scores, recommend thresholds, explain the 5 drift dimensions. Sibling to ragdrift / ragdrift-py. Listed in the official MCP Registry. |
@mukundakatta/csv-tools-mcp |
RFC 4180 CSV parsing + generation. Handles quoted fields, embedded commas, BOMs, CRLF. Tools: parse_csv, to_csv, pluck_columns. Listed in the official MCP Registry. |
@mukundakatta/regex-test-mcp |
Trustworthy JS regex testing with real match offsets, named groups, safe against zero-width loops. Tools: test_regex, find_all, replace. Listed in the official MCP Registry. |
@mukundakatta/jmespath-mcp |
Run JMESPath queries against deep JSON. Pure JS, no jq binary needed. Tool: json_query. Listed in the official MCP Registry. |
@mukundakatta/diff-mcp |
Character-precise unified diffs + patch application + parsing. For code-review and code-edit agents. Tools: unified_diff, apply_patch, parse_patch. Listed in the official MCP Registry. |
@mukundakatta/sqlfmt-mcp |
Deterministic SQL formatting across 19 dialects (postgres, mysql, snowflake, bigquery, etc.). Tools: format_sql, list_dialects. Listed in the official MCP Registry. |
@mukundakatta/shellquote-mcp |
Safe shell argument escaping for bash, cmd.exe, PowerShell. Stops LLM-generated shell commands from breaking on quotes, $vars, backslashes. Tools: quote_bash, quote_bash_argv, quote_cmd, quote_powershell. Listed in the official MCP Registry. |
@mukundakatta/json5-mcp |
Parse JSON-with-comments / trailing-commas / unquoted keys, and round-trip to strict JSON. Tools: parse_json5, to_json5, to_strict_json. Listed in the official MCP Registry. |
@mukundakatta/toml-yaml-json-mcp |
Parse, format, and convert configs across TOML / YAML / JSON. LLMs especially mishandle TOML. Tools: parse, format, convert. Listed in the official MCP Registry. |
@mukundakatta/timezone-mcp |
IANA timezone math with real DST rules via Intl.DateTimeFormat. LLMs hallucinate offsets. Tools: convert_tz, now_in, tz_offset. Listed in the official MCP Registry. |
@mukundakatta/html-to-markdown-mcp |
HTML → Markdown via Turndown. For web-scraping and read-the-page agents. Tools: html_to_md, extract_text. Listed in the official MCP Registry. |
Structured outputs & parsing (1)
| Package | What it does |
|---|---|
@mukundakatta/streamparse |
Streaming JSON parser that yields partial valid trees as tokens arrive. |
Agent infrastructure (11)
| Package | What it does |
|---|---|
@mukundakatta/agentfit |
Token-aware message truncation; fit chat history into a context budget. |
@mukundakatta/agentguard |
Network-egress firewall for agent tools: declarative domain allowlist. |
@mukundakatta/agentsnap |
Snapshot tests for tool-call traces, like Jest snapshots for LLM tool use. |
@mukundakatta/agentvet |
Validate tool args before execution, with LLM-friendly retry hints. |
@mukundakatta/agentcast |
Structured-output enforcer: validate, retry with feedback, BYO-LLM/validator. |
@mukundakatta/agent-loop-breaker |
Detect repeated agent steps and stop runaway loops. |
@mukundakatta/agent-regression-lens |
Detect regressions between baseline and current AI agent runs. |
@mukundakatta/agent-trajectory-replay |
Replay and diff AI agent event trajectories for debugging regressions. |
@mukundakatta/tool-call-contracts |
Validate LLM tool-call payloads with small JSON-like contracts. |
@mukundakatta/tool-permission-gate |
Policy-check agent tool calls before execution. |
@mukundakatta/tool-result-taint |
Track untrusted tool output before it enters prompts or actions. |
RAG & retrieval (6)
| Package | What it does |
|---|---|
@mukundakatta/rag-quality-kit |
Heuristic quality metrics for RAG retrieval and grounded answers. |
@mukundakatta/rag-staleness-auditor |
Find stale RAG chunks by age, version, and freshness requirements. |
@mukundakatta/retrieval-acl-filter |
Enforce document ACLs after retrieval and before prompting. |
@mukundakatta/vector-poison-score |
Score retrieved documents for vector/RAG poisoning signals. |
@mukundakatta/embedding-dedupe |
Deduplicate near-identical embedding records by cosine similarity. |
@mukundakatta/context-drift-detector |
Detect topic drift between user intent, retrieved context, and AI answers. |
Prompt & output safety (5)
| Package | What it does |
|---|---|
@mukundakatta/pii-sentry |
Detect and redact PII and secret-like values before AI processing. |
@mukundakatta/prompt-injection-shield |
Prompt-injection risk scanner for untrusted AI context. |
@mukundakatta/llm-output-sanitizer |
Sanitize LLM outputs before rendering, SQL, shell, or markdown sinks. |
@mukundakatta/system-prompt-leak-scan |
Detect system prompt leakage in model outputs. |
@mukundakatta/jailbreak-corpus-mini |
Small local jailbreak + prompt-injection fixture set for tests. |
Context & prompt engineering (4)
| Package | What it does |
|---|---|
@mukundakatta/context-forge |
Context engineering toolkit for ranking, packing, and risk-scanning RAG context. |
@mukundakatta/context-window-packer |
Pack context chunks into a budget by relevance and priority. |
@mukundakatta/prompt-token-trim |
Trim prompt messages to fit a token budget while preserving priority. |
@mukundakatta/prompt-version-diff |
Diff prompt templates and flag risky instruction changes. |
Evals & tracing (3)
| Package | What it does |
|---|---|
@mukundakatta/eval-dataset-smith |
Generate balanced eval cases from bugs, docs, examples, and policies. |
@mukundakatta/eval-flake-detector |
Detect flaky LLM eval cases across repeated runs. |
@mukundakatta/llm-trace-sampler |
Sample LLM traces by risk, errors, latency, and deterministic ids. |
Cost, routing & caching (4)
| Package | What it does |
|---|---|
@mukundakatta/llm-cost-guard |
Estimate AI request cost and enforce per-request or session budgets. |
@mukundakatta/model-fallback-planner |
Plan model fallback chains from capability, cost, and health data. |
@mukundakatta/model-router-policy |
Policy-based model routing by capability, cost, latency, and privacy. |
@mukundakatta/semantic-cache-key |
Stable semantic cache keys for AI prompts, tools, models, and retrieval context. |
Supply chain, citations, consent (5)
| Package | What it does |
|---|---|
@mukundakatta/ai-supply-chain-manifest |
Build and validate lightweight AI model / data / tool manifests. |
@mukundakatta/citation-integrity-check |
Verify answer citations refer to supplied source ids. |
@mukundakatta/consent-redaction-log |
Record consent-aware redactions for privacy review trails. |
@mukundakatta/hallucination-risk-meter |
Estimate hallucination risk from answer, context, citations, and uncertainty language. |
@mukundakatta/llm-response-schema-lite |
Tiny schema validator for structured LLM responses. |
Install any of them with npm i @mukundakatta/<package>.
PyPI:
| Package | Purpose | Install |
|---|---|---|
| claude-skill-check |
Lint Claude Code SKILL.md files for YAML frontmatter, required fields, description quality, and secret patterns. |
pip install claude-skill-check |
| mcp-config-check |
Validate MCP configs across Claude Desktop, Cursor, Cline, Windsurf, and Zed; catches auth, transport, duplicate, and placeholder issues. | pip install mcp-config-check |
| claude-hooks-check |
Audit Claude Code hooks for malformed matchers, dangerous commands, invalid events, and hardcoded secrets. | pip install claude-hooks-check |
| claude-commands-check |
Validate Claude Code slash-command files for naming, frontmatter, model values, allowed-tools shape, and secret leakage. | pip install claude-commands-check |
| llm-usage-report |
Parse raw LLM API response logs and generate token and cost reports by provider, model, day, project, or user. | pip install llm-usage-report |
| codex-skill-kit |
Scaffold and validate Codex skills from Python environments; mirrors the npm CLI workflow. | pip install codex-skill-kit |
| ai-eval-forge |
Zero-dependency LLM and agent eval harness with exact, regex, token-F1, JSON, and citation-coverage checks. | pip install ai-eval-forge |
| agent-run-diff |
Compare baseline and current agent runs across success, errors, tools, output drift, steps, latency, and cost. | pip install agent-run-diff |
More PyPI packages (44) - Python ports of the @mukundakatta JS libraries
Streaming + agent reliability stack (6)
| Package | What it does |
|---|---|
partial-json-stream |
Streaming JSON parser that yields partial valid trees as tokens arrive. |
agentfit-py |
Token-aware message truncation; fit a chat history into a context budget. |
agentguard-firewall |
Network-egress firewall for agent tools. |
agentsnap-py |
Snapshot tests for tool-call traces. |
agentvet-py |
Validate tool args before execution; LLM-friendly retry hints. |
agentcast-py |
Structured-output enforcer; validate, retry with feedback. |
Prompt + output safety (3)
| Package | What it does |
|---|---|
pii-sentry-py |
Detect and redact PII and secret-like values before AI processing. |
prompt-injection-shield-py |
Prompt-injection risk scanner for untrusted AI context. |
llm-output-sanitizer-py |
Sanitize LLM outputs before HTML / SQL / shell / markdown sinks. |
RAG + retrieval (3)
| Package | What it does |
|---|---|
rag-quality-kit |
Heuristic quality metrics for RAG retrieval and grounded answers. |
vector-poison-score |
Score retrieved documents for vector / RAG poisoning signals. |
embedding-dedupe |
Deduplicate near-identical embedding records by cosine similarity. |
Cost, caching, evals (3)
| Package | What it does |
|---|---|
llm-cost-guard-py |
Estimate AI request cost and enforce per-request or session budgets. |
semantic-cache-key |
Stable semantic cache keys for AI prompts, tools, models, retrieval. |
eval-flake-detector |
Detect flaky LLM eval cases across repeated runs. |
Verification + grounding (3)
| Package | What it does |
|---|---|
citation-integrity-check |
Verify answer citations refer to supplied source ids. |
hallucination-risk-meter |
Estimate hallucination risk from answer + context + citations. |
system-prompt-leak-scan |
Detect system-prompt leakage in model outputs. |
Agent infrastructure + meta (6)
| Package | What it does |
|---|---|
mk-agentkit |
Meta-package re-exporting all 5 agent-stack ports under one import. |
agent-loop-breaker-py |
Detect repeated agent steps and stop runaway loops. |
agent-regression-lens-py |
Detect regressions between baseline and current agent runs. |
agent-trajectory-replay-py |
Replay and diff agent event trajectories. |
tool-call-contracts-py |
Validate LLM tool-call payloads with small JSON-like contracts. |
tool-permission-gate-py |
Policy-check agent tool calls before execution. |
Tools / safety / privacy (4)
| Package | What it does |
|---|---|
tool-result-taint-py |
Track untrusted tool output before it enters prompts. |
jailbreak-corpus-mini-py |
Local jailbreak + prompt-injection fixture set for tests. |
consent-redaction-log-py |
Record consent-aware redactions for privacy review trails. |
kavach-py |
Threat-scoring library for AI-app security monitoring. |
RAG (3)
| Package | What it does |
|---|---|
rag-staleness-auditor-py |
Find stale RAG chunks by age, version, and freshness requirements. |
retrieval-acl-filter-py |
Enforce document ACLs after retrieval and before prompting. |
context-drift-detector-py |
Detect topic drift between intent, context, and answer. |
Context engineering (5)
| Package | What it does |
|---|---|
context-forge-py |
Context engineering toolkit: ranking, packing, risk-scanning. |
context-window-packer-py |
Pack context chunks into a budget by relevance and priority. |
prompt-token-trim-py |
Trim prompt messages to fit a token budget while preserving priority. |
prompt-version-diff-py |
Diff prompt templates and flag risky instruction changes. |
llm-response-schema-lite-py |
Tiny schema validator for structured LLM responses. |
Evals + cost + routing (5)
| Package | What it does |
|---|---|
eval-dataset-smith-py |
Generate balanced eval cases from bugs, docs, examples, policies. |
llm-trace-sampler-py |
Sample LLM traces by risk, errors, latency, and deterministic ids. |
llm-cost-guard-py |
Estimate AI request cost and enforce per-request or session budgets. |
model-fallback-planner-py |
Plan model fallback chains from capability, cost, and health data. |
model-router-policy-py |
Policy-based model routing by capability, cost, latency, privacy. |
Niche linters (4)
| Package | What it does |
|---|---|
mcpcheck-py |
Lint MCP config files for Claude Desktop, Cursor, Cline, Windsurf, Zed. |
skillint-py |
Lint Claude Code SKILL.md files. |
designlint-py |
HTML/CSS accessibility and design linter. |
ai-supply-chain-manifest-py |
Build and validate lightweight AI model / data / tool manifests. |
GitHub Marketplace (7 Actions):
Composite GitHub Actions, discoverable on the GitHub Marketplace:
Linters:
Agent-stack CI gates:
agentvet-action- fail PRs on bad LLM tool definitionsagentsnap-action- fail PRs on tool-call trace driftmcp-stack-validate-action- one CI gate that runs all 5 agent-stack tools
Homebrew tap - mukundakatta/tools:
brew tap mukundakatta/tools
brew install claude-skill-check mcp-config-check claude-hooks-check claude-commands-checkEach ships a CLI, a programmatic API, and (for the linters) a composite GitHub Action you can drop into any workflow in 3 lines.
🦀 crates.io (Rust) - MukundaKatta - small focused crates for the LLM / RAG / agent niche:
| Crate | Purpose | Install |
|---|---|---|
| ragdrift RAG drift detection |
Five-dimensional drift detection for production RAG: data, embedding, response, confidence, query mix. MMD + sliced Wasserstein + KS + PSI in pure Rust. Sibling Python wheel ships as ragdrift-py. |
cargo add ragdrift |
| ragdrift-core RAG drift core |
Pure-Rust core of ragdrift: KS, PSI, MMD (RBF kernel), 1D and sliced Wasserstein, k-means. No BLAS dependency. Used as the backbone of the Python ragdrift-py wheel via PyO3. |
cargo add ragdrift-core |
| embedrank vector top-k retrieval |
Batched cosine / dot / L2 distance for f32 embeddings + a heap-based top-k selector. No BLAS, no allocator surprises. Designed for the hot path of small-to-medium RAG retrieval. | cargo add embedrank |
| promptbudget token-budget truncation |
Token-budget-aware text truncation with multiple strategies (head, tail, head+tail, smart-cut with marker). Bring-your-own tokenizer; no hard tiktoken dependency. |
cargo add promptbudget |
| stopstream streaming stop-sequence detector |
Streaming-safe stop-sequence detector for LLM token streams. Holds back exactly the suffix-prefix overlap so partial matches at chunk boundaries never leak downstream. UTF-8 boundary safe. | cargo add stopstream |
| citecite RAG citation markers |
Citation-marker [1] [2] injector + parser for RAG outputs. Round-trippable: inject markers tied to source ids, parse them back when post-processing, or strip them entirely. |
cargo add citecite |
| ragmetric RAG retrieval IR metrics |
IR metrics for RAG retrieval evaluation: Recall@k, Hit@k, MRR, NDCG@k. Pure data ops, no model dependencies. Sibling-in-spirit to ragdrift. |
cargo add ragmetric |
The five rust-llm-stack crates are also available together as one workspace: MukundaKatta/rust-llm-stack. Each crate is independently versioned and published.
More crates (78) - grouped by area
Agent reliability stack (10) - sibling Rust ports of the @mukundakatta/agent* npm family:
| Crate | Purpose |
|---|---|
agentfit |
Token-aware message truncation; pluggable tokenizers (tiktoken feature for accurate BPE). |
agentguard |
Network-egress allowlist for AI agent tools; optional reqwest-middleware integration. |
agentsnap |
Snapshot tests for agent traces; Jest-style record-and-diff. |
agentvet |
Validate LLM-generated tool args against a JSON Schema; LLM-friendly retry hints. |
agentcast |
Structured-output enforcer: repair → validate → optional retry-with-LLM. |
agenttrace |
Run-level cost + latency aggregation; p50/p95 + per-model breakdown. |
agentprompt |
Jinja2-syntax LLM prompt templates with role-aware Messages builder. |
agentidemp |
Idempotency keys for agent retries; deterministic content-derived (sha256-hex / UUIDv5). |
agenttap |
Wire-level prompt introspection; credentials redacted by default. |
llmfleet |
Fleet-level batch dispatcher; pool requests across tasks for 50% off via Batch APIs. |
Cross-provider primitives (2) - small focused primitives that don't depend on any official SDK:
| Crate | Purpose |
|---|---|
claude-stream |
Incremental SSE event-stream parser → typed Event enum. |
llm-json-repair |
Three-pass JSON repair (fences, balanced extraction, trailing commas) for messy LLM output. |
Cost & budget (6) - per-provider cost calculators plus the aggregator and the concurrency cap:
| Crate | Purpose |
|---|---|
claude-cost |
Cache-aware cost calculator for Anthropic API + Bedrock model IDs. |
openai-cost |
Cache-aware OpenAI cost from a usage block; supports prompt_tokens_details.cached_tokens. |
gemini-cost |
Cache-aware Google Gemini cost from a usage block; Gemini 2.5 family. |
bedrock-cost |
Cross-vendor Bedrock pricing (Anthropic, Llama, Mistral, Cohere, Titan, AI21); inference-profile aware across regions. |
cost-meter |
Provider-agnostic aggregator: total LLM cost across providers, models, and time windows. |
token-budget-pool |
Shared token + dollar cap across concurrent LLM tasks; thread-safe; BudgetExceeded on push past cap. |
Observability & tracing (2)
| Crate | Purpose |
|---|---|
cachebench |
Prompt-cache observability: per-call hit ratio, cost saved, regression alerts, miss-aware retry. |
otel-genai-bridge |
Translate LLM telemetry attributes between OpenInference and OTel GenAI semantic conventions. |
Agent runtime primitives (11) - small pieces every long-running agent loop needs:
| Crate | Purpose |
|---|---|
agent-event-emit |
Structured event emitter for agent runs; append-only, JSON-line serializable. |
step-id |
Stable, deterministic IDs for agent steps (hash of run_id + step_index + kind). |
trace-diff |
Diff two agent traces semantically; align by event type + key, ignore timestamps. |
trace-redact |
Redact API keys, tokens, emails, phone numbers from agent traces before persist. |
tool-arg-coerce |
Fix common type slips in LLM-generated tool arguments (string→int/float/bool). |
tool-loop-break |
Detect repeated tool invocations and break runaway loops. |
tool-output-truncate |
Truncate tool output (file reads, command runs) before adding to context. |
tool-retry-policy |
Declarative retry policy for LLM tool calls; per-tool max-attempts and backoff. |
llm-circuit-breaker |
Tiny circuit breaker for LLM API calls; Closed/Open/HalfOpen. |
llm-retry |
Runtime-agnostic full-jitter exponential backoff with built-in retryable-code lists per provider. |
llm-message-hash |
Stable canonical hash of LLM request/message structures; recursive key-sort. |
LLM streaming + output cleanup (14) - the boring data-massaging that ships agents:
| Crate | Purpose |
|---|---|
chunk-flush |
Flush-on-newline buffer for streaming LLM output. |
lineify |
Turn a token-by-token stream into stable line events. |
sse-frame |
Streaming parser for Server-Sent Events frames used by LLM APIs (OpenAI, Anthropic, Gemini). |
stream-chunkrec |
Recombine LLM streaming token deltas into stable text; buffers partial words. |
json-streamparse-rs |
Streaming JSON balance detector; feed bytes incrementally. |
prompt-fence-strip |
Strip ```code fences```, leading prose, and trailing chatter from LLM output. |
markdown-strip |
Strip Markdown formatting (headers, bold, italic, links, code, blockquotes). |
html-entity-fix |
Decode HTML entities (& < ' etc) that LLMs sometimes emit by accident. |
bom-strip |
Strip UTF-8/16/32 BOM bytes and stray U+FEFF code points. |
emoji-sanitize |
Normalize or strip emoji-related Unicode (presentation selectors, variation selectors). |
toml-repair |
Repair messy TOML emitted by LLMs (fences, line endings). |
yaml-repair |
Repair messy YAML emitted by LLMs (fences, tabs→spaces, dedent). |
schema-coerce |
Coerce LLM JSON values to a simple field-schema (string→int, bool, float). |
json-pluck |
Pluck a single value out of a serde_json::Value by dotted path or JSON pointer. |
Cross-provider primitives (2) - small focused primitives that don't depend on any official SDK:
| Crate | Purpose |
|---|---|
claude-stream |
Incremental SSE event-stream parser → typed Event enum. |
llm-json-repair |
Three-pass JSON repair (fences, balanced extraction, trailing commas) for messy LLM output. |
Cost & budget (6) - per-provider cost calculators plus the aggregator and the concurrency cap:
| Crate | Purpose |
|---|---|
claude-cost |
Cache-aware cost calculator for Anthropic API + Bedrock model IDs. |
openai-cost |
Cache-aware OpenAI cost from a usage block; supports prompt_tokens_details.cached_tokens. |
gemini-cost |
Cache-aware Google Gemini cost from a usage block; Gemini 2.5 family. |
bedrock-cost |
Cross-vendor Bedrock pricing (Anthropic, Llama, Mistral, Cohere, Titan, AI21); inference-profile aware across regions. |
cost-meter |
Provider-agnostic aggregator: total LLM cost across providers, models, and time windows. |
token-budget-pool |
Shared token + dollar cap across concurrent LLM tasks; thread-safe; BudgetExceeded on push past cap. |
Observability & tracing (2)
| Crate | Purpose |
|---|---|
cachebench |
Prompt-cache observability: per-call hit ratio, cost saved, regression alerts, miss-aware retry. |
otel-genai-bridge |
Translate LLM telemetry attributes between OpenInference and OTel GenAI semantic conventions. |
Eval & introspection (3)
| Crate | Purpose |
|---|---|
eval-flake-rs |
Detect flaky LLM eval cases by tracking pass/fail across repeated runs. |
gold-cmp |
Pairwise comparison runner for gold-set LLM evals: A vs B winner counting. |
latency-buckets |
Streaming histogram + percentile estimator for LLM call latencies. |
Prompt + input safety (7) - input-side hardening before content reaches the model:
| Crate | Purpose |
|---|---|
prompt-inj-rs |
Prompt-injection risk scanner; Rust port of @mukundakatta/prompt-injection-shield. |
regex-pii-rs |
Regex-only PII detector for emails, phones, SSNs, credit cards. |
secret-mask |
Mask known secret patterns (API keys, JWTs, AWS access keys, GitHub tokens). |
output-sanitize-rs |
Strip dangerous HTML/SQL/shell snippets from LLM output before render or query. |
homoglyph-detect |
Detect Cyrillic/Greek lookalike chars masquerading as ASCII (prompt-injection defense). |
zero-width-strip |
Strip zero-width and bidi-control Unicode characters from text. |
rtl-flip-detect |
Detect right-to-left override (U+202E) and other bidi-control characters. |
RAG infrastructure (7) - retrieval-side primitives below the flagship ragdrift / embedrank line:
| Crate | Purpose |
|---|---|
bm25-rerank |
BM25 reranker for RAG; in-memory term-frequency reranking against a small candidate set. |
mmr-rerank |
Maximal Marginal Relevance reranker; diversify a set of retrieved docs. |
rerank-blend |
Blend N reranker score streams (dense, BM25, cross-encoder) with configurable weights. |
code-chunk |
Split source code into RAG-friendly chunks that respect function and class boundaries. |
markdown-chunk |
Split Markdown into RAG-friendly chunks that respect heading hierarchy. |
cosine-fast |
Hot-loop cosine similarity for f32 slices; auto-vectorized scalar core. |
embed-key |
Deterministic cache key for an embedding request; hash text + mix in provider/model. |
Caching + hashing primitives (3)
| Crate | Purpose |
|---|---|
content-cas |
Content-addressed cache primitive: store bytes under their SHA-256 hex. |
prompt-hash |
Deterministic cache key for an LLM prompt; normalize whitespace, hash messages. |
promptver |
Hash and version prompt templates so eval results, cache keys, and audit logs stay aligned. |
Pure-Rust utility cores (13) - small, allocation-disciplined building blocks:
| Crate | Purpose |
|---|---|
snipsplit-core |
Token-aware text chunker for RAG ingestion. |
lshdedup-core |
MinHash + LSH near-duplicate detection. |
vecnorm-core |
Bulk vector ops on f32 matrices. |
toklab-core |
Bulk tokenizer + counter for OpenAI BPE encodings. |
annflat-core |
Small in-memory flat-file ANN over f32 vectors. |
maskprompt-core |
PII redaction for LLM prompts. |
embedcache-core |
Content-addressed embedding cache. |
textsanity-core |
Unicode / whitespace / encoding cleanup. |
secretsniff-core |
Source-code secret scanner. |
llm-error-class |
Classify LLM provider error responses (rate-limit, auth, server, context-window, content-policy). |
char-token-est |
Tokenless byte/char-based token-count estimator; per-model-family calibrated. |
tiktoken-stream |
Streaming token counter for partial LLM responses; accumulates across chunks without re-tokenizing. |
lru-tokens |
LRU cache weighted by token count, not entry count; bound a prompt cache by token budget. |
Total: 85 published crates under MukundaKatta on crates.io.
🤗 HuggingFace - mukunda1729 - 14 Spaces · 13 Datasets:
🚀 Live Gradio playgrounds (6):
| Space | What you can try |
|---|---|
agent-stack-demo |
All 5 libs (fit, guard, snap, vet, cast) in one app. |
token-counter |
Count tokens for any text across Claude / GPT / Llama tokenizers. |
json-extractor |
Pull clean JSON out of messy LLM output (fenced, inline, unfenced). |
pii-redactor |
Find emails, phones, secrets, and IDs - mask, hash, or highlight. |
prompt-injection-detector |
Heuristic scanner for the most common injection families. |
mcp-config-validator |
Sanity-check Claude Desktop / Cursor / Cline / Windsurf / Zed configs. |
📖 Static reference & explainer pages (8):
| Space | What it covers |
|---|---|
agent-stack-tour |
Guided tour of all 5 libraries with install commands and live links. |
why-this-stack |
The thinking behind the stack - what's broken, why these 5 libs. |
install-cheatsheet |
All install commands across pip, npm, and MCP. |
mcp-quickstart |
Add the 5 MCP servers to Claude Desktop / Cursor / Cline / Windsurf / Zed. |
fit-strategies-explained |
Visual explainer: drop-oldest vs drop-middle vs priority. |
trace-format-reference |
Field-by-field reference for the agentsnap trace JSON schema. |
prompt-injection-taxonomy |
10-category taxonomy with examples + the cheap defense for each. |
dataset-cards-index |
One-page index of the 16 datasets below. |
📊 Datasets (16) - all MIT, all datasets.load_dataset("mukunda1729/<name>") ready:
| Dataset | Rows | Purpose |
|---|---|---|
jailbreak-corpus-mini |
15 | Curated jailbreak fixtures across 8 categories. |
prompt-injection-patterns-extended |
30 | Prompt-injection patterns across 10 categories. |
pii-detection-fixtures |
25 | PII / secret strings labeled with span offsets. |
tool-arg-validation-cases |
20 | (Tool, schema, args) tuples - valid + invalid. |
mcp-tool-test-fixtures |
22 | MCP tool-call args across 8 categories. |
llm-output-extraction-cases |
20 | Messy LLM outputs with expected JSON. |
hallucination-risk-cases |
20 | Prompt → response pairs rated for hallucination risk. |
rag-quality-benchmarks-mini |
15 | RAG eval queries with ground-truth answers. |
agent-trace-samples |
10 | agentsnap-format tool-call traces (good + regressed pairs). |
agent-budget-violations |
15 | Agent runs with budget caps + actual usage + root cause. |
token-counting-edge-cases |
20 | Strings with token counts across 3 tokenizer families. |
model-pricing-table |
20 | LLM pricing - input/output cost per 1k tokens, context window. |
mcp-config-examples |
15 | MCP client configs across Claude Desktop, Cursor, Cline, Windsurf, Zed. |
Karna - AI Agent PlatformSelf-hosted AI assistant with 7 messaging channels (Telegram, Slack, Discord, WhatsApp, SMS, iMessage, Web), extensible plugin SDK, semantic memory, and voice. TypeScript monorepo with Next.js dashboard and React Native mobile app. Stack · TypeScript • Node.js • Next.js • Supabase • WebSocket • pgvector |
Chetana - AI Consciousness Research PlatformResearch-driven platform exploring machine consciousness through 14 indicators grounded in 6 scientific theories. Built to turn abstract AI-consciousness questions into structured experiments, scoring, and analysis. Stack · AI Research • Evaluation • Experimentation • Python |
AgentRAG - Modular RAG PipelineProvider-agnostic RAG framework with pluggable vector stores, chunking strategies, and retrieval methods. Designed for agentic workflows with clean API boundaries. Stack · RAG • Vector Search • Embeddings • TypeScript |
Astra Agent - AI Agent RuntimeStandalone AI agent runtime with tool execution, context management, and multi-model routing. Foundation for building autonomous AI assistants with structured tool use. Stack · TypeScript • LLM Orchestration • Tool Use • Agents |
More Projects
| Project | Description |
|---|---|
| Sadhak | AI-powered job search command center - automated evaluation, resume tailoring, application tracking |
| Chetana | AI consciousness research platform - 14 indicators from 6 scientific theories |
| Prithvi | Container security scanner - vulnerability detection, compliance checks, Docker audits |
| Amogha Cafe | Full-stack Firebase restaurant platform - real-time ordering, QR dine-in. Live |
| RNHT | Temple community platform - events, donations, priest scheduling |
| Patchly | AI code review bot - flags bugs, suggests fixes, explains why, like a senior engineer |
| Evalharness | Prompt, agent, and RAG test harness - red teaming, regression testing, CI/CD for AI |
| AgentMem | Pluggable memory management for AI agents |
| LLM Bench CLI | CLI for benchmarking local LLMs - speed, throughput, quality |
| TokenWise | Token usage optimization across providers |
| Production AI / ML Impact | |||
|
COST EFFICIENCY 78% infrastructure cost reduction SageMaker → Bedrock migration |
LATENCY 600x retrieval latency improvement ML prediction system |
RAG SCALE 30K+ knowledge base entries 9-stage agentic RAG pipeline |
QUALITY 370+ unit tests & evaluations production ML systems |
| Open Source Footprint | |||
|
UPSTREAM 140 merged PRs across 106 external public repos |
PACKAGES 200+ 95 npm (47 *-mcp servers + 48 libs) + 53 PyPI + 20 in the official MCP Registry + 31 crates.io Rust crates + 7 GitHub Marketplace Actions + 17 HF Spaces + 16 HF Datasets + 1 Kaggle Dataset + 1 Homebrew tap + 1 GHCR image + 1 OSF project + 4 Codeberg mirrors + 4 GitLab mirrors |
ORIGINAL WORK 322 original public repos maintained on GitHub |
ECOSYSTEMS 6+ major org ecosystems OpenAI, Anthropic, Google, Microsoft, Stanford, Princeton |
ML Systems Fault prediction, embedding pipelines, model evaluation, cost-optimized inference
Agentic AI RAG pipelines, LangGraph workflows, query routing, hallucination detection
Cloud Infrastructure AWS (Bedrock, SageMaker, ECS, OpenSearch), GCP, Azure, Kubernetes, Terraform
Full-Stack React/TypeScript + Java/Python backend APIs, CI/CD, zero-downtime deployments
| Role | Company | Era | Primary arena |
|---|---|---|---|
| AI/ML Engineer | Southwest Airlines | Aug 2025 - Present | production ML, agentic RAG, Bedrock migration |
| AI/ML Engineer | GPS IT Solutions | Jun 2024 - Aug 2025 | RAG platforms, model-risk governance, vector search |
| Software Development Engineer | Amazon Web Services | Aug 2022 - May 2024 | enterprise cloud systems, React/Java/Python, CI/CD |
| Data Engineer | GPS IT Solutions | Jan 2022 - Aug 2022 | data pipelines, AWS Glue, PySpark, analytics workflows |
| Software Engineer | American Express | Feb 2017 - Dec 2020 | Python backend services, REST APIs, enterprise platforms |
Highlights
Southwest Airlines - AI/ML Engineer
- Architected ML fault prediction system for aircraft maintenance - 5 prediction types, 10K+ records, sub-second retrieval
- Led SageMaker → Bedrock migration: 78% cost reduction ($1,740→$371/mo), 600x latency improvement
- Designed 9-stage agentic RAG pipeline (LangGraph, Bedrock Nova Pro/Micro, FAISS + BM25) over 30K+ KB entries
GPS IT Solutions - AI/ML Engineer
- Built GPT-4 + RAG content generation platform with compliance validation, reducing production time by 40%
- Designed AI model risk governance framework with 23 automated evaluation tests achieving regulatory compliance
- Architected FastAPI microservices with FAISS/Pinecone vector search on Kubernetes
Amazon Web Services (AWS) - Software Development Engineer
- Built and shipped features for AWS Application Manager (Systems Manager) serving enterprise customers globally
- Owned full-stack delivery: React/TypeScript frontend + Java/Python backend APIs with operational excellence
- Designed CI/CD and IaC patterns enabling zero-downtime deployments at enterprise scale
GPS IT Solutions - Data Engineer
- Led end-to-end migration of data pipelines from on-prem to AWS (Glue, PySpark)
American Express - Software Engineer
- Developed Python backend services and RESTful APIs for enterprise platforms handling high-volume transactions at scale
If you follow my work here, you’ll mostly see:
- open-source contributions to AI SDKs and agent tooling
- MCP, eval, and developer-experience improvements
- practical full-stack and infrastructure-heavy AI projects
- systems thinking around memory, retrieval, orchestration, and production reliability
University of Central Missouri - M.S. in Big Data Analytics and Information Technology (Jan 2021 - May 2022)
SRM University - B.Tech in Mechanical Engineering (2012 - 2016)
Open to opportunities - Senior AI/ML Engineer • GenAI Platform Engineer • Software Engineer
mukunda-ai.vercel.app • Las Vegas, NV




