Mukunda Rao Katta MukundaKatta

Active across 50+ public developer, AI, open source, and research surfaces
_{Code, datasets, package registries, preprints, reproducible demos, agent workflows, and OSS contribution traces.}

_{Open Source · Publications · Recently Shipped · Packages · Projects · Impact · Experience · Stats}

  8+ years building production systems at Fortune 100 scale
  Former SDE at Amazon Web Services  •  Currently at Southwest Airlines
  Deep expertise in ML systems, distributed architectures, and full-stack engineering

Now: shipped ragdrift (five-dimensional RAG drift detection on crates.io + PyPI), the rust-llm-stack of 5 small Rust crates, and the mcp-stack of 14 MCP servers in the official MCP Registry — 4 RAG/agent helpers + 10 reliable transforms LLMs reach for tools instead of imagining (CSV, regex, JMESPath, diff, SQL formatting, shell escaping, JSON5, TOML/YAML/JSON, IANA timezones, HTML→Markdown). Plus the @mukundakatta/agent* reliability stack (fit → guard → snap → vet → cast), 6 earlier MCP servers (also in the Registry), GitHub Actions on the Marketplace, 53 PyPI packages, and 320+ open PRs across MCP SDKs, FastMCP, claude-code-action, and Anthropic's agent SDK (140 already merged upstream).

Portfolio at a Glance

_{PUBLIC REPOS}
797

_ORIGINALS
322

_{ACTIVE PROJECTS}
284

_FORKS
475

_ARCHIVED
308

Every repo is indexed in claude-workspace - wired for Multica, Claude Code, Codex, OpenClaw, and Cursor to reason across the portfolio.

Profile Maintenance

The profile README is partly managed by scheduled automation. Pull requests run profile checks that compile the Python refresh scripts and verify the managed README markers plus the cached stats files before changes merge.

Latest Drop · The Agent Reliability Stack

🌐 Live at mukundakatta.github.io/agent-stack - single landing page for the whole 119-package ecosystem (npm + PyPI + MCP Registry + GitHub Marketplace).

🤗 Try it live on the HuggingFace Space · jailbreak fixtures on the HF Dataset.

Five small, focused npm packages that fix the boring problems every long-running agent eventually hits. Pure ESM JavaScript, zero runtime deps, TypeScript types in the box. Designed to compose into a pipeline: fit → guard → snap → vet → cast.

Fresh Contributions

Surface	Latest proof
OpenAI GPT Store	Agent Eval Lab - public GPT for lightweight agent evaluation and scenario walkthroughs
Poe	AgentEvalLab - public bot for agent-eval prompts and scoring flows
Poe	OpsScorecardLab - public bot for turning eval scenarios into operations scorecards
Poe	RepoLandscapeLab - public bot for mapping premium agent repo surfaces
Replicate	agent-eval-lab - public model/app page for eval-oriented agent interactions
Replicate	ops-scorecard-lab - public app page for ops scorecard generation
Replicate	repo-landscape-lab - public app page for repo-surface mapping
Hugging Face	Agent Labs Portfolio - curated collection tying together the live Spaces and datasets
Hugging Face	Ops Scorecard Lab - public Space for turning rough workflows into operator-facing scorecards
Modal	agent-eval-lab endpoint - live API endpoint returning structured eval JSON
Modal	ops-scorecard-lab endpoint - live API endpoint for scorecard generation
Modal	repo-landscape-lab endpoint - live API endpoint for repo-landscape mapping
Modal	Agent Labs Portal - public two-panel demo surface for evaluation plans and ops scorecards
OpenRouter	Agent Eval Lab - public OpenRouter app analytics page seeded from the live Hugging Face Space
OpenRouter	Ops Scorecard Lab - public OpenRouter app analytics page for the scorecard Space
Netlify	Agent Eval Lab Static Portal - verified public portal for the agent-eval research and demo surface
Observable	Agent Eval Notebook - public notebook surface for lightweight scorecard exploration
Streamlit	Agent Eval Scorecard - live app for scoring agent behavior with compact operational criteria
Replit	agent-eval-replit-demo - public Replit project with a hosted demo surface
Cloudflare Pages	Agent Evaluation Field Notes - static field-notes surface for scorecards, replay, and RAG guardrails
Firebase Hosting	Agent Evaluation Field Notes - Firebase-hosted mirror of the field-notes surface
Codeberg Pages	MukundaKatta.codeberg.page - public portfolio page routing across the non-GitHub footprint
GitHub	agent-eval-public-notes - public notes for scorecards, replay debugging, and RAG guardrail checks
GitLab	agent-eval-public-notes - GitLab mirror of the reusable agent-eval notes
GitHub	agent-eval-platform-starters - starter artifacts for publishing field notes across cloud, notebook, data, and docs platforms
GitLab	agent-eval-platform-starters - GitLab mirror of the platform-starter artifacts
Gitea	agent-eval-platform-starters - Gitea mirror of the platform-starter artifacts
StackBlitz	agent-eval-platform-starters workspace - browser-editable workspace URL for the starter artifacts
Gitpod	agent-eval-platform-starters workspace - cloud workspace launch URL for the starter artifacts
Read the Docs	Agent Evaluation Field Notes - hosted documentation for the field-notes project
Val Town	agent-scorecard-val - public TypeScript scorecard function for agent evaluation notes
Google Colab	Agent Evaluation Field Notes Scorecard - public notebook for scenario scoring and scorecard walkthroughs
CodeSandbox	Agent Evaluation Field Notes - public sandbox preview for the scorecard app
GitHub Gist	Operational scorecard template - standalone template for tool-using agent reviews
GitHub Gist	Trajectory replay debugging checklist - replay checklist for agent workflow regressions
GitHub Gist	RAG guardrail smoke tests - prompt-injection and vector-poisoning smoke tests
GitLab Snippet	Operational scorecard template - public snippet mirror for scorecard evaluation
GitLab Snippet	Trajectory replay debugging checklist - public snippet mirror for replay debugging
GitLab Snippet	RAG guardrail smoke tests - public snippet mirror for RAG guardrails
Kaggle	Premium Agent Repo Landscape - public dataset mapping premium agent repos by surface, stack, and focus
Kaggle	Agent Eval Scenarios - public eval dataset for lightweight agent benchmarking
Kaggle	building-a-lightweight-agent-eval-benchmark - clean public notebook replacement with a successful run and resilient dataset loading
Codeberg	premium-agent-landscape - public showcase repo for agent portfolio mapping and presentation
Codeberg	agent-eval-lab - public repo for evaluation artifacts and benchmark framing
Codeberg	apache-contribution-atlas - public tracker for Apache-facing contribution work
Codeberg	Documentation PR #784 - clarified HTTPS auth with 2FA and token-based Git usage
GitHub	agent-eval-lab-static - source repo for the Netlify research portal
Apache	fluss PR #3243 - added a blog contribution guide for the Fluss website community docs
Apache	fluss PR #3244 - added an FIP contribution guide for the Fluss contributor workflow
Apache	pulsar-site PR #1139 - fixed failover standby mapping in the 3.0.x docs

Publications

Type	Title	Venue
Landing Page	Lightweight Evaluation and Operational Scorecards for Tool-Using AI Agents	GitHub Pages
Preprint	Lightweight Evaluation and Operational Scorecards for Tool-Using AI Agents	Zenodo
Artifact Repo	lightweight-agent-eval-paper	GitHub
Archive	lightweight-agent-eval-paper	Software Heritage, archived successfully
Preprint Submission	Lightweight Evaluation and Operational Scorecards for Tool-Using AI Agents	SSRN, in process (`PRELIMINARY_UPLOAD`)
Preprint Submission	Lightweight Evaluation and Operational Scorecards for Tool-Using AI Agents	Research Square, declined as not suitable for posting
Preprint Submission	Lightweight Evaluation and Operational Scorecards for Tool-Using AI Agents	MetaArXiv on OSF Preprints, declined as out of scope
Preprint	AI Eval Forge: Mixed-Check Regression Testing for LLM and Agent Workflows	Zenodo
Artifact Repo	ai-eval-forge-paper	GitHub
Preprint Submission	AI Eval Forge: Mixed-Check Regression Testing for LLM and Agent Workflows	SSRN, public abstract page
Preprint Submission	AI Eval Forge: Mixed-Check Regression Testing for LLM and Agent Workflows	MetaArXiv on OSF Preprints, submitted
Preprint Submission	AI Eval Forge: Mixed-Check Regression Testing for LLM and Agent Workflows	Qeios, submitted (article 5247, live in 1 business day)
Preprint	Karna: A Chat-Native, Multi-Channel Architecture for Personal AI Chief-of-Staff Agents	Zenodo
Preprint Submission	Karna: A Chat-Native, Multi-Channel Architecture for Personal AI Chief-of-Staff Agents	Qeios, submitted (article 5249, live in 1 business day)
Artifact Mirror	Karna: A Chat-Native, Multi-Channel Architecture for Personal AI Chief-of-Staff Agents	Figshare public dataset mirror
Artifact Repo	karna-chat-native-assistant-paper	GitHub
Preprint	Six Reliability Primitives for LLM Agents: An Artifact Pattern for Stackable, Single-Concern Libraries	Zenodo
Artifact Repo	agent-stack-reliability-primitives-paper	GitHub
Model Card	agent-stack on HuggingFace Hub - second native DOI `10.57967/hf/8720`	Hugging Face Hub
Article	Six Reliability Primitives for LLM Agents	Medium
Article	Six Reliability Primitives for LLM Agents	DEV Community
Article	Six Reliability Primitives for LLM Agents	Hashnode
Hackathon	AI Agent Olympics @ Milan AI Week 2026 - Agentic Workflows track, registered + approved 2026-05-08	Lablab.ai (build phase May 13-19, $28K+ pool)
Preprint	Agent Trajectory Replay for Debugging Tool-Using AI Workflow Regressions	Zenodo
Preprint Submission	Agent Trajectory Replay for Debugging Tool-Using AI Workflow Regressions	SSRN, submitted for review
Artifact Repo	agent-trajectory-replay-paper	GitHub
Preprint	Small-Rule Guardrails for Retrieval-Augmented Generation: Prompt Injection and Vector Poisoning Checks	Zenodo
Artifact Repo	rag-guardrails-paper	GitHub
Preprint Mirror	Small-Rule Guardrails for Retrieval-Augmented Generation: Prompt Injection and Vector Poisoning Checks	Figshare
Archive	rag-guardrails-paper	Software Heritage, archive accepted
Preprint	Chetana: A Theory-Indexed Probe Framework for AI Consciousness Indicator Scoring	Zenodo
Artifact Repo	chetana-consciousness-indicator-paper	GitHub
Preprint	ML Intern Lab: A Minimal Agentic Workflow for Reproducible Machine Learning Experiment Reports	Zenodo
Preprint Submission	ML Intern Lab: A Minimal Agentic Workflow for Reproducible Machine Learning Experiment Reports	SSRN, submitted for review
Preprint Mirror	ML Intern Lab: A Minimal Agentic Workflow for Reproducible Machine Learning Experiment Reports	Academia.edu
Artifact Repo	ml-intern-lab-paper	GitHub
Artifact Repo	lightweight-eval-scorecards-paper	GitHub
Preprint Submission	Citation Traceability for Web-Native AI Research Workflows	MetaArXiv on OSF Preprints, resubmitted and pending moderator review
Artifact Repo	browser-research-agent-paper	GitHub
Preprint Submission	Context Forge: A Lightweight Method for Diversity-Aware Context Packing and Prompt-Injection-Aware Retrieval	Research Square, QA/QC check
Artifact Repo	context-forge-paper	GitHub
Article	Lightweight Evaluation for Tool-Using AI Agents	Hashnode
Article	I Built 5 Tiny Libraries to Stop My AI Agents from Misbehaving in Production	DEV Community
Article	I Built 5 Tiny Libraries to Stop My AI Agents from Misbehaving in Production	Medium
Research Profile	Mukunda Katta	ORCID
Research Profile	Mukunda Katta	Academia.edu
Archive	agent-stack (landing page)	Software Heritage, archive accepted
Archive	agentvet, agentguard, agentsnap, agentfit, agentcast, agent-quintet-demo	Software Heritage, archive accepted (6 repos)
Live Space	Prompt Injection Shield Demo	Hugging Face Space (Streamlit), live interactive scanner
Eval Dataset	prompt-injection-eval	Hugging Face Dataset, 74 hand-curated rows across 9 categories
Eval Dataset (mirror)	Prompt Injection Eval Set	Kaggle Dataset, MIT, mirror of HF dataset
Research Project	Small-Rule Guardrails for RAG	Open Science Framework, public companion artifact bundling preprint + libs + dataset + demo
Codeberg Mirrors	awesome-prompt-injection-defense · prompt-injection-shield-cli · rag-guardrails-action · homebrew-tap	Codeberg (Forgejo), 4 mirrors
GitLab Mirrors	awesome-prompt-injection-defense · prompt-injection-shield-cli · rag-guardrails-action · homebrew-tap	gitlab.com, 4 mirrors
Replit Workspace	@mukundavjcs6	Replit, prompt-injection-shield-cli imported
Curated List	awesome-prompt-injection-defense	GitHub awesome-list, CC0, links to detection libs + datasets + papers
Curated List Pages	mukundakatta.github.io/awesome-prompt-injection-defense	GitHub Pages render of the awesome list
CLI Package	prompt-injection-shield-cli	PyPI (`pip install prompt-injection-shield-cli`), exposes `pis-scan`
Homebrew Tap	MukundaKatta/homebrew-tap	`brew install MukundaKatta/tap/pis-scan`
Container	ghcr.io/mukundakatta/pis-scan	GitHub Container Registry, built via Actions
Live Demo Mirror	Agent Eval Lab Static	GitHub Pages
GitHub Action	rag-guardrails-action	Composite Action wrapping prompt-injection-shield + vector-poison-score, v0.1.0 released
Live Demo	Agent Eval Lab	Netlify, deployed via netlify-cli
Live Demo Mirror	Agent Eval Lab	Cloudflare Pages
Package	@mukundakatta/agent-scorecard	JSR (Deno + TypeScript registry), v0.1.0
Research Profile	Mukunda Katta	Authorea profile, new submissions paused during platform migration

agentfit

_{Fit it.}

Token-aware message truncation with three strategies (drop-oldest, drop-middle, priority). Pluggable tokenizers. Per-model estimators.

agentguard

_{Sandbox it.}

Network-egress firewall: a declarative allowlist of domains agent tools can fetch. Throws on violation, with a clear error.

agentsnap

_{Test it.}

Snapshot tests for tool-call traces. Catch silent regressions in LLM tool use the way you catch UI regressions today.

agentvet

_{Vet it.}

Validate tool args before execution. Wrap any tool function; on bad args, throw a typed error with an LLM-friendly retry hint.

agentcast

_{Validate it.}

Structured-output enforcer. Validate the model's response, retry with the validation error as feedback, return typed data or throw after N attempts. BYO LLM and validator.

npm i @mukundakatta/agentfit @mukundakatta/agentguard @mukundakatta/agentsnap @mukundakatta/agentvet @mukundakatta/agentcast

Each one also ships as an MCP server so Claude Desktop, Cursor, Cline, Windsurf, and Zed can call them directly mid-conversation:

npx -y @mukundakatta/agentfit-mcp     # fit a chat history into a budget
npx -y @mukundakatta/agentguard-mcp   # check URLs against an egress policy
npx -y @mukundakatta/agentsnap-mcp    # diff tool-call traces
npx -y @mukundakatta/agentvet-mcp     # validate tool args + generate retry hints
npx -y @mukundakatta/agentcast-mcp    # extract / validate JSON from LLM text

_{Sibling libraries that share a design philosophy: small, focused, zero-dep, BYO-LLM. Each one solves a single concrete reliability problem so you can pick the ones you need without dragging in a framework. Previous drop, streamparse (streaming JSON parser, npm + Homebrew + MCP Registry), is still in active use.}

Open Source Focus

I contribute practical fixes to AI SDKs, MCP tooling, eval frameworks, agent infrastructure, structured outputs, and developer experience.

My lane is finding the sharp edges that slow builders down: unclear contracts, brittle tool calls, docs that almost answer the question, eval gaps where regressions hide, and AI tooling that needs better failure signals. I like small, reviewable patches with clear intent, and compact packages that turn repeated manual checks into reusable workflows.

Recent contribution areas (merged upstream):

Microsoft - security and architecture docs for internal AI-engineering toolchains (hve-core, physical-ai-toolchain)
Pydantic - pydantic-ai integration with the Vercel AI SDK
Hugging Face ecosystem - safetensors Python bindings, sentence-transformers trainer migration docs
Meilisearch - heed multi-target docs.rs infrastructure
Vercel - next.js documentation
Apache Software Foundation - doc / comment fixes across iceberg, pulsar, skywalking, ozone, iotdb

I keep a public log of selected OSS work in oss-contributions.

External PR Footprint

_{Across ~425 unique external repos (excluding my own). Source: GitHub search API, author:MukundaKatta is:pr -user:MukundaKatta. Last refreshed 2026-05-14.}

_MERGED
140
_{across 106 external repos}

_OPEN
320
_{awaiting review / response}

_CLOSED
325
_{not merged}

_{TOTAL EXTERNAL}
785
_{PRs authored upstream}

Top external repos by merged PRs

Rank	Repository	Merged
1	microsoft/agent-governance-toolkit	19
2	microsoft/physical-ai-toolchain	5
3	microsoft/hve-core	4
3	PrefectHQ/fastmcp	4
5	Cacti/plugin_mactrack	3
6	meilisearch/heed	2
6	apache/pulsar-site	2
6	momenbasel/PureMac	2
6	DiogoRibeiro7/pyseas	2

Top external repos by total PR volume (merged + open + closed)

Rank	Repository	PRs
1	microsoft/agent-governance-toolkit	21
2	langchain-ai/langgraph	17
3	googleapis/python-genai	15
4	openai/openai-node	12
5	anthropics/claude-agent-sdk-python	11
6	ChromeDevTools/chrome-devtools-mcp	10
6	modelcontextprotocol/python-sdk	10
6	PrefectHQ/fastmcp	10
9	google/magika	9
9	anthropics/anthropic-sdk-python	9

Distribution pattern. Each flagship ships as a complete unit, not a single npm package:

library  →  Python port  →  CLI binary  →  GitHub Action  →  Homebrew formula  →  MCP server
   npm           PyPI                           Marketplace        brew tap            npm

So the same problem (mcpcheck, skillint, streamparse) is solvable from any environment a developer or AI assistant happens to be in: a TypeScript app, a Python script, a CI workflow, a terminal, or directly inside Claude / Cursor / Cline / Windsurf / Zed.

Recent OSS Highlights

openai/openai-node #1831 — improved fallback handling for non-standard JSON error bodies
openai/tiktoken #529 — added PyInstaller hooks for dynamic encoding plugins
googleapis/python-genai #2298 — clarified response_schema vs response_json_schema
microsoft/playwright-mcp #1562 — clarified extension connection and tab-selection flow
anthropics/anthropic-sdk-python #1412 — fixed async memory tool example docs
stanford-crfm/helm #4210 — fixed later-page deep links for run instances

Recently Shipped

Last refreshed 2026-05-13 from npm, PyPI, and the GitHub API.

Latest releases

2026-05-11 · @mukundakatta/lorem-mcp v0.1.0 · npm
2026-05-11 · @mukundakatta/color-mcp v0.1.0 · npm
2026-05-11 · @mukundakatta/mime-mcp v0.1.0 · npm

Recently merged PRs

2026-05-12 · freeCodeCamp/freeCodeCamp #67330 — fix(curriculum): clarify DOM element node wording
2026-05-04 · PrefectHQ/fastmcp #4069 — Fix #4056: keep blank query values, add token bucket regression test
2026-05-04 · PrefectHQ/fastmcp #4076 — fix(openapi): keep blank values in parse_qs (refs #4056)
2026-05-04 · PrefectHQ/fastmcp #4070 — docs(integrations): add Pydantic AI FastMCP toolset guide
2026-05-07 · elastic/beats #50281 — docs: fix 'ElasticSearch' casing and 'a SSL' -> 'an SSL' across reference docs

Published Packages

npm (scope @mukundakatta):

Flagship packages:

Package	Why it matters	Install
@mukundakatta/streamparse _{partial JSON for LLM streams}	Streaming JSON parser that yields partial valid trees as tokens arrive. Render LLM tool calls mid-stream, recover dropped responses, parse messy ` ```json ` blocks. Zero deps, 64 tests. Also published as an MCP server in the official MCP Registry.	`npm i @mukundakatta/streamparse`
@mukundakatta/streamparse-mcp _{MCP: parse partial JSON}	MCP server that lets Claude / Cursor / Cline / Windsurf / Zed parse partial, truncated, or messy JSON on demand. Three tools: `parse_partial_json`, `extract_json_from_text`, `validate_json`.	`npx -y @mukundakatta/streamparse-mcp`
@mukundakatta/mcpcheck _{MCP config quality gate}	Lint MCP config files for Claude Desktop, Cursor, Cline, Windsurf, and Zed. CLI, GitHub Action, and SARIF for code scanning.	`npm i -g @mukundakatta/mcpcheck`
@mukundakatta/designlint _{frontend quality checks}	HTML/CSS accessibility and design linter for contrast, touch targets, headings, form labels, and leaked secrets.	`npm i -g @mukundakatta/designlint`
@mukundakatta/skillint _{AI skill validation}	Lint Claude Code `SKILL.md` files for frontmatter, required fields, descriptions, and hardcoded secrets.	`npm i -g @mukundakatta/skillint`
@mukundakatta/ai-eval-forge _{eval harness}	Zero-dependency eval harness for comparing model, prompt, and agent behavior. CLI plus programmatic API; also on PyPI.	`npm i @mukundakatta/ai-eval-forge`
@mukundakatta/codex-skill-kit _{Codex skill tooling}	Scaffold and validate Codex skills from the command line. Published for npm and PyPI workflows.	`npm i -g @mukundakatta/codex-skill-kit`
@mukundakatta/kavach _{AI-app threat signals}	Small, inspectable threat-scoring library for AI-app security monitoring: signals to weighted score to tier and playbook.	`npm i @mukundakatta/kavach`

More npm packages (90) - grouped by area

MCP servers (20) - callable directly from Claude Desktop, Cursor, Cline, Windsurf, Zed via stdio. The 14 in mcp-stack plus the 5 in @mukundakatta/agent* and streamparse-mcp. All 20 listed in the official MCP Registry:

Package	What it does
`@mukundakatta/streamparse-mcp`	Parse partial / truncated / messy JSON for LLM tool calls. Listed in the official MCP Registry.
`@mukundakatta/agentfit-mcp`	Token-aware message truncation: count tokens, fit a chat history into a budget.
`@mukundakatta/agentguard-mcp`	Check URLs against a network-egress allowlist before any tool fetch.
`@mukundakatta/agentsnap-mcp`	Diff and validate tool-call trace snapshots.
`@mukundakatta/agentvet-mcp`	Validate tool-call args against a shape spec; produce LLM-friendly retry hints.
`@mukundakatta/agentcast-mcp`	Extract JSON from messy LLM text and validate it against a shape.
`@mukundakatta/promptbudget-mcp`	Truncate text to a token budget with 4 strategies (head, tail, head+tail, smart-cut). Sibling to the `promptbudget` Rust crate. Listed in the official MCP Registry.
`@mukundakatta/citecite-mcp`	Inject `[1] [2]` citation markers into RAG outputs, parse them back, or strip them. Sibling to the `citecite` Rust crate. Listed in the official MCP Registry.
`@mukundakatta/ragmetric-mcp`	Compute RAG retrieval IR metrics on demand: recall@k, hit@k, MRR, NDCG@k, batch eval. Sibling to the `ragmetric` Rust crate. Listed in the official MCP Registry.
`@mukundakatta/ragdrift-mcp`	Diagnose RAG drift alerts: interpret scores, recommend thresholds, explain the 5 drift dimensions. Sibling to `ragdrift` / `ragdrift-py`. Listed in the official MCP Registry.
`@mukundakatta/csv-tools-mcp`	RFC 4180 CSV parsing + generation. Handles quoted fields, embedded commas, BOMs, CRLF. Tools: `parse_csv`, `to_csv`, `pluck_columns`. Listed in the official MCP Registry.
`@mukundakatta/regex-test-mcp`	Trustworthy JS regex testing with real match offsets, named groups, safe against zero-width loops. Tools: `test_regex`, `find_all`, `replace`. Listed in the official MCP Registry.
`@mukundakatta/jmespath-mcp`	Run JMESPath queries against deep JSON. Pure JS, no jq binary needed. Tool: `json_query`. Listed in the official MCP Registry.
`@mukundakatta/diff-mcp`	Character-precise unified diffs + patch application + parsing. For code-review and code-edit agents. Tools: `unified_diff`, `apply_patch`, `parse_patch`. Listed in the official MCP Registry.
`@mukundakatta/sqlfmt-mcp`	Deterministic SQL formatting across 19 dialects (postgres, mysql, snowflake, bigquery, etc.). Tools: `format_sql`, `list_dialects`. Listed in the official MCP Registry.
`@mukundakatta/shellquote-mcp`	Safe shell argument escaping for bash, cmd.exe, PowerShell. Stops LLM-generated shell commands from breaking on quotes, $vars, backslashes. Tools: `quote_bash`, `quote_bash_argv`, `quote_cmd`, `quote_powershell`. Listed in the official MCP Registry.
`@mukundakatta/json5-mcp`	Parse JSON-with-comments / trailing-commas / unquoted keys, and round-trip to strict JSON. Tools: `parse_json5`, `to_json5`, `to_strict_json`. Listed in the official MCP Registry.
`@mukundakatta/toml-yaml-json-mcp`	Parse, format, and convert configs across TOML / YAML / JSON. LLMs especially mishandle TOML. Tools: `parse`, `format`, `convert`. Listed in the official MCP Registry.
`@mukundakatta/timezone-mcp`	IANA timezone math with real DST rules via `Intl.DateTimeFormat`. LLMs hallucinate offsets. Tools: `convert_tz`, `now_in`, `tz_offset`. Listed in the official MCP Registry.
`@mukundakatta/html-to-markdown-mcp`	HTML → Markdown via Turndown. For web-scraping and read-the-page agents. Tools: `html_to_md`, `extract_text`. Listed in the official MCP Registry.

Structured outputs & parsing (1)

Package	What it does
`@mukundakatta/streamparse`	Streaming JSON parser that yields partial valid trees as tokens arrive.

Agent infrastructure (11)

Package	What it does
`@mukundakatta/agentfit`	Token-aware message truncation; fit chat history into a context budget.
`@mukundakatta/agentguard`	Network-egress firewall for agent tools: declarative domain allowlist.
`@mukundakatta/agentsnap`	Snapshot tests for tool-call traces, like Jest snapshots for LLM tool use.
`@mukundakatta/agentvet`	Validate tool args before execution, with LLM-friendly retry hints.
`@mukundakatta/agentcast`	Structured-output enforcer: validate, retry with feedback, BYO-LLM/validator.
`@mukundakatta/agent-loop-breaker`	Detect repeated agent steps and stop runaway loops.
`@mukundakatta/agent-regression-lens`	Detect regressions between baseline and current AI agent runs.
`@mukundakatta/agent-trajectory-replay`	Replay and diff AI agent event trajectories for debugging regressions.
`@mukundakatta/tool-call-contracts`	Validate LLM tool-call payloads with small JSON-like contracts.
`@mukundakatta/tool-permission-gate`	Policy-check agent tool calls before execution.
`@mukundakatta/tool-result-taint`	Track untrusted tool output before it enters prompts or actions.

RAG & retrieval (6)

Package	What it does
`@mukundakatta/rag-quality-kit`	Heuristic quality metrics for RAG retrieval and grounded answers.
`@mukundakatta/rag-staleness-auditor`	Find stale RAG chunks by age, version, and freshness requirements.
`@mukundakatta/retrieval-acl-filter`	Enforce document ACLs after retrieval and before prompting.
`@mukundakatta/vector-poison-score`	Score retrieved documents for vector/RAG poisoning signals.
`@mukundakatta/embedding-dedupe`	Deduplicate near-identical embedding records by cosine similarity.
`@mukundakatta/context-drift-detector`	Detect topic drift between user intent, retrieved context, and AI answers.

Prompt & output safety (5)

Package	What it does
`@mukundakatta/pii-sentry`	Detect and redact PII and secret-like values before AI processing.
`@mukundakatta/prompt-injection-shield`	Prompt-injection risk scanner for untrusted AI context.
`@mukundakatta/llm-output-sanitizer`	Sanitize LLM outputs before rendering, SQL, shell, or markdown sinks.
`@mukundakatta/system-prompt-leak-scan`	Detect system prompt leakage in model outputs.
`@mukundakatta/jailbreak-corpus-mini`	Small local jailbreak + prompt-injection fixture set for tests.

Context & prompt engineering (4)

Package	What it does
`@mukundakatta/context-forge`	Context engineering toolkit for ranking, packing, and risk-scanning RAG context.
`@mukundakatta/context-window-packer`	Pack context chunks into a budget by relevance and priority.
`@mukundakatta/prompt-token-trim`	Trim prompt messages to fit a token budget while preserving priority.
`@mukundakatta/prompt-version-diff`	Diff prompt templates and flag risky instruction changes.

Evals & tracing (3)

Package	What it does
`@mukundakatta/eval-dataset-smith`	Generate balanced eval cases from bugs, docs, examples, and policies.
`@mukundakatta/eval-flake-detector`	Detect flaky LLM eval cases across repeated runs.
`@mukundakatta/llm-trace-sampler`	Sample LLM traces by risk, errors, latency, and deterministic ids.

Cost, routing & caching (4)

Package	What it does
`@mukundakatta/llm-cost-guard`	Estimate AI request cost and enforce per-request or session budgets.
`@mukundakatta/model-fallback-planner`	Plan model fallback chains from capability, cost, and health data.
`@mukundakatta/model-router-policy`	Policy-based model routing by capability, cost, latency, and privacy.
`@mukundakatta/semantic-cache-key`	Stable semantic cache keys for AI prompts, tools, models, and retrieval context.

Supply chain, citations, consent (5)

Package	What it does
`@mukundakatta/ai-supply-chain-manifest`	Build and validate lightweight AI model / data / tool manifests.
`@mukundakatta/citation-integrity-check`	Verify answer citations refer to supplied source ids.
`@mukundakatta/consent-redaction-log`	Record consent-aware redactions for privacy review trails.
`@mukundakatta/hallucination-risk-meter`	Estimate hallucination risk from answer, context, citations, and uncertainty language.
`@mukundakatta/llm-response-schema-lite`	Tiny schema validator for structured LLM responses.

Install any of them with npm i @mukundakatta/<package>.

PyPI:

Package	Purpose	Install
claude-skill-check	Lint Claude Code `SKILL.md` files for YAML frontmatter, required fields, description quality, and secret patterns.	`pip install claude-skill-check`
mcp-config-check	Validate MCP configs across Claude Desktop, Cursor, Cline, Windsurf, and Zed; catches auth, transport, duplicate, and placeholder issues.	`pip install mcp-config-check`
claude-hooks-check	Audit Claude Code hooks for malformed matchers, dangerous commands, invalid events, and hardcoded secrets.	`pip install claude-hooks-check`
claude-commands-check	Validate Claude Code slash-command files for naming, frontmatter, model values, allowed-tools shape, and secret leakage.	`pip install claude-commands-check`
llm-usage-report	Parse raw LLM API response logs and generate token and cost reports by provider, model, day, project, or user.	`pip install llm-usage-report`
codex-skill-kit	Scaffold and validate Codex skills from Python environments; mirrors the npm CLI workflow.	`pip install codex-skill-kit`
ai-eval-forge	Zero-dependency LLM and agent eval harness with exact, regex, token-F1, JSON, and citation-coverage checks.	`pip install ai-eval-forge`
agent-run-diff	Compare baseline and current agent runs across success, errors, tools, output drift, steps, latency, and cost.	`pip install agent-run-diff`

More PyPI packages (44) - Python ports of the @mukundakatta JS libraries

Streaming + agent reliability stack (6)

Package	What it does
`partial-json-stream`	Streaming JSON parser that yields partial valid trees as tokens arrive.
`agentfit-py`	Token-aware message truncation; fit a chat history into a context budget.
`agentguard-firewall`	Network-egress firewall for agent tools.
`agentsnap-py`	Snapshot tests for tool-call traces.
`agentvet-py`	Validate tool args before execution; LLM-friendly retry hints.
`agentcast-py`	Structured-output enforcer; validate, retry with feedback.

Prompt + output safety (3)

Package	What it does
`pii-sentry-py`	Detect and redact PII and secret-like values before AI processing.
`prompt-injection-shield-py`	Prompt-injection risk scanner for untrusted AI context.
`llm-output-sanitizer-py`	Sanitize LLM outputs before HTML / SQL / shell / markdown sinks.

RAG + retrieval (3)

Package	What it does
`rag-quality-kit`	Heuristic quality metrics for RAG retrieval and grounded answers.
`vector-poison-score`	Score retrieved documents for vector / RAG poisoning signals.
`embedding-dedupe`	Deduplicate near-identical embedding records by cosine similarity.

Cost, caching, evals (3)

Package	What it does
`llm-cost-guard-py`	Estimate AI request cost and enforce per-request or session budgets.
`semantic-cache-key`	Stable semantic cache keys for AI prompts, tools, models, retrieval.
`eval-flake-detector`	Detect flaky LLM eval cases across repeated runs.

Verification + grounding (3)

Package	What it does
`citation-integrity-check`	Verify answer citations refer to supplied source ids.
`hallucination-risk-meter`	Estimate hallucination risk from answer + context + citations.
`system-prompt-leak-scan`	Detect system-prompt leakage in model outputs.

Agent infrastructure + meta (6)

Package	What it does
`mk-agentkit`	Meta-package re-exporting all 5 agent-stack ports under one import.
`agent-loop-breaker-py`	Detect repeated agent steps and stop runaway loops.
`agent-regression-lens-py`	Detect regressions between baseline and current agent runs.
`agent-trajectory-replay-py`	Replay and diff agent event trajectories.
`tool-call-contracts-py`	Validate LLM tool-call payloads with small JSON-like contracts.
`tool-permission-gate-py`	Policy-check agent tool calls before execution.

Tools / safety / privacy (4)

Package	What it does
`tool-result-taint-py`	Track untrusted tool output before it enters prompts.
`jailbreak-corpus-mini-py`	Local jailbreak + prompt-injection fixture set for tests.
`consent-redaction-log-py`	Record consent-aware redactions for privacy review trails.
`kavach-py`	Threat-scoring library for AI-app security monitoring.

RAG (3)

Package	What it does
`rag-staleness-auditor-py`	Find stale RAG chunks by age, version, and freshness requirements.
`retrieval-acl-filter-py`	Enforce document ACLs after retrieval and before prompting.
`context-drift-detector-py`	Detect topic drift between intent, context, and answer.

Context engineering (5)

Package	What it does
`context-forge-py`	Context engineering toolkit: ranking, packing, risk-scanning.
`context-window-packer-py`	Pack context chunks into a budget by relevance and priority.
`prompt-token-trim-py`	Trim prompt messages to fit a token budget while preserving priority.
`prompt-version-diff-py`	Diff prompt templates and flag risky instruction changes.
`llm-response-schema-lite-py`	Tiny schema validator for structured LLM responses.

Evals + cost + routing (5)

Package	What it does
`eval-dataset-smith-py`	Generate balanced eval cases from bugs, docs, examples, policies.
`llm-trace-sampler-py`	Sample LLM traces by risk, errors, latency, and deterministic ids.
`llm-cost-guard-py`	Estimate AI request cost and enforce per-request or session budgets.
`model-fallback-planner-py`	Plan model fallback chains from capability, cost, and health data.
`model-router-policy-py`	Policy-based model routing by capability, cost, latency, privacy.

Niche linters (4)

Package	What it does
`mcpcheck-py`	Lint MCP config files for Claude Desktop, Cursor, Cline, Windsurf, Zed.
`skillint-py`	Lint Claude Code SKILL.md files.
`designlint-py`	HTML/CSS accessibility and design linter.
`ai-supply-chain-manifest-py`	Build and validate lightweight AI model / data / tool manifests.

GitHub Marketplace (7 Actions):

Composite GitHub Actions, discoverable on the GitHub Marketplace:

Linters:

Agent-stack CI gates:

agentvet-action - fail PRs on bad LLM tool definitions
agentsnap-action - fail PRs on tool-call trace drift
mcp-stack-validate-action - one CI gate that runs all 5 agent-stack tools

Homebrew tap - mukundakatta/tools:

brew tap mukundakatta/tools
brew install claude-skill-check mcp-config-check claude-hooks-check claude-commands-check

Each ships a CLI, a programmatic API, and (for the linters) a composite GitHub Action you can drop into any workflow in 3 lines.

🦀 crates.io (Rust) - MukundaKatta - small focused crates for the LLM / RAG / agent niche:

Crate	Purpose	Install
ragdrift _{RAG drift detection}	Five-dimensional drift detection for production RAG: data, embedding, response, confidence, query mix. MMD + sliced Wasserstein + KS + PSI in pure Rust. Sibling Python wheel ships as `ragdrift-py`.	`cargo add ragdrift`
ragdrift-core _{RAG drift core}	Pure-Rust core of `ragdrift`: KS, PSI, MMD (RBF kernel), 1D and sliced Wasserstein, k-means. No BLAS dependency. Used as the backbone of the Python `ragdrift-py` wheel via PyO3.	`cargo add ragdrift-core`
embedrank _{vector top-k retrieval}	Batched cosine / dot / L2 distance for f32 embeddings + a heap-based top-k selector. No BLAS, no allocator surprises. Designed for the hot path of small-to-medium RAG retrieval.	`cargo add embedrank`
promptbudget _{token-budget truncation}	Token-budget-aware text truncation with multiple strategies (head, tail, head+tail, smart-cut with marker). Bring-your-own tokenizer; no hard `tiktoken` dependency.	`cargo add promptbudget`
stopstream _{streaming stop-sequence detector}	Streaming-safe stop-sequence detector for LLM token streams. Holds back exactly the suffix-prefix overlap so partial matches at chunk boundaries never leak downstream. UTF-8 boundary safe.	`cargo add stopstream`
citecite _{RAG citation markers}	Citation-marker `[1] [2]` injector + parser for RAG outputs. Round-trippable: inject markers tied to source ids, parse them back when post-processing, or strip them entirely.	`cargo add citecite`
ragmetric _{RAG retrieval IR metrics}	IR metrics for RAG retrieval evaluation: Recall@k, Hit@k, MRR, NDCG@k. Pure data ops, no model dependencies. Sibling-in-spirit to `ragdrift`.	`cargo add ragmetric`

The five rust-llm-stack crates are also available together as one workspace: MukundaKatta/rust-llm-stack. Each crate is independently versioned and published.

More crates (78) - grouped by area

Agent reliability stack (10) - sibling Rust ports of the @mukundakatta/agent* npm family:

Crate	Purpose
`agentfit`	Token-aware message truncation; pluggable tokenizers (`tiktoken` feature for accurate BPE).
`agentguard`	Network-egress allowlist for AI agent tools; optional `reqwest-middleware` integration.
`agentsnap`	Snapshot tests for agent traces; Jest-style record-and-diff.
`agentvet`	Validate LLM-generated tool args against a JSON Schema; LLM-friendly retry hints.
`agentcast`	Structured-output enforcer: repair → validate → optional retry-with-LLM.
`agenttrace`	Run-level cost + latency aggregation; p50/p95 + per-model breakdown.
`agentprompt`	Jinja2-syntax LLM prompt templates with role-aware `Messages` builder.
`agentidemp`	Idempotency keys for agent retries; deterministic content-derived (sha256-hex / UUIDv5).
`agenttap`	Wire-level prompt introspection; credentials redacted by default.
`llmfleet`	Fleet-level batch dispatcher; pool requests across tasks for 50% off via Batch APIs.

Cross-provider primitives (2) - small focused primitives that don't depend on any official SDK:

Crate	Purpose
`claude-stream`	Incremental SSE event-stream parser → typed `Event` enum.
`llm-json-repair`	Three-pass JSON repair (fences, balanced extraction, trailing commas) for messy LLM output.

Cost & budget (6) - per-provider cost calculators plus the aggregator and the concurrency cap:

Crate	Purpose
`claude-cost`	Cache-aware cost calculator for Anthropic API + Bedrock model IDs.
`openai-cost`	Cache-aware OpenAI cost from a usage block; supports `prompt_tokens_details.cached_tokens`.
`gemini-cost`	Cache-aware Google Gemini cost from a usage block; Gemini 2.5 family.
`bedrock-cost`	Cross-vendor Bedrock pricing (Anthropic, Llama, Mistral, Cohere, Titan, AI21); inference-profile aware across regions.
`cost-meter`	Provider-agnostic aggregator: total LLM cost across providers, models, and time windows.
`token-budget-pool`	Shared token + dollar cap across concurrent LLM tasks; thread-safe; `BudgetExceeded` on push past cap.

Observability & tracing (2)

Crate	Purpose
`cachebench`	Prompt-cache observability: per-call hit ratio, cost saved, regression alerts, miss-aware retry.
`otel-genai-bridge`	Translate LLM telemetry attributes between OpenInference and OTel GenAI semantic conventions.

Agent runtime primitives (11) - small pieces every long-running agent loop needs:

Crate	Purpose
`agent-event-emit`	Structured event emitter for agent runs; append-only, JSON-line serializable.
`step-id`	Stable, deterministic IDs for agent steps (hash of run_id + step_index + kind).
`trace-diff`	Diff two agent traces semantically; align by event type + key, ignore timestamps.
`trace-redact`	Redact API keys, tokens, emails, phone numbers from agent traces before persist.
`tool-arg-coerce`	Fix common type slips in LLM-generated tool arguments (string→int/float/bool).
`tool-loop-break`	Detect repeated tool invocations and break runaway loops.
`tool-output-truncate`	Truncate tool output (file reads, command runs) before adding to context.
`tool-retry-policy`	Declarative retry policy for LLM tool calls; per-tool max-attempts and backoff.
`llm-circuit-breaker`	Tiny circuit breaker for LLM API calls; Closed/Open/HalfOpen.
`llm-retry`	Runtime-agnostic full-jitter exponential backoff with built-in retryable-code lists per provider.
`llm-message-hash`	Stable canonical hash of LLM request/message structures; recursive key-sort.

LLM streaming + output cleanup (14) - the boring data-massaging that ships agents:

Crate	Purpose
`chunk-flush`	Flush-on-newline buffer for streaming LLM output.
`lineify`	Turn a token-by-token stream into stable line events.
`sse-frame`	Streaming parser for Server-Sent Events frames used by LLM APIs (OpenAI, Anthropic, Gemini).
`stream-chunkrec`	Recombine LLM streaming token deltas into stable text; buffers partial words.
`json-streamparse-rs`	Streaming JSON balance detector; feed bytes incrementally.
`prompt-fence-strip`	Strip ```code fences```, leading prose, and trailing chatter from LLM output.
`markdown-strip`	Strip Markdown formatting (headers, bold, italic, links, code, blockquotes).
`html-entity-fix`	Decode HTML entities (`& < '` etc) that LLMs sometimes emit by accident.
`bom-strip`	Strip UTF-8/16/32 BOM bytes and stray U+FEFF code points.
`emoji-sanitize`	Normalize or strip emoji-related Unicode (presentation selectors, variation selectors).
`toml-repair`	Repair messy TOML emitted by LLMs (fences, line endings).
`yaml-repair`	Repair messy YAML emitted by LLMs (fences, tabs→spaces, dedent).
`schema-coerce`	Coerce LLM JSON values to a simple field-schema (string→int, bool, float).
`json-pluck`	Pluck a single value out of a `serde_json::Value` by dotted path or JSON pointer.

Cross-provider primitives (2) - small focused primitives that don't depend on any official SDK:

Crate	Purpose
`claude-stream`	Incremental SSE event-stream parser → typed `Event` enum.
`llm-json-repair`	Three-pass JSON repair (fences, balanced extraction, trailing commas) for messy LLM output.

Cost & budget (6) - per-provider cost calculators plus the aggregator and the concurrency cap:

Crate	Purpose
`claude-cost`	Cache-aware cost calculator for Anthropic API + Bedrock model IDs.
`openai-cost`	Cache-aware OpenAI cost from a usage block; supports `prompt_tokens_details.cached_tokens`.
`gemini-cost`	Cache-aware Google Gemini cost from a usage block; Gemini 2.5 family.
`bedrock-cost`	Cross-vendor Bedrock pricing (Anthropic, Llama, Mistral, Cohere, Titan, AI21); inference-profile aware across regions.
`cost-meter`	Provider-agnostic aggregator: total LLM cost across providers, models, and time windows.
`token-budget-pool`	Shared token + dollar cap across concurrent LLM tasks; thread-safe; `BudgetExceeded` on push past cap.

Observability & tracing (2)

Crate	Purpose
`cachebench`	Prompt-cache observability: per-call hit ratio, cost saved, regression alerts, miss-aware retry.
`otel-genai-bridge`	Translate LLM telemetry attributes between OpenInference and OTel GenAI semantic conventions.

Eval & introspection (3)

Crate	Purpose
`eval-flake-rs`	Detect flaky LLM eval cases by tracking pass/fail across repeated runs.
`gold-cmp`	Pairwise comparison runner for gold-set LLM evals: A vs B winner counting.
`latency-buckets`	Streaming histogram + percentile estimator for LLM call latencies.

Prompt + input safety (7) - input-side hardening before content reaches the model:

Crate	Purpose
`prompt-inj-rs`	Prompt-injection risk scanner; Rust port of `@mukundakatta/prompt-injection-shield`.
`regex-pii-rs`	Regex-only PII detector for emails, phones, SSNs, credit cards.
`secret-mask`	Mask known secret patterns (API keys, JWTs, AWS access keys, GitHub tokens).
`output-sanitize-rs`	Strip dangerous HTML/SQL/shell snippets from LLM output before render or query.
`homoglyph-detect`	Detect Cyrillic/Greek lookalike chars masquerading as ASCII (prompt-injection defense).
`zero-width-strip`	Strip zero-width and bidi-control Unicode characters from text.
`rtl-flip-detect`	Detect right-to-left override (U+202E) and other bidi-control characters.

RAG infrastructure (7) - retrieval-side primitives below the flagship ragdrift / embedrank line:

Crate	Purpose
`bm25-rerank`	BM25 reranker for RAG; in-memory term-frequency reranking against a small candidate set.
`mmr-rerank`	Maximal Marginal Relevance reranker; diversify a set of retrieved docs.
`rerank-blend`	Blend N reranker score streams (dense, BM25, cross-encoder) with configurable weights.
`code-chunk`	Split source code into RAG-friendly chunks that respect function and class boundaries.
`markdown-chunk`	Split Markdown into RAG-friendly chunks that respect heading hierarchy.
`cosine-fast`	Hot-loop cosine similarity for f32 slices; auto-vectorized scalar core.
`embed-key`	Deterministic cache key for an embedding request; hash text + mix in provider/model.

Caching + hashing primitives (3)

Crate	Purpose
`content-cas`	Content-addressed cache primitive: store bytes under their SHA-256 hex.
`prompt-hash`	Deterministic cache key for an LLM prompt; normalize whitespace, hash messages.
`promptver`	Hash and version prompt templates so eval results, cache keys, and audit logs stay aligned.

Pure-Rust utility cores (13) - small, allocation-disciplined building blocks:

Crate	Purpose
`snipsplit-core`	Token-aware text chunker for RAG ingestion.
`lshdedup-core`	MinHash + LSH near-duplicate detection.
`vecnorm-core`	Bulk vector ops on f32 matrices.
`toklab-core`	Bulk tokenizer + counter for OpenAI BPE encodings.
`annflat-core`	Small in-memory flat-file ANN over f32 vectors.
`maskprompt-core`	PII redaction for LLM prompts.
`embedcache-core`	Content-addressed embedding cache.
`textsanity-core`	Unicode / whitespace / encoding cleanup.
`secretsniff-core`	Source-code secret scanner.
`llm-error-class`	Classify LLM provider error responses (rate-limit, auth, server, context-window, content-policy).
`char-token-est`	Tokenless byte/char-based token-count estimator; per-model-family calibrated.
`tiktoken-stream`	Streaming token counter for partial LLM responses; accumulates across chunks without re-tokenizing.
`lru-tokens`	LRU cache weighted by token count, not entry count; bound a prompt cache by token budget.

Total: 85 published crates under MukundaKatta on crates.io.

🤗 HuggingFace - mukunda1729 - 14 Spaces · 13 Datasets:

🚀 Live Gradio playgrounds (6):

Space	What you can try
`agent-stack-demo`	All 5 libs (`fit`, `guard`, `snap`, `vet`, `cast`) in one app.
`token-counter`	Count tokens for any text across Claude / GPT / Llama tokenizers.
`json-extractor`	Pull clean JSON out of messy LLM output (fenced, inline, unfenced).
`pii-redactor`	Find emails, phones, secrets, and IDs - mask, hash, or highlight.
`prompt-injection-detector`	Heuristic scanner for the most common injection families.
`mcp-config-validator`	Sanity-check Claude Desktop / Cursor / Cline / Windsurf / Zed configs.

📖 Static reference & explainer pages (8):

Space	What it covers
`agent-stack-tour`	Guided tour of all 5 libraries with install commands and live links.
`why-this-stack`	The thinking behind the stack - what's broken, why these 5 libs.
`install-cheatsheet`	All install commands across pip, npm, and MCP.
`mcp-quickstart`	Add the 5 MCP servers to Claude Desktop / Cursor / Cline / Windsurf / Zed.
`fit-strategies-explained`	Visual explainer: drop-oldest vs drop-middle vs priority.
`trace-format-reference`	Field-by-field reference for the agentsnap trace JSON schema.
`prompt-injection-taxonomy`	10-category taxonomy with examples + the cheap defense for each.
`dataset-cards-index`	One-page index of the 16 datasets below.

📊 Datasets (16) - all MIT, all datasets.load_dataset("mukunda1729/<name>") ready:

Dataset	Rows	Purpose
`jailbreak-corpus-mini`	15	Curated jailbreak fixtures across 8 categories.
`prompt-injection-patterns-extended`	30	Prompt-injection patterns across 10 categories.
`pii-detection-fixtures`	25	PII / secret strings labeled with span offsets.
`tool-arg-validation-cases`	20	(Tool, schema, args) tuples - valid + invalid.
`mcp-tool-test-fixtures`	22	MCP tool-call args across 8 categories.
`llm-output-extraction-cases`	20	Messy LLM outputs with expected JSON.
`hallucination-risk-cases`	20	Prompt → response pairs rated for hallucination risk.
`rag-quality-benchmarks-mini`	15	RAG eval queries with ground-truth answers.
`agent-trace-samples`	10	agentsnap-format tool-call traces (good + regressed pairs).
`agent-budget-violations`	15	Agent runs with budget caps + actual usage + root cause.
`token-counting-edge-cases`	20	Strings with token counts across 3 tokenizer families.
`model-pricing-table`	20	LLM pricing - input/output cost per 1k tokens, context window.
`mcp-config-examples`	15	MCP client configs across Claude Desktop, Cursor, Cline, Windsurf, Zed.

Featured Projects

Karna - AI Agent Platform

Self-hosted AI assistant with 7 messaging channels (Telegram, Slack, Discord, WhatsApp, SMS, iMessage, Web), extensible plugin SDK, semantic memory, and voice. TypeScript monorepo with Next.js dashboard and React Native mobile app.

_{Stack · TypeScript • Node.js • Next.js • Supabase • WebSocket • pgvector}

Chetana - AI Consciousness Research Platform

Research-driven platform exploring machine consciousness through 14 indicators grounded in 6 scientific theories. Built to turn abstract AI-consciousness questions into structured experiments, scoring, and analysis.

_{Stack · AI Research • Evaluation • Experimentation • Python}

AgentRAG - Modular RAG Pipeline

Provider-agnostic RAG framework with pluggable vector stores, chunking strategies, and retrieval methods. Designed for agentic workflows with clean API boundaries.

_{Stack · RAG • Vector Search • Embeddings • TypeScript}

Astra Agent - AI Agent Runtime

Standalone AI agent runtime with tool execution, context management, and multi-model routing. Foundation for building autonomous AI assistants with structured tool use.

_{Stack · TypeScript • LLM Orchestration • Tool Use • Agents}

More Projects

Project	Description
Sadhak	AI-powered job search command center - automated evaluation, resume tailoring, application tracking
Chetana	AI consciousness research platform - 14 indicators from 6 scientific theories
Prithvi	Container security scanner - vulnerability detection, compliance checks, Docker audits
Amogha Cafe	Full-stack Firebase restaurant platform - real-time ordering, QR dine-in. Live
RNHT	Temple community platform - events, donations, priest scheduling
Patchly	AI code review bot - flags bugs, suggests fixes, explains why, like a senior engineer
Evalharness	Prompt, agent, and RAG test harness - red teaming, regression testing, CI/CD for AI
AgentMem	Pluggable memory management for AI agents
LLM Bench CLI	CLI for benchmarking local LLMs - speed, throughput, quality
TokenWise	Token usage optimization across providers

Impact at a Glance

Production AI / ML Impact
_{COST EFFICIENCY} 78% _{infrastructure cost reduction SageMaker → Bedrock migration}	_LATENCY 600x _{retrieval latency improvement ML prediction system}	_{RAG SCALE} 30K+ _{knowledge base entries 9-stage agentic RAG pipeline}	_QUALITY 370+ _{unit tests & evaluations production ML systems}
Open Source Footprint
_UPSTREAM 140 _{merged PRs across 106 external public repos}	_PACKAGES 200+ _{95 npm (47 *-mcp servers + 48 libs) + 53 PyPI + 20 in the official MCP Registry + 31 crates.io Rust crates + 7 GitHub Marketplace Actions + 17 HF Spaces + 16 HF Datasets + 1 Kaggle Dataset + 1 Homebrew tap + 1 GHCR image + 1 OSF project + 4 Codeberg mirrors + 4 GitLab mirrors}	_{ORIGINAL WORK} 322 _{original public repos maintained on GitHub}	_ECOSYSTEMS 6+ _{major org ecosystems OpenAI, Anthropic, Google, Microsoft, Stanford, Princeton}

What I Build

 ML Systems           Fault prediction, embedding pipelines, model evaluation, cost-optimized inference
 Agentic AI           RAG pipelines, LangGraph workflows, query routing, hallucination detection
 Cloud Infrastructure AWS (Bedrock, SageMaker, ECS, OpenSearch), GCP, Azure, Kubernetes, Terraform
 Full-Stack           React/TypeScript + Java/Python backend APIs, CI/CD, zero-downtime deployments

Experience

Role	Company	Era	Primary arena
AI/ML Engineer	Southwest Airlines	Aug 2025 - Present	production ML, agentic RAG, Bedrock migration
AI/ML Engineer	GPS IT Solutions	Jun 2024 - Aug 2025	RAG platforms, model-risk governance, vector search
Software Development Engineer	Amazon Web Services	Aug 2022 - May 2024	enterprise cloud systems, React/Java/Python, CI/CD
Data Engineer	GPS IT Solutions	Jan 2022 - Aug 2022	data pipelines, AWS Glue, PySpark, analytics workflows
Software Engineer	American Express	Feb 2017 - Dec 2020	Python backend services, REST APIs, enterprise platforms

Highlights

Southwest Airlines - AI/ML Engineer

Architected ML fault prediction system for aircraft maintenance - 5 prediction types, 10K+ records, sub-second retrieval
Led SageMaker → Bedrock migration: 78% cost reduction ($1,740→$371/mo), 600x latency improvement
Designed 9-stage agentic RAG pipeline (LangGraph, Bedrock Nova Pro/Micro, FAISS + BM25) over 30K+ KB entries

GPS IT Solutions - AI/ML Engineer

Built GPT-4 + RAG content generation platform with compliance validation, reducing production time by 40%
Designed AI model risk governance framework with 23 automated evaluation tests achieving regulatory compliance
Architected FastAPI microservices with FAISS/Pinecone vector search on Kubernetes

Amazon Web Services (AWS) - Software Development Engineer

Built and shipped features for AWS Application Manager (Systems Manager) serving enterprise customers globally
Owned full-stack delivery: React/TypeScript frontend + Java/Python backend APIs with operational excellence
Designed CI/CD and IaC patterns enabling zero-downtime deployments at enterprise scale

GPS IT Solutions - Data Engineer

Led end-to-end migration of data pipelines from on-prem to AWS (Glue, PySpark)

American Express - Software Engineer

Developed Python backend services and RESTful APIs for enterprise platforms handling high-volume transactions at scale

Follow For

If you follow my work here, you’ll mostly see:

open-source contributions to AI SDKs and agent tooling
MCP, eval, and developer-experience improvements
practical full-stack and infrastructure-heavy AI projects
systems thinking around memory, retrieval, orchestration, and production reliability

Education

University of Central Missouri - M.S. in Big Data Analytics and Information Technology (Jan 2021 - May 2022)

SRM University - B.Tech in Mechanical Engineering (2012 - 2016)

Certifications

Anthropic

AWS

Cloud & Infrastructure

Stanford / Wharton

Microsoft

LinkedIn Learning

Tech Stack

GitHub Stats

Live Signals

Open to opportunities - Senior AI/ML Engineer • GenAI Platform Engineer • Software Engineer

mukunda-ai.vercel.app • Las Vegas, NV

Provide feedback

Saved searches

Use saved searches to filter your results more quickly