Retrieve without embeddings. Parse PDFs into a hierarchical document tree, then let a LangGraph agent reason through the structure to find answers with page-level citations. Zero vector database costs, zero chunking artifacts.
Start learning at learnwithparam.com. Regional pricing available with discounts of up to 60%.
- Build a layout-aware PDF parser that produces a hierarchical `TreeNode` structure
- Replace vector similarity search with LLM reasoning over a document tree
- Design a LangGraph state machine that analyzes, routes, retrieves, and generates
- Cache document trees to disk so repeated questions skip expensive parsing
- Return grounded answers with page ranges and section titles as citations
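The tree structure and page-level citations above could be modeled roughly like this. This is a minimal sketch; the field and method names are assumptions for illustration, not the course's actual `TreeNode` definition:

```python
from dataclasses import dataclass, field

@dataclass
class TreeNode:
    # Section title as detected by the layout-aware parser
    title: str
    # 1-based page range the section spans, used for citations
    page_start: int
    page_end: int
    # Full text of this section (excluding children)
    text: str = ""
    # Short LLM-generated summary for cheap relevance checks
    summary: str = ""
    children: list["TreeNode"] = field(default_factory=list)

    def citation(self) -> str:
        """Format a page-level citation like 'Data Model (pp. 2-3)'."""
        if self.page_start == self.page_end:
            return f"{self.title} (p. {self.page_start})"
        return f"{self.title} (pp. {self.page_start}-{self.page_end})"
```

Because each node carries its own summary and page range, the agent can judge relevance from summaries alone and only pull full text for the nodes it selects.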
- Python 3.11+ with `uv` for dependency management
- LangGraph for agent state graphs and conditional routing
- PyMuPDF and pymupdf4llm for layout-aware PDF parsing
- OpenAI (or any OpenAI-compatible client) for reasoning calls
- Pydantic for typed state validation
- Docker for reproducible runs
- Python 3.11+
- uv (installed automatically by `make setup`)
- An OpenAI API key
# One command to set up and run
make dev
# Or step by step:
make setup # Create .env and install dependencies
# Edit .env with your API key
make run # Parse the PDF, build the tree, answer the sample questions

make build # Build the Docker image
make up # Run the container
make logs # View logs
make down # Stop the container

- Downloads the Google Bigtable paper on first run
- Parses the PDF into a `DocumentTree` and caches it at `results/document_tree.json`
- Renders the LangGraph workflow as `results/workflow.png`
- Answers the questions in `questions.py`, printing reasoning, confidence, path, and sources
Edit `questions.py` to ask your own questions, or point `main.py` at a different PDF to index a new document.
Work through these incrementally to build the full system:
- PDF to Tree - Convert a raw PDF into a `TreeNode` hierarchy using PyMuPDF4LLM
- Tree Caching - Serialize the tree to JSON and reload it on subsequent runs
- Section Summaries - Generate a short summary per node so the agent can judge relevance cheaply
- Analyze Node - LLM call that scores how relevant the current node is to the query
- Conditional Routing - Decide between descending, retrieving, or backtracking based on confidence
- Retrieve and Generate - Collect the full text of selected nodes and synthesize the final answer
- Workflow Visualization - Render the state graph to a PNG for debugging
- Multi-Document Trees - Extend the system to search across many documents in one tree
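The conditional-routing lesson above boils down to a function that maps the analyze step's relevance score to the next edge in the graph. Here is a plain-Python sketch; the thresholds and edge names are assumptions, not the course's actual values, and in LangGraph this function would be wired in via `add_conditional_edges`:

```python
# Hypothetical thresholds; tune against your own documents.
RETRIEVE_THRESHOLD = 0.9
DESCEND_THRESHOLD = 0.7

def route(confidence: float, has_children: bool) -> str:
    """Decide the next step after analyzing a node.

    - High confidence: the node answers the query, so retrieve its text.
    - Moderate confidence on a branch node: descend into its children.
    - Otherwise: backtrack and try a sibling subtree.
    """
    if confidence >= RETRIEVE_THRESHOLD:
        return "retrieve"
    if confidence >= DESCEND_THRESHOLD and has_children:
        return "descend"
    return "backtrack"
```

Keeping the routing decision in a pure function like this makes the state machine easy to unit-test without any LLM calls.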
make help Show all available commands
make setup Initial setup (create .env, install deps)
make dev Setup and run (one command!)
make run Run the vectorless RAG pipeline
make build Build Docker image
make up Start container
make down Stop container
make logs View container logs
make clean Remove venv, caches, and generated results
- Start the course: learnwithparam.com/courses/vectorless-rag
- AI Bootcamp for Software Engineers: learnwithparam.com/ai-bootcamp
- All courses: learnwithparam.com/courses