DocChat (Production RAG Pipeline)

Upload .txt or .pdf documents and ask questions about them in plain English.

Live Demo → basic-rag-mstghx3dbkhezdategemlv.streamlit.app

Evaluation Results (RAGAS)

Evaluated on a QA dataset over the included articles (python evaluate.py):

Metric	Score	What it means
Faithfulness	0.92	Answers grounded in retrieved context
Answer Relevancy	0.88	Answers address the question asked
Context Precision	0.85	Retrieved chunks are on-topic
Context Recall	0.81	Necessary chunks were retrieved

Run python evaluate.py to reproduce these results, or to evaluate the pipeline on your own documents.
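
Each evaluated sample pairs a question with the generated answer, the retrieved chunks, and a reference answer. The sketch below shows one possible record shape and a small validator. The field names follow a convention used by some RAGAS releases and are an assumption here: newer RAGAS versions use different names, and evaluate.py may structure its data differently. The sample text is made up for illustration.

```python
# Assumed field names and types; check evaluate.py and your RAGAS version.
REQUIRED_FIELDS = {
    "question": str,      # what the user asked
    "answer": str,        # what the pipeline generated
    "contexts": list,     # retrieved chunks passed to the LLM
    "ground_truth": str,  # reference answer written by a human
}


def validate_sample(sample: dict) -> list[str]:
    """Return a list of problems found in one evaluation record."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in sample:
            problems.append(f"missing field: {name}")
        elif not isinstance(sample[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    return problems


# Illustrative record, not taken from the project's dataset.
sample = {
    "question": "What is retrieval-augmented generation?",
    "answer": "It combines document retrieval with text generation.",
    "contexts": ["RAG retrieves relevant chunks before generating an answer."],
    "ground_truth": "A technique that grounds LLM answers in retrieved documents.",
}

print(validate_sample(sample))
```

Checking records up front avoids spending API calls on an evaluation run that fails partway through on a malformed sample.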

Architecture

User Question
      │
      ▼
┌─────────────────────────────────────────┐
│       Streamlit UI  /  CLI (main.py)    │
└──────────────────┬──────────────────────┘
                   │
      ┌────────────┴────────────┐
      ▼                         ▼
┌──────────────┐      ┌──────────────────────┐
│ data_loader  │      │    vector_store      │
│              │      │                      │
│ .txt + .pdf  ├─────▶│  OpenAI Embeddings   │
│ LangChain    │chunks│ ChromaDB (persisted) │
│ TextSplitter │      │  Similarity search   │
└──────────────┘      └──────────┬───────────┘
                                 │ top-k chunks
                                 ▼
                      ┌──────────────────────┐
                      │       rag.py         │
                      │                      │
                      │ LangChain LCEL chain │
                      │  GPT-4o-mini         │
                      │  Source citations    │
                      └──────────┬───────────┘
                                 │
                                 ▼
                      ┌──────────────────────┐
                      │     RAGResponse      │
                      │  answer: str         │
                      │  sources: list[str]  │
                      │ num_chunks_used: int │
                      └──────────────────────┘
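
The flow in the diagram can be sketched end to end with the standard library. This is a minimal illustration, not the project's code: hand-written 2-d vectors stand in for OpenAI embeddings, a plain list stands in for ChromaDB, cosine similarity is an assumed ranking metric, and the answer is a placeholder rather than an LLM call. Only the three RAGResponse fields come from the diagram; the helper names and other.txt are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RAGResponse:
    """Fields mirror the RAGResponse box in the architecture diagram."""
    answer: str
    sources: list[str] = field(default_factory=list)
    num_chunks_used: int = 0


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: list[tuple], k: int) -> list[tuple]:
    """Return the k (source, text, vector) entries most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[2]), reverse=True)
    return ranked[:k]


# Toy index: (source file, chunk text, embedding vector).
index = [
    ("rag_overview.txt", "RAG combines retrieval with generation.", [1.0, 0.0]),
    ("rag_overview.txt", "Chunks are embedded and stored.", [0.8, 0.6]),
    ("other.txt", "Unrelated text.", [0.0, 1.0]),
]

hits = top_k([1.0, 0.1], index, k=2)
response = RAGResponse(
    answer="(LLM output would go here)",
    sources=sorted({source for source, _, _ in hits}),  # de-duplicated file names
    num_chunks_used=len(hits),
)
print(response)
```

Returning a small structured object instead of a bare string lets both the Streamlit UI and the CLI show citations without re-querying the vector store.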

Quick Start

1. Clone and install

git clone https://github.com/Babarali2k21/basic-rag.git
cd basic-rag

python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate

pip install -r requirements.txt

2. Configure

cp .env.example .env
# Edit .env and set OPENAI_API_KEY=sk-your-key-here

3. Add your documents

Drop .txt or .pdf files into the ./articles/ folder. A sample article is included to get you started.

4a. Web UI (recommended)

streamlit run app.py

Open http://localhost:8501, enter your API key in the sidebar, upload documents or use the articles folder, then start asking questions.

4b. CLI

# Index documents
python main.py --index

# Interactive Q&A session
python main.py

# Single question
python main.py --query "What is retrieval-augmented generation?"
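
The three CLI modes above map naturally onto argparse. The sketch below is an assumption about how such a dispatcher could look, not a copy of main.py: the function names are hypothetical, and the precedence given to --index when both flags are passed is a choice made here for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser exposing the two flags shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="main.py", description="DocChat CLI")
    parser.add_argument("--index", action="store_true",
                        help="index the documents in the articles folder")
    parser.add_argument("--query", metavar="QUESTION",
                        help="answer a single question and exit")
    return parser


def choose_mode(args: argparse.Namespace) -> str:
    """Pick one mode; with no flags, fall back to the interactive session."""
    if args.index:
        return "index"
    if args.query:
        return "query"
    return "interactive"


if __name__ == "__main__":
    print(choose_mode(build_parser().parse_args()))
```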

5. Run evaluation (optional)

python evaluate.py

Project Structure

basic-rag/
├── app.py              # Streamlit web UI with document upload
├── main.py             # CLI (--index / --query / interactive)
├── config.py           # Settings via Pydantic BaseSettings
├── data_loader.py      # Load .txt/.pdf + LangChain text splitting
├── vector_store.py     # ChromaDB indexing, loading, retriever factory
├── rag.py              # LangChain LCEL chain + RAGResponse dataclass
├── evaluate.py         # RAGAS evaluation script
├── conftest.py         # pytest path configuration
├── articles/           # Drop knowledge base files here
│   └── rag_overview.txt
├── docs/
│   └── screenshot.png  # App screenshot
├── tests/
│   └── test_rag.py     # Unit + integration tests (mocked, no API key needed)
├── .env.example        # Environment variable template
├── requirements.txt
└── .github/
    └── workflows/
        └── tests.yml   # CI: pytest + codecov on Python 3.11 & 3.12

Configuration

All settings are read from the .env file:

Variable	Default	Description
`OPENAI_API_KEY`	—	Required
`OPENAI_MODEL`	`gpt-4o-mini`	LLM for answer generation
`EMBEDDING_MODEL`	`text-embedding-3-small`	Embedding model
`CHUNK_SIZE`	`512`	Characters per chunk
`CHUNK_OVERLAP`	`64`	Overlap between chunks
`RETRIEVAL_K`	`5`	Chunks to retrieve per query
`CHROMA_PATH`	`chroma_persistent_storage`	ChromaDB persistence path
`ARTICLES_DIR`	`./articles`	Knowledge base directory
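
To see how CHUNK_SIZE and CHUNK_OVERLAP interact, here is a minimal fixed-window splitter using the defaults from the table. It is a simplified stand-in, not the project's splitter: LangChain's text splitters prefer to break at separators such as paragraphs and sentences, so real chunk boundaries will differ. The function name is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> list[str]:
    """Split text into windows of chunk_size characters that share
    chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# A 1000-character text with the default settings.
text = "".join(chr(97 + i % 26) for i in range(1000))
chunks = chunk_text(text)
print([len(c) for c in chunks])
```

With the defaults, each window advances 448 characters, so the last 64 characters of one chunk are repeated at the start of the next. A larger overlap reduces the chance that a sentence is cut between chunks, at the cost of more embeddings to compute and store.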

Tests

pytest tests/ -v

# With coverage report
pytest tests/ -v --cov=. --cov-report=term-missing

All tests use mocked LLM calls, so no API key is required to run the test suite.

12/12 tests passing on Python 3.11 and 3.12.
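
The mocking approach can be illustrated with unittest.mock. This is a sketch under assumptions, not an excerpt from tests/test_rag.py: answer_question is a hypothetical stand-in for the pipeline's entry point, and the retriever and LLM are modeled as plain callables.

```python
from unittest.mock import MagicMock


def answer_question(question, retriever, llm):
    """Hypothetical pipeline: retrieve chunks, build a prompt, call the LLM."""
    chunks = retriever(question)
    prompt = "Context:\n" + "\n".join(chunks) + f"\n\nQuestion: {question}"
    return llm(prompt)


def test_answer_uses_retrieved_context():
    # Both collaborators are mocks, so no network call or API key is involved.
    retriever = MagicMock(return_value=["RAG combines retrieval with generation."])
    llm = MagicMock(return_value="mocked answer")

    result = answer_question("What is RAG?", retriever, llm)

    assert result == "mocked answer"
    retriever.assert_called_once_with("What is RAG?")
    # The retrieved chunk must appear in the prompt sent to the LLM.
    prompt = llm.call_args.args[0]
    assert "RAG combines retrieval with generation." in prompt
```

Asserting on the prompt passed to the mock checks the wiring between retrieval and generation, which is the part of a RAG pipeline that a unit test can verify deterministically.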

Roadmap

Tech Stack

Layer	Tool
LLM	OpenAI GPT-4o-mini
Embeddings	OpenAI text-embedding-3-small
RAG Framework	LangChain 0.2
Vector Store	ChromaDB
Evaluation	RAGAS
Web UI	Streamlit
Config	Pydantic BaseSettings
Testing	pytest + pytest-cov (12 tests)
CI/CD	GitHub Actions + Codecov
Deployment	Streamlit Cloud

Author

Babar Ali, AI Engineer · Vienna, Austria

License

MIT
