Minimal Flask RAG app for uploading a document, indexing it locally, and asking questions with citations.
- Upload and index a single source (PDF/DOCX/TXT/MD/CSV/JSON/HTML/XML).
- Hybrid retrieval: dense embeddings (Gemini) + BM25, fused with RRF.
- Top matches view plus grounded answers with numbered citations.
- Evidence preview: click a citation or a match card to open the referenced chunk.
- Ingestion progress UI (upload, extract, chunk, embed, index).
- Optional: use Qdrant for the dense vector index.
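The RRF fusion named in the feature list can be sketched roughly as follows. This is a minimal illustration of Reciprocal Rank Fusion in general, not the app's actual code: the function name `rrf_fuse`, the constant `k=60`, and the chunk ids are assumptions chosen for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=5):
    """Fuse ranked lists of chunk ids with Reciprocal Rank Fusion.

    Hypothetical helper: `k` and `top_n` are illustrative defaults,
    not values taken from the app's QuerySettings.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; each list contributes 1 / (k + rank) per chunk.
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk id for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n]


# Made-up rankings standing in for the dense and BM25 retrievers.
dense_ranking = ["c3", "c1", "c7"]
bm25_ranking = ["c1", "c9", "c3"]
fused = rrf_fuse([dense_ranking, bm25_ranking])
fused_ids = [chunk_id for chunk_id, _ in fused]
```

Because RRF only looks at ranks, the dense similarity scores and BM25 scores never need to be normalized against each other; chunks that rank well in both lists rise to the top.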
flowchart TD
UI[Browser UI] -->|upload/query| API[Flask app]
API -->|/api/upload| X[Extract text]
X --> C[Chunker]
C --> E[Gemini embed]
E --> V[(Dense vector index: Qdrant or in-memory)]
C --> B[(In-memory BM25)]
API -->|/api/retrieve| R[Hybrid retrieve + RRF fuse]
V --> R
B --> R
API -->|/api/answer| L["Gemini chat (grounded answer)"]
R --> L
L --> UI
Note: architecture.md may be out of date; this README reflects the current code paths.
Copy .env.example to .env. Key settings:
- GOOGLE_API_KEY (required)
- MAX_UPLOAD_MB (default 250)
- CHUNK_MAX_TOKENS (default 512)
- CHUNK_OVERLAP_TOKENS (default 32)
- GEMINI_EMBED_MODEL (default text-embedding-004)
- GEMINI_CHAT_MODEL (default gemini-2.5-flash)
- MAX_EMBED_REQUESTS_PER_MINUTE (default 0 = disabled; limits embedding API requests per minute)
- FLASK_HOST (default 0.0.0.0)
- FLASK_PORT (default 5000)
- FLASK_DEBUG (default 0)
- QDRANT_URL (optional, e.g. http://localhost:6333)
- QDRANT_COLLECTION (optional, default rag_chunks)
- QDRANT_API_KEY (optional)
- QDRANT_RECREATE_COLLECTION (optional, default 1)
Query tuning defaults live in webapp/config.py (QuerySettings).
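As an illustration of how environment settings like these are typically read with their defaults, here is a small sketch. It is not the contents of webapp/config.py: the `AppSettings` dataclass and `load_settings` function are hypothetical names, and only a subset of the variables is covered.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppSettings:
    """Hypothetical container for a subset of the documented settings."""
    google_api_key: str
    max_upload_mb: int
    chunk_max_tokens: int
    chunk_overlap_tokens: int
    flask_host: str
    flask_port: int
    qdrant_url: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> AppSettings:
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY", "").strip()
    if not api_key:
        # GOOGLE_API_KEY is the only required setting.
        raise RuntimeError("GOOGLE_API_KEY is required")
    return AppSettings(
        google_api_key=api_key,
        max_upload_mb=int(env.get("MAX_UPLOAD_MB", "250")),
        chunk_max_tokens=int(env.get("CHUNK_MAX_TOKENS", "512")),
        chunk_overlap_tokens=int(env.get("CHUNK_OVERLAP_TOKENS", "32")),
        flask_host=env.get("FLASK_HOST", "0.0.0.0"),
        flask_port=int(env.get("FLASK_PORT", "5000")),
        # An unset or empty QDRANT_URL means "use the in-memory dense index".
        qdrant_url=env.get("QDRANT_URL") or None,
    )


# Example with an explicit mapping so the snippet does not depend on the real environment.
settings = load_settings({"GOOGLE_API_KEY": "demo-key", "FLASK_PORT": "8080"})
```

Passing a plain dict instead of relying on `os.environ` keeps the loader easy to test; in the real app the values would come from the `.env` file.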
- Upload + extract text (PDF/DOCX/TXT/etc).
- Chunk + embed (Gemini), build a dense index (Qdrant or in-memory) + in-memory BM25.
- Query uses hybrid retrieval + RRF fusion, then the LLM answers with chunk citations (shown in the UI).
Important: the app still keeps chunk text/ordering in process memory for the active upload. Qdrant currently replaces only the dense similarity search component.
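To make the chunking step in the flow above more concrete, here is a rough sketch of fixed-size chunking with overlap, in the spirit of CHUNK_MAX_TOKENS and CHUNK_OVERLAP_TOKENS. It is an assumption-laden illustration: the function name `chunk_tokens` is hypothetical, and whitespace-separated words stand in for whatever tokenizer the app actually uses.

```python
def chunk_tokens(tokens, max_tokens=512, overlap=32):
    """Split a token list into windows of at most `max_tokens`,
    where consecutive windows share `overlap` tokens.

    Hypothetical helper; defaults mirror the documented env defaults.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        # Stop once a window has reached the end, to avoid a redundant tail chunk.
        if start + max_tokens >= len(tokens):
            break
    return chunks


# Whitespace splitting is only a stand-in for a real tokenizer.
words = "one two three four five six seven eight nine ten".split()
chunks = chunk_tokens(words, max_tokens=4, overlap=1)
```

With these toy values, each chunk repeats the last word of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk.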
- Create a virtualenv and install deps:
  - python3 -m venv .venv
  - source .venv/bin/activate
  - pip install -r requirements.txt
- Set your Gemini key:
  - cp .env.example .env
  - edit .env and set GOOGLE_API_KEY
- (Optional) Start Qdrant locally:
  - docker run -d --name rag-qdrant -p 6333:6333 -v "$(pwd)/qdrant_storage:/qdrant/storage" qdrant/qdrant
  - set QDRANT_URL=http://localhost:6333 in .env
- Start the server:
python3 app.py
- Open:
http://localhost:5000