Build and chat with your private knowledge base using Retrieval-Augmented Generation.
Parrotly is a modular AI application that allows users to create a private knowledge base from their own documents and interact with it using natural language.
The system focuses on building a reliable Retrieval-Augmented Generation pipeline with hybrid retrieval, reranking, evaluation, and support for both cloud-based and local LLM providers.
Parrotly allows users to:
- Upload PDF and TXT documents
- Build a searchable private knowledge base
- Ask questions grounded in document context
- Generate document summaries
- Inspect retrieved sources used for answers
- Compare different retrieval configurations
- Monitor token usage, latency and estimated costs
- Switch between OpenAI and local models using Ollama
Key features:
- PDF and TXT document ingestion
- Text splitting with metadata preservation
- Document parsing and preprocessing pipeline
- Dense retrieval using FAISS vector search and embeddings
- Sparse retrieval using TF-IDF keyword search
- Hybrid retrieval combining semantic similarity and keyword matching
- Post-retrieval reranking based on relevance scoring
- Cloud-based generation using OpenAI models
- Local model execution through Ollama
- Context-grounded response generation
- Token usage and cost tracking
- Automated retrieval evaluation pipeline
- Experiment comparison across retrieval configurations
- Token usage, cost and latency monitoring
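To illustrate text splitting with metadata preservation, here is a minimal sketch using only the standard library. The `Chunk` dataclass, the `split_text` function, and the chunk size and overlap defaults are hypothetical names and values chosen for this example. They are not taken from Parrotly's code, which may rely on a LangChain text splitter instead.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A piece of document text plus the metadata needed to trace it back."""
    text: str
    metadata: dict = field(default_factory=dict)


def split_text(text: str, source: str, chunk_size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Split text into overlapping character windows.

    Every chunk carries the source name, its index, and its start offset,
    so a retrieved chunk can be shown to the user with its origin.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    chunks: list[Chunk] = []
    step = chunk_size - overlap
    for index, start in enumerate(range(0, len(text), step)):
        piece = text[start:start + chunk_size]
        chunks.append(
            Chunk(piece, {"source": source, "chunk_index": index, "start_char": start})
        )
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Keeping the source and offset on each chunk is what makes it possible to inspect the retrieved sources behind an answer.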
A key part of Parrotly is an experiment-driven approach to improving retrieval quality.
Instead of relying on a single retrieval method, the system includes an evaluation framework for benchmarking different configurations and selecting the most effective setup.
The evaluation compares:
- Dense semantic retrieval using vector search
- Sparse retrieval using TF-IDF keyword search
- Hybrid retrieval combining both approaches
- Different retrieval parameters and Top-K configurations
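One common way to combine dense and sparse results is a weighted sum of min-max normalized scores. The sketch below assumes that approach. The function names, the `alpha` weight, and the score dictionaries are hypothetical, and Parrotly's actual fusion method may differ (for example, it could use reciprocal rank fusion).

```python
from __future__ import annotations


def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map each to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (score - low) / (high - low) for doc_id, score in scores.items()}


def hybrid_rank(
    dense: dict[str, float],
    sparse: dict[str, float],
    alpha: float = 0.5,
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Blend dense and sparse scores: alpha * dense + (1 - alpha) * sparse.

    A document missing from one retriever contributes 0.0 for that retriever.
    Ties are broken by document id so the ordering is deterministic.
    """
    dense_n = min_max_normalize(dense)
    sparse_n = min_max_normalize(sparse)
    combined = {
        doc_id: alpha * dense_n.get(doc_id, 0.0) + (1 - alpha) * sparse_n.get(doc_id, 0.0)
        for doc_id in set(dense_n) | set(sparse_n)
    }
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))[:top_k]


# Example scores (made up): similarity scores from a vector index, TF-IDF scores from keyword search.
dense_scores = {"a": 0.9, "b": 0.7, "c": 0.1}
sparse_scores = {"b": 3.0, "c": 1.0, "d": 2.0}
ranking = hybrid_rank(dense_scores, sparse_scores, alpha=0.5)
```

Normalizing first matters because vector similarities and TF-IDF scores live on different scales. Without it, one retriever would dominate the sum regardless of `alpha`.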
Performance is measured using:
- Top-K Accuracy
- Hit Rate
- Mean Reciprocal Rank (MRR)
- Recall@K
- Retrieval latency
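The ranking metrics above can be computed from a list of retrieved document ids and a set of relevant ids per query. The following is a minimal sketch under assumed definitions: Top-1 Accuracy is the share of queries whose first result is relevant, Hit Rate is the share with at least one relevant result in the top K, and Recall@K is the share of relevant documents found in the top K. The function names and input format are hypothetical and do not describe Parrotly's evaluation module.

```python
from __future__ import annotations


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """Return 1/rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def hit_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Return 1.0 if any of the top-k results is relevant, else 0.0."""
    return 1.0 if any(doc_id in relevant for doc_id in retrieved[:k]) else 0.0


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Return the fraction of relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def evaluate(runs: list[tuple[list[str], set[str]]], k: int = 5) -> dict[str, float]:
    """Average each metric over all (retrieved, relevant) query runs."""
    if not runs:
        raise ValueError("runs must contain at least one query")
    n = len(runs)
    return {
        "top1_accuracy": sum(hit_at_k(r, rel, 1) for r, rel in runs) / n,
        f"hit_rate@{k}": sum(hit_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
        f"recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
    }


# Three made-up queries: (ranked retrieved ids, set of relevant ids).
example_runs = [
    (["a", "b", "c"], {"a"}),
    (["x", "y", "z"], {"y"}),
    (["p", "q"], {"p", "r"}),
]
metrics = evaluate(example_runs, k=5)
```

MRR rewards placing the first relevant document higher, so it can separate two configurations that have identical hit rates.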
Hybrid retrieval achieved the best overall ranking performance by combining semantic understanding with exact keyword matching.
| Retrieval Strategy | Top-1 Accuracy | Top-5 Hit Rate | MRR | Recall@5 |
|---|---|---|---|---|
| Dense Search | 0.90 | 1.00 | 0.92 | 1.00 |
| TF-IDF Search | 0.80 | 1.00 | 0.85 | 1.00 |
| Hybrid Search | 0.90 | 1.00 | 0.95 | 1.00 |
Hybrid retrieval raised MRR to 0.95, compared with 0.92 for dense search and 0.85 for TF-IDF, while matching dense search on Top-1 Accuracy (0.90) and keeping Top-5 Hit Rate and Recall@5 at 1.00.
Detailed experiment outputs are exported automatically:
evaluation/results/retrieval_comparison.csv
evaluation/results/retrieval_details.json
Tech stack:
- LangChain
- FAISS
- OpenAI API
- Ollama
- TF-IDF retrieval
- Python
- Streamlit
- Pydantic
- NumPy
- Pandas
- Scikit-learn
- Docker
- Docker Compose
Installation:
git clone https://github.com/bjamiolkowski/modular-rag-assistant.git
cd modular-rag-assistant
python -m venv .venv
source .venv/bin/activate

Windows:
.venv\Scripts\activate

pip install -r requirements.txt

Create .env file:
OPENAI_API_KEY=your_openai_api_key

Optional local model configuration:
OLLAMA_MODEL=llama3

streamlit run app.py

Application will be available at:
http://localhost:8501

Build and run:
docker compose up --build

MIT License


