DocuChat is a document-based chatbot that leverages advanced NLP models to provide intelligent responses based on the content of uploaded documents. This project consists of a PHP frontend and a Python backend.

LebToki/DocuChat

🚀 DocuChat - AI-Powered Document Chat System

Transform your documents into an intelligent conversation partner

Features • Installation • Usage • Contributing


✨ Overview

DocuChat is a Retrieval-Augmented Generation (RAG) system that turns static documents into interactive, searchable knowledge bases. Upload your documents, ask questions, and get answers drawn from their content, powered by BERT-based models and FAISS vector search.

Whether you're a researcher analyzing papers, a student studying materials, or a professional managing documentation, DocuChat makes document interaction seamless and intelligent.


🌟 Features

🎨 Modern UI/UX

  • Beautiful Dark Theme - Eye-friendly dark mode with modern gradients
  • Responsive Design - Works on desktop, tablet, and mobile
  • Smooth Animations - Polished transitions and micro-interactions
  • Intuitive Navigation - Clean, user-friendly interface

💬 Enhanced Chat Interface

  • Real-time Chat - Interactive conversation with your documents
  • Typing Indicators - Visual feedback while AI processes your query
  • Message History - View and manage your conversation history
  • Copy to Clipboard - One-click copy for any message
  • Export Conversations - Download chat history as text files
  • Markdown Support - Rich text formatting in responses

📤 Advanced File Upload

  • Drag & Drop - Intuitive file upload with drag-and-drop support
  • File Preview - See file details before uploading
  • Progress Tracking - Real-time upload progress indicators
  • Multiple Formats - Support for PDF, DOCX, PPTX, XLSX, TXT
  • File Type Icons - Visual file type identification

📁 Project Management

  • Organize Documents - Group files into projects
  • Search Functionality - Quickly find projects and files
  • Project Statistics - View file counts and project details
  • Bulk Operations - Manage multiple files efficiently
  • Quick Actions - Generate embeddings and fine-tune models with one click

🤖 AI Capabilities

  • Semantic Search - Find relevant information using embeddings
  • BERT-based Models - Advanced NLP for document understanding
  • Fine-tuning Support - Customize models for your specific use case
  • Multi-language Support - Handle documents in multiple languages
  • Context-Aware Responses - Answers based on document content
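
To make the semantic search idea concrete, here is a minimal sketch in plain Python. It ranks text chunks by the cosine similarity between their embedding vectors and a query vector. The toy 3-dimensional vectors and the `top_k` helper are illustrative assumptions; DocuChat lists BERT and FAISS for this job, and its actual implementation may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=2):
    """Return the k chunk texts most similar to query_vec.

    chunks is a list of (text, vector) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy embeddings (hypothetical values; a real model produces hundreds of dimensions)
chunks = [
    ("Invoices are due in 30 days.", [0.9, 0.1, 0.0]),
    ("The office is closed on Sundays.", [0.0, 0.2, 0.9]),
    ("Late payments incur a 2% fee.", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # stands in for the embedding of a billing question

print(top_k(query, chunks, k=2))
```

The two billing-related chunks rank first because their vectors point in nearly the same direction as the query. A vector index such as FAISS serves the same purpose at scale, avoiding a full scan of every chunk.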

🔔 User Experience Enhancements

  • Toast Notifications - Beautiful, non-intrusive notifications
  • Loading States - Clear feedback during operations
  • Error Handling - User-friendly error messages
  • Keyboard Shortcuts - Power user features
  • Empty States - Helpful guidance when no data exists

🛠️ Technology Stack

Backend

  • Python 3.11+ - Core language
  • Flask - Web framework
  • Transformers - Hugging Face models
  • FAISS - Vector similarity search
  • BERT - Multilingual language model
  • PyTorch - Deep learning framework

Frontend

  • PHP 7.4+ - Server-side scripting
  • Bootstrap 5 - UI framework
  • JavaScript (ES6+) - Interactive features
  • Font Awesome - Icons
  • jQuery - DOM manipulation

📦 Installation

Prerequisites

  • Python 3.11 or higher
  • PHP 7.4 or higher
  • Web server (Apache/Nginx) or PHP built-in server
  • pip (Python package manager)
  • Composer (for PHP dependencies, if needed)

Step 1: Clone the Repository

git clone https://github.com/LebToki/DocuChat.git
cd DocuChat

Step 2: Backend Setup

# Navigate to backend directory
cd backend

# Create virtual environment
python -m venv .venv

# Activate virtual environment
# On Linux/Mac:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download required models
python download_models.py

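# Set environment variables (Linux/macOS shell syntax; in Windows cmd use: set NAME=value)
# Replace the placeholder secret key and credentials with your own values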
export SECRET_KEY="your-secret-key-here"
export BACKEND_URL="http://localhost:8080"
export DOCUCHAT_USER="admin"
export DOCUCHAT_PASS="password"

# Run the Flask backend
python app.py

The backend will start on http://localhost:8080

Step 3: Frontend Setup

# Navigate back to project root
cd ..

# Configure backend URL in config.php
# Edit public/src/views/config.php or set BACKEND_URL environment variable

# Using PHP built-in server
php -S localhost:8000 -t public

# Or configure with your web server (Apache/Nginx)
# Point document root to the 'public' directory

Step 4: Docker Setup (Alternative)

# Build and run with Docker Compose
docker-compose up --build

🚀 Usage

1. Create a Project

  1. Navigate to Manage Projects
  2. Enter a project name
  3. Click Create Project

2. Upload Documents

  1. Go to Upload Document
  2. Select your project
  3. Drag & drop or browse for files
  4. Supported formats: PDF, DOCX, PPTX, XLSX, TXT
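
If you script uploads, it can help to check file types before sending anything. The sketch below tests a filename against the formats listed above. The `SUPPORTED_EXTENSIONS` constant and `is_supported` helper are hypothetical names, not part of DocuChat, and the backend may apply its own validation.

```python
from pathlib import Path

# Formats listed in this README; the constant name is illustrative.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".xlsx", ".txt"}


def is_supported(filename: str) -> bool:
    """True if the filename ends in a supported extension (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["report.PDF", "slides.pptx", "archive.zip"]:
    print(name, is_supported(name))
```

This only looks at the extension, so it is a convenience check and not a substitute for inspecting file contents.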

3. Generate Embeddings

  1. In Manage Projects, select your project
  2. Click Generate Embeddings
  3. Wait for processing to complete
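
As background on what this processing step usually involves: RAG pipelines commonly split each document into overlapping chunks before embedding them, so that every chunk fits the model's input limit. The sketch below shows one simple word-based approach. The `chunk_words` function and its `size`/`overlap` defaults are assumptions for illustration only; this README does not document how DocuChat chunks text.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into chunks of up to `size` words.

    Consecutive chunks share `overlap` words so that sentences cut at a
    boundary still appear whole in at least one chunk.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, size=4, overlap=1))
```

Larger chunks carry more context per embedding, while smaller chunks make retrieval more precise; the right balance depends on the documents.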

4. Chat with Documents

  1. Go to Chat with Document
  2. Select your project
  3. Type your question
  4. Get AI-powered answers!

5. Fine-tune Model (Optional)

  1. Select a project with documents
  2. Click Fine-Tune Model
  3. Wait for training to complete

📖 API Endpoints

Projects

  • GET /projects - List all projects
  • POST /projects - Create a new project
  • DELETE /projects - Delete a project

Files

  • POST /upload - Upload a document
  • DELETE /projects/<project_name>/files - Delete a file

Embeddings

  • POST /projects/<project_name>/generate_embeddings - Generate embeddings

Chat

  • POST /ask - Ask a question about documents

Model

  • POST /fine_tune - Fine-tune the model
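
The endpoints above can be called from any HTTP client. Here is a minimal sketch using only the Python standard library. It builds requests without sending them, so you can inspect them first. The JSON field names (`project_name`, `question`) are assumptions, since this README does not document request bodies; check `backend/app.py` for the real ones. Authentication is also left out because the scheme is not described here.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BACKEND_URL = "http://localhost:8080"  # default from the setup steps


def build_request(method, path, payload=None):
    """Build (but do not send) a request against the backend."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(BACKEND_URL + path, data=data, headers=headers, method=method)


def list_projects():
    return build_request("GET", "/projects")


def generate_embeddings(project_name):
    # quote() escapes spaces and slashes so the name is safe inside the path
    path = f"/projects/{quote(project_name, safe='')}/generate_embeddings"
    return build_request("POST", path)


def ask(project_name, question):
    # ASSUMPTION: these JSON field names are hypothetical
    return build_request("POST", "/ask", {"project_name": project_name, "question": question})


req = ask("demo", "What is the refund policy?")
print(req.get_method(), req.full_url)
```

To send a request, pass it to `urllib.request.urlopen(req)` while the backend is running.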

🎯 Use Cases

  • Research - Analyze academic papers and research documents
  • Education - Interactive study materials and Q&A
  • Business - Document knowledge bases and FAQs
  • Legal - Contract and legal document analysis
  • Technical - API documentation and technical guides
  • Personal - Organize and query personal documents

🏗️ Project Structure

DocuChat/
├── backend/
│   ├── app.py                 # Flask application
│   ├── download_models.py     # Model download script
│   ├── fine_tune_model.py     # Model fine-tuning script
│   ├── models/                # Stored models
│   ├── project_embeddings/    # Project embeddings
│   ├── static/
│   │   └── uploads/           # Uploaded files
│   └── requirements.txt       # Python dependencies
├── public/
│   ├── css/
│   │   └── styles.css         # Main stylesheet
│   ├── js/
│   │   ├── scripts.js         # Main JavaScript
│   │   └── utils.js           # Utility functions
│   ├── img/                   # Images and icons
│   └── src/
│       └── views/             # PHP views
├── docker-compose.yml         # Docker configuration
├── Dockerfile.backend         # Backend Dockerfile
├── Dockerfile.frontend        # Frontend Dockerfile
└── README.md                  # This file

🔧 Configuration

Environment Variables

# Backend
SECRET_KEY=your-secret-key-here
BACKEND_URL=http://localhost:8080
DOCUCHAT_USER=admin
DOCUCHAT_PASS=password
ALLOWED_ORIGINS=*

# Frontend (in config.php)
BACKEND_URL=http://localhost:8080
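
For reference, a Python service can read these variables along the following lines. This is a sketch, not DocuChat's actual configuration code: the `Settings` class, the `load_settings` function, the fallback defaults, and the comma-separated format for `ALLOWED_ORIGINS` are all assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    secret_key: str
    backend_url: str
    user: str
    password: str
    allowed_origins: list


def load_settings(env=None):
    """Read settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    secret_key = env.get("SECRET_KEY", "")
    if not secret_key:
        # Fail fast instead of running with an empty signing key
        raise RuntimeError("SECRET_KEY must be set")
    # ASSUMPTION: multiple origins are separated by commas
    origins = [o.strip() for o in env.get("ALLOWED_ORIGINS", "*").split(",") if o.strip()]
    return Settings(
        secret_key=secret_key,
        backend_url=env.get("BACKEND_URL", "http://localhost:8080").rstrip("/"),
        user=env.get("DOCUCHAT_USER", "admin"),
        password=env.get("DOCUCHAT_PASS", ""),
        allowed_origins=origins,
    )


# Pass an explicit mapping here so the example runs without touching your shell
settings = load_settings({"SECRET_KEY": "dev-only-key", "BACKEND_URL": "http://localhost:8080/"})
print(settings.backend_url)
```

Accepting a mapping makes the loader easy to test, and calling `load_settings()` with no argument reads the real environment.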

🤝 Contributing

We welcome contributions! Here's how you can help:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Contribution Guidelines

  • Follow the existing code style
  • Add comments for complex logic
  • Update documentation as needed
  • Write clear commit messages
  • Test your changes thoroughly

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • Hugging Face - For the Transformers library and models
  • Facebook AI Research - For FAISS
  • Bootstrap - For the UI framework
  • Font Awesome - For icons

📞 Support


🌟 Star History

If you find this project useful, please consider giving it a ⭐ on GitHub!


Made with ❤️ by the DocuChat Team
