Releases: rahulsiiitm/videochain-python

VidChain v0.2.0: The Dual-Brain & B.A.B.U.R.A.O. Update

04 Apr 05:40

This release marks a major milestone for my semester project: the official PyPI launch of VidChain v0.2.0.

What started as a video analysis script has been completely re-architected into a production-ready, edge-optimized multimodal RAG framework. This version introduces a new visual processing pipeline, stabilizes GPU memory management on consumer hardware (tested on an RTX 3050), and debuts a conversational analysis persona.

🔥 Academic & Engineering Highlights

  • Official PyPI Launch: The framework is now packaged and globally available via pip install VidChain.
  • Dual-Brain Vision Architecture: Moved away from a single classifier to a decoupled approach. YOLOv8s now extracts "Nouns" (objects/entities), while a custom fine-tuned MobileNetV3 extracts "Verbs" (action intents like NORMAL, SUSPICIOUS, VIOLENCE).
  • Smart OCR Integration: Integrated EasyOCR with context-aware triggers. The system scans for text only when YOLO detects readable surfaces (e.g., laptops, books, screens), significantly reducing compute overhead. A combined sketch of the dual-brain pass and this OCR gate follows this list.
  • Meet B.A.B.U.R.A.O.: Replaced the standard RAG log-reader with a conversational AI copilot named B.A.B.U.R.A.O. (Behavioral Analysis & Broadcasting Unit for Real-time Artificial Observation). It utilizes abductive reasoning to translate raw, flickering sensor data into natural, contextual narratives.
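
A minimal sketch of how the dual-brain frame pass and the gated OCR could fit together, assuming the Ultralytics YOLO and EasyOCR APIs. The model paths, the READABLE_SURFACES trigger set, and the classify_action() helper for the MobileNetV3 head are illustrative placeholders, not VidChain's exact internals.

import easyocr
from ultralytics import YOLO

detector = YOLO("yolov8s.pt")                 # "Nouns": objects/entities
reader = easyocr.Reader(["en"], gpu=True)     # OCR engine, invoked only on demand
READABLE_SURFACES = {"laptop", "book", "tv", "cell phone"}  # assumed trigger classes

def analyze_frame(frame, action_model):
    result = detector(frame, verbose=False)[0]
    nouns = [detector.names[int(c)] for c in result.boxes.cls]

    # "Verbs": a separate fine-tuned MobileNetV3 head scores action intent
    verb = classify_action(action_model, frame)  # hypothetical helper -> "NORMAL" etc.

    # Context-aware OCR: pay the OCR cost only when a readable surface is present
    text = []
    if READABLE_SURFACES & set(nouns):
        text = [t for (_box, t, conf) in reader.readtext(frame) if conf > 0.4]

    return {"objects": nouns, "action": verb, "text": text}

Keeping the object detector and the action classifier decoupled lets each model stay small enough for edge GPUs, and gating OCR on detected surfaces avoids running EasyOCR on every frame.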

Edge-First Hardware Optimizations

  • Stable GPU Inference: Diagnosed and bypassed a critical mixed-precision (FP16/FP32) layer-fusion bug in PyTorch/Ultralytics, ensuring stable, native CUDA execution on consumer GPUs without memory crashes.
  • Semantic Chunking: The pipeline now compresses identical, consecutive visual frames in the timeline, preserving FAISS vector space and reducing LLM context-window bloat (see the sketch after this list).
  • Dependency Stabilization: Pinned pillow < 12.0 to resolve MoviePy fatal errors and unpinned strict torch dependencies to allow seamless local CUDA environment setups.
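
A minimal sketch of the consecutive-frame compression idea, assuming each timeline entry is a dict with "time" and "description" keys; the field names and the chunk_timeline() helper are illustrative.

def chunk_timeline(events):
    """Collapse runs of identical consecutive descriptions into one chunk."""
    chunks = []
    for event in events:
        if chunks and chunks[-1]["description"] == event["description"]:
            chunks[-1]["end"] = event["time"]   # extend the current run
        else:
            chunks.append({"start": event["time"], "end": event["time"],
                           "description": event["description"]})
    return chunks

# e.g. 120 frames of "person at desk" collapse into one chunk spanning its
# start and end times, so FAISS stores one vector instead of 120 near-duplicates.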

Getting Started

# 1. Install the framework
pip install VidChain

# 2. Install hardware-accelerated PyTorch (CUDA 12.1 example)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --force-reinstall

# 3. Run analysis
vidchain-analyze your_video.mp4

v0.1.0-alpha: Core Multimodal Pipeline

02 Apr 18:24

Pre-release

VideoChain v0.1.0: Initial Multimodal Release

Overview

VideoChain is a lightweight, multimodal Retrieval-Augmented Generation (RAG) framework that transforms unstructured video data into a searchable, structured knowledge base. This initial release establishes the core pipeline architecture for synchronizing visual entity detection with audio transcription, enabling high-fidelity reasoning via local large language models (LLMs).


Core Components and Features

1. Vision Engine (YOLOv8 Integration)

  • Implemented real-time object detection using the Ultralytics YOLOv8 architecture.
  • Features motion-gated sampling to optimize inference speed by processing only frames with significant temporal changes (see the sketch after this list).
  • Provides automated entity counting and descriptive labeling for the knowledge base.
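
A minimal sketch of what motion-gated sampling can look like with OpenCV; the grayscale mean-difference gate and its threshold are illustrative choices, not necessarily VideoChain's exact heuristic.

import cv2

def motion_gated_frames(video_path, threshold=12.0):
    cap = cv2.VideoCapture(video_path)
    prev = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Only yield frames whose mean pixel change exceeds the gate
        if prev is None or cv2.absdiff(gray, prev).mean() > threshold:
            yield frame
        prev = gray
    cap.release()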

2. Audio Processing (Whisper & Librosa)

  • Integrated OpenAI Whisper for robust speech-to-text transcription.
  • Implemented acoustic feature extraction using Librosa to detect RMS energy spikes and volume fluctuations (see the sketch after this list).
  • Supports FP16 inference for optimized performance on NVIDIA hardware (RTX 3050 and similar).
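
A minimal sketch of RMS-spike detection with Librosa; the two-standard-deviation spike rule is an illustrative choice.

import librosa
import numpy as np

def find_energy_spikes(audio_path):
    y, sr = librosa.load(audio_path, sr=None)
    rms = librosa.feature.rms(y=y)[0]            # frame-wise RMS energy
    times = librosa.frames_to_time(np.arange(len(rms)), sr=sr)
    spikes = rms > rms.mean() + 2 * rms.std()    # flag loud outlier frames
    return times[spikes]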

3. Temporal-Spatial Fusion Engine

  • Engineered a centralized fusion engine to synchronize vision and audio streams into a unified chronological timeline.
  • Outputs a standardized knowledge_base.json that serves as the ground-truth context for the RAG engine (see the sketch after this list).
  • Resolved critical data-mapping issues related to timestamp synchronization and key-value consistency.
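
A minimal sketch of the fusion step, assuming vision and audio events each carry a float "time" in seconds; the keys and the fuse() helper are illustrative.

import json

def fuse(vision_events, audio_events, out_path="knowledge_base.json"):
    # Tag each event with its modality, then merge both streams by timestamp
    timeline = sorted(
        [{"modality": "vision", **e} for e in vision_events] +
        [{"modality": "audio", **e} for e in audio_events],
        key=lambda e: e["time"],
    )
    with open(out_path, "w") as f:
        json.dump({"timeline": timeline}, f, indent=2)
    return timeline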

4. Retrieval-Augmented Generation (RAG)

  • Established a communication bridge with the Ollama inference server.
  • Default support for Llama 3.2, providing local, privacy-focused reasoning over the extracted video data.
  • Optimized prompt templates for security-focused analysis and general scene understanding (see the sketch after this list).
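
A minimal sketch of such a bridge using Ollama's standard REST endpoint (/api/generate); the prompt wording and the ask() helper are illustrative, not VideoChain's exact templates.

import json
import requests

def ask(question, context, model="llama3.2"):
    # Ground the model in the extracted timeline rather than raw video
    prompt = (
        "You are a video-security analyst. Using only the timeline below, "
        "answer the question.\n\n"
        f"TIMELINE:\n{json.dumps(context)}\n\nQ: {question}"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    return resp.json()["response"]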

Technical Infrastructure

  • Distribution: Fully PEP 517-compliant packaging for release on the Python Package Index (PyPI).
  • Dependency Management: Automated resolution for OpenCV, Torch, Ultralytics, and Whisper.
  • Dynamic Pathing: Implemented system-agnostic binary resolution for FFmpeg, ensuring cross-platform compatibility without hardcoded local paths (see the sketch below).
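
A minimal sketch of system-agnostic binary resolution using only the standard library; raising on a missing binary is an illustrative policy.

import shutil

def resolve_ffmpeg():
    path = shutil.which("ffmpeg")    # searches the system PATH on any OS
    if path is None:
        raise RuntimeError("FFmpeg not found on PATH; install it first.")
    return path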

Installation

pip install videochain

Requirements

  • Python 3.11 or higher
  • FFmpeg (must be available in system PATH)
  • Ollama (running Llama 3.2 or compatible models)