
Themis Database System - High-performance C++ hybrid-database (graph-vector-relational-file) with AQL support and MVCC


makr-code/ThemisDB


🗄️ ThemisDB

High-Performance Multi-Model Database with Native AI/LLM Integration

"ThemisDB keeps its own llamas." – Run LLaMA, Mistral, Phi-3 directly in your database, no API calls needed.



🎉 What's New in v1.3.0

🧠 Native LLM Integration with llama.cpp (Optional)

"ThemisDB keeps its own llamas." – Run AI/LLM workloads directly in your database, with no external API costs!

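[!NOTE] LLM integration is an optional feature. It must be enabled at build time with -DTHEMIS_ENABLE_LLM=ON and needs a local clone of llama.cpp (see the Windows build notes below).

LLM Features (When Enabled)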

| Feature | Description | Status |
|---------|-------------|--------|
| 🧠 Embedded LLM Engine | llama.cpp integration for LLaMA/Mistral/Phi-3 (1B-70B params) | ✅ |
| 🖼️ Image Analysis AI | Multi-backend plugins (llama.cpp Vision, ONNX CLIP, OpenCV DNN) | ✅ |
| ⚡ GPU Acceleration | NVIDIA CUDA support with significant speedup | ✅ |
| 💾 PagedAttention | Block-based KV-cache memory management | ✅ |
| 🎯 Continuous Batching | Handles concurrent inference requests | ✅ |
| 🔧 Quantization | Q4_K_M, Q5_K_M, Q8_0 for efficient memory usage | ✅ |
| 📊 Monitoring | Grafana dashboards with metrics and alerts | ✅ |
| 🔌 Plugin Architecture | Extensible LLM and image analysis backends | ✅ |
| 🌐 Distributed RPC | Inter-shard communication for distributed LLM operations | ✅ |
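The Quantization row lists llama.cpp's Q8_0 format. As a rough illustration of how block quantization saves memory, here is a simplified Python sketch of the Q8_0 idea as used in ggml/llama.cpp: values are grouped into blocks of 32, and each block stores one scale (max |x| / 127) plus 32 signed 8-bit integers. This is not ThemisDB code; the function names are hypothetical, and the scale is kept as a Python float rather than the fp16 used on disk.

```python
BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32 values


def quantize_q8_0(values):
    """Quantize floats into (scale, int8 list) pairs, one pair per block of 32."""
    blocks = []
    for start in range(0, len(values), BLOCK_SIZE):
        chunk = values[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in chunk)
        # One scale per block, chosen so the largest magnitude maps to 127.
        scale = amax / 127.0 if amax > 0 else 0.0
        inv = 1.0 / scale if scale > 0 else 0.0
        # Each value becomes a signed 8-bit integer in [-127, 127] (round to nearest).
        qs = [int(round(v * inv)) for v in chunk]
        blocks.append((scale, qs))
    return blocks


def dequantize_q8_0(blocks):
    """Reconstruct approximate floats from (scale, int8 list) pairs."""
    return [scale * q for scale, qs in blocks for q in qs]


def bits_per_weight_q8_0():
    # Per block: 32 int8 values (256 bits) plus one fp16 scale (16 bits).
    return (BLOCK_SIZE * 8 + 16) / BLOCK_SIZE
```

At 8.5 bits per weight, Q8_0 needs a little over half the memory of fp16 weights; the K-quant formats (Q4_K_M, Q5_K_M) use more elaborate super-block layouts and are not shown here.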

Performance Highlights

[!TIP] GPU acceleration provides a significant speedup over CPU inference, and PagedAttention reduces memory usage.

  • ⚡ Significant speedup with GPU acceleration vs CPU
  • 💾 Memory savings with PagedAttention and prefix caching
  • 🚀 Kernel fusion for additional performance gains
  • ✅ Comprehensive test coverage with unit tests
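PagedAttention (introduced with vLLM) stores the attention key/value cache in fixed-size blocks allocated on demand, instead of reserving a contiguous region for each sequence's maximum length. The toy allocator below illustrates only that bookkeeping; the class and method names are hypothetical and do not describe ThemisDB's implementation.

```python
class PagedKVCache:
    """Toy paged KV-cache allocator in the spirit of PagedAttention.

    Tokens live in fixed-size blocks drawn from a shared pool; each sequence
    keeps a block table mapping logical block index -> physical block id.
    """

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve space for one more token, allocating a block only when needed."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:  # last block is full (or none exists yet)
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1
        # (physical block, offset) where this token's K/V entries would be written.
        return table[-1], length % self.block_size

    def free(self, seq_id):
        """Return all blocks of a finished sequence to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because a sequence only ever holds one partially filled block, its waste is at most block_size - 1 token slots. Prefix caching extends the idea by letting sequences that share a prompt prefix point at the same physical blocks (not modelled here).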

📚 Documentation:

ThemisDB is a production-ready multi-model database that combines relational, graph, vector, and document models in a single system with full ACID transaction support. It is built on RocksDB and includes advanced security and compliance features.

📦 Editions

Available Editions

| Edition | License | Features |
|---------|---------|----------|
| 🆓 Community | Open Source (MIT) | Full-featured single-node database with all core capabilities |
| 🔒 Enterprise | Commercial | + Horizontal scaling, advanced analytics, HA/replication, and more |

→ See Enterprise Edition Details


✨ Features

🔑 Core Features

Database Capabilities
Feature Description Community Enterprise
🚀 Quick Start

🐳 Docker (Recommended)

# Pull and run the latest version
docker pull themisdb/themisdb:latest

# Run with Docker
docker run -d \
  --name themis \
  -p 8080:8080 \
  -p 18765:18765 \
  -p 4318:4318 \
  -v themis_data:/data \
  themisdb/themisdb:latest

# Or use Docker Compose
docker compose up -d

# Verify installation
curl http://localhost:8080/health

[!TIP] Use Docker Compose for production deployments with proper configuration.

📑 Default Ports

| Port | Protocol | Description |
|------|----------|-------------|
| 8080 | HTTP/1.1 | REST API, GraphQL |
| 18765 | Binary | Wire Protocol, gRPC |
| 4318 | HTTP | OpenTelemetry/Prometheus |

[!NOTE] Complete Port Reference: See docs/deployment/PORT_REFERENCE.md for all ports, including optional protocols (MQTT, PostgreSQL Wire, MCP).

  • ✅ Image Analysis - Multi-backend AI plugins (v1.3.0+)
  • ✅ GNN Embeddings - Graph Neural Network support
🌐 Modern Protocols

| Protocol | Status | Description |
|----------|--------|-------------|
| HTTP/1.1 | ✅ | REST API, GraphQL |
| HTTP/2 | ✅ | Server Push for CDC |
| HTTP/3 | 🚧 | QUIC (experimental) |
| WebSocket | ✅ | Bidirectional streaming |
| gRPC | ✅ | Binary RPC |
| MQTT | ✅ | IoT messaging |
| PostgreSQL Wire | ✅ | BI tool compatibility |
| MCP | ✅ | Model Context Protocol |
| SSE | ✅ | Server-Sent Events |

📚 Transparency & Attribution

ThemisDB is built on proven open-source foundations with clear attribution:

  • ✅ Transparent Attribution - Clear documentation of all dependencies
  • ✅ Innovation Documentation - ThemisDB's unique contributions vs third-party features
  • ✅ License Compliance - Full license information for all components

→ See Complete Attribution

Key Features:
  • 🔒 ACID Transactions - Full snapshot isolation with MVCC
  • 🔍 Multi-Model - Relational, Graph, Vector, Document in one database
  • 🚀 High Performance - 45K writes/s, 120K reads/s, GPU-accelerated vector search
  • 🛡️ Security - TLS 1.3, RBAC, field-level encryption, audit logging (Enterprise: HSM integration)
  • 📊 Analytics - Time-series, aggregations (Enterprise: OLAP, CEP, materialized views)
  • 🌐 Distribution - Single-node optimized (Enterprise: horizontal sharding, replication, Kubernetes)
  • 🧠 AI-Ready - Hybrid search (RAG), embedding cache, FAISS integration, optional LLM engine with llama.cpp (v1.3.0+), image analysis AI plugins (v1.3.0+)
  • 🌐 Modern Protocols - HTTP/1.1, GraphQL, SSE, gRPC (v1.3.0), HTTP/2 with Server Push ✅, WebSocket ✅, MQTT ✅, HTTP/3 🚧, PostgreSQL Wire ✅, MCP ✅
  • 📚 Transparent Attribution - Clear documentation of third-party dependencies vs ThemisDB innovations (see ATTRIBUTIONS.md)
  • 🖼️ Image Analysis - Multi-backend AI plugin architecture (llama.cpp Vision, ONNX CLIP, OpenCV DNN)

Quick Start

📖 Complete Port Reference: See docs/deployment/PORT_REFERENCE.md for all ports including optional protocols (MQTT, PostgreSQL Wire, MCP).

From Source

# Clone repository
git clone https://github.com/makr-code/ThemisDB.git
cd ThemisDB

# Setup and build (Linux/macOS)
./scripts/setup.sh
./scripts/build.sh

# Setup and build (Windows)
.\scripts\setup.ps1
.\scripts\build.ps1

# Start server
./build/themis_server --config config.yaml

Optional Protocol Support (disabled by default for security; each must be enabled explicitly):

# Enable HTTP/2 with Server Push (explicit opt-in for security)
cmake -B build -S . -DTHEMIS_ENABLE_HTTP2=ON

# Enable WebSocket with CDC (explicit opt-in for security)
cmake -B build -S . -DTHEMIS_ENABLE_WEBSOCKET=ON

# Enable MQTT broker (explicit opt-in for security)
cmake -B build -S . -DTHEMIS_ENABLE_MQTT=ON

# Enable PostgreSQL Wire Protocol (explicit opt-in for security)
cmake -B build -S . -DTHEMIS_ENABLE_POSTGRES_WIRE=ON

# Enable MCP for LLM integration (explicit opt-in for security)
cmake -B build -S . -DTHEMIS_ENABLE_MCP=ON

# Enable HTTP/3 (explicit opt-in for security)
cmake -B build -S . -DTHEMIS_ENABLE_HTTP3=ON

# Default build only includes HTTP/1.1, GraphQL, SSE, gRPC (minimal attack surface)

See Protocol Documentation for details.

Windows: Build with LLM Support (llama.cpp) - Optional

# OPTIONAL: LLM support requires a local clone of llama.cpp
if (!(Test-Path "llama.cpp")) {
  git clone https://github.com/ggerganov/llama.cpp.git llama.cpp
}

# MSVC release build with LLM support
powershell -File scripts/build-themis-server-llm.ps1

# Sanity check
./build-msvc/bin/themis_server.exe --help

Notes:

  • LLM support is optional and requires -DTHEMIS_ENABLE_LLM=ON at build time
  • llama.cpp/ is a local clone in the project root and is excluded via .gitignore and .dockerignore (it is neither committed nor copied into Docker images)
  • The build script selects Visual Studio 2022 (-G "Visual Studio 17 2022") and -A x64, wires in the vcpkg toolchain, and works around MSVC-specific char8_t errors in the llama target

→ Comprehensive Build Documentation | Build variants, platforms, troubleshooting

Package Managers

Linux (Debian/Ubuntu):

wget https://github.com/makr-code/ThemisDB/releases/latest/download/themisdb_1.3.0-1_amd64.deb
sudo apt install ./themisdb_1.3.0-1_amd64.deb
sudo systemctl start themisdb

macOS (Homebrew):

brew install themisdb
brew services start themisdb

Windows (Chocolatey):

choco install themisdb

5-Minute Tutorial

# 1. Check server health
curl http://localhost:8080/health

# 2. Create an entity
curl -X PUT http://localhost:8080/entities/users:alice \
  -H "Content-Type: application/json" \
  -d '{"blob":"{\"name\":\"Alice\",\"age\":30,\"city\":\"Berlin\"}"}'

# 3. Create an index
curl -X POST http://localhost:8080/index/create \
  -H "Content-Type: application/json" \
  -d '{"table":"users","column":"city"}'

# 4. Query by index
curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{"table":"users","predicates":[{"column":"city","value":"Berlin"}],"return":"entities"}'

# 5. View metrics
curl http://localhost:8080/metrics
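The same requests can be issued from code. The Python sketch below uses only the standard library to build the entity PUT and index query requests from the tutorial. Note that the blob field carries the document as a JSON-encoded string, so the document is serialized twice. The helper names are hypothetical, and the base URL assumes the REST API listens on port 8080, as listed under Default Ports.

```python
import json
import urllib.request

# Assumption: REST API on port 8080, per the Default Ports table.
BASE_URL = "http://localhost:8080"


def entity_put_request(key, document, base_url=BASE_URL):
    """Build the PUT /entities/<key> request from the tutorial.

    The "blob" field holds the document as a JSON *string*, so the
    document is encoded once on its own and once inside the outer body.
    """
    body = json.dumps({"blob": json.dumps(document)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/entities/{key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def index_query_request(table, column, value, base_url=BASE_URL):
    """Build the POST /query request that looks up entities by an indexed column."""
    body = json.dumps({
        "table": table,
        "predicates": [{"column": column, "value": value}],
        "return": "entities",
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Pass a request to urllib.request.urlopen() to send it to a running server.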

Architecture

ThemisDB uses a unified storage architecture with specialized projection layers:

┌─────────────────────────────────────────────────────────┐
│                    Query Layer (AQL)                    │
│ SQL-like • Graph Traversals • Vector Search • Analytics │
├─────────────────────────────────────────────────────────┤
│                    Projection Layers                    │
│    Secondary Indices • Graph Adjacency • HNSW Vector    │
├─────────────────────────────────────────────────────────┤
│             Canonical Storage (Base Entity)             │
│          RocksDB LSM-Tree • MVCC Transactions           │
└─────────────────────────────────────────────────────────┘

Core Components:

  • Storage Engine: RocksDB TransactionDB with LSM-Tree
  • Transaction Manager: MVCC with snapshot isolation
  • Query Engine: Advanced Query Language (AQL) with graph/vector support
  • Index Manager: Automatic maintenance of secondary, graph, and vector indexes
  • Security: TLS 1.3, RBAC, field encryption, audit logging
  • Observability: Prometheus metrics, OpenTelemetry tracing
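To make the canonical-storage/projection split concrete, the sketch below models it with a plain Python dict standing in for RocksDB: each entity is stored once under a canonical key, and its secondary-index projection entries are rewritten in the same write whenever an indexed column changes. The key layout (e:... and i:...) and the class name are hypothetical illustrations, not ThemisDB's actual on-disk format.

```python
import json


class ToyStore:
    """Toy model of canonical entity storage plus a secondary-index projection.

    One key-value map (standing in for RocksDB) holds both the canonical
    entity under "e:<table>:<pk>" and index entries under
    "i:<table>:<column>:<value>:<pk>". The key formats are hypothetical.
    """

    def __init__(self, indexed):
        self.kv = {}
        self.indexed = indexed  # table -> set of indexed column names

    def put(self, table, pk, doc):
        deletes, puts = [], {}
        old = self.kv.get(f"e:{table}:{pk}")
        old_doc = json.loads(old) if old else {}
        for col in self.indexed.get(table, ()):
            if col in old_doc:  # drop the stale projection entry
                deletes.append(f"i:{table}:{col}:{old_doc[col]}:{pk}")
            if col in doc:      # add the projection entry for the new value
                puts[f"i:{table}:{col}:{doc[col]}:{pk}"] = ""
        puts[f"e:{table}:{pk}"] = json.dumps(doc)
        # Deletes first, then puts: this stands in for a single atomic write batch.
        for key in deletes:
            self.kv.pop(key, None)
        self.kv.update(puts)

    def lookup(self, table, col, value):
        """Find entities via the index projection, then read the canonical copies."""
        prefix = f"i:{table}:{col}:{value}:"
        pks = sorted(k[len(prefix):] for k in self.kv if k.startswith(prefix))
        return [json.loads(self.kv[f"e:{table}:{pk}"]) for pk in pks]
```

In RocksDB the lookup would be a prefix seek over sorted keys rather than a full scan, and the deletes and puts would go into one WriteBatch or transaction so the projection can never disagree with the entity.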

→ Full Architecture Documentation


Core Features

Multi-Model Database

  • Relational: SQL-like queries with secondary indexes
  • Graph: BFS, Dijkstra, A* traversals with path constraints
  • Vector: HNSW and FAISS for similarity search (GPU-accelerated)
  • Document: JSON storage with flexible schema
  • Time-Series: Gorilla compression, continuous aggregates
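The Graph Traverse benchmark reports a depth-limited BFS (depth=3). A minimal Python sketch of that kind of traversal over an adjacency map follows; the dict stands in for the graph adjacency projection, and the function name is hypothetical.

```python
from collections import deque


def bfs_within_depth(adjacency, start, max_depth):
    """Return {vertex: hop distance} for every vertex reachable within max_depth hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == max_depth:
            continue  # reached the depth limit: do not expand this vertex
        for w in adjacency.get(v, ()):
            if w not in dist:  # first visit is the shortest hop distance in BFS
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist
```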

Transaction Support

  • Full ACID guarantees with snapshot isolation
  • Write-write conflict detection
  • Atomic updates across all index types
  • Session-based and direct API
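Snapshot isolation with write-write conflict detection can be illustrated with a toy in-memory store: each transaction reads the versions committed before it began, and at commit time it aborts if another transaction has already committed a newer version of any key it wrote (first committer wins). This is a single-threaded sketch with hypothetical class names; ThemisDB's transaction manager builds on RocksDB TransactionDB (see Core Components), whose actual locking and validation are not modelled here.

```python
class ConflictError(Exception):
    """Raised when a concurrent transaction already committed a key we wrote."""


class MVCCStore:
    """Toy multi-version store with snapshot reads."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), ascending by commit_ts
        self.clock = 0      # timestamp of the latest commit

    def begin(self):
        return Txn(self, snapshot_ts=self.clock)

    def read(self, key, ts):
        # Latest version committed at or before the snapshot timestamp.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None


class Txn:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}  # buffered until commit

    def get(self, key):
        if key in self.writes:  # read-your-own-writes
            return self.writes[key]
        return self.store.read(key, self.snapshot_ts)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Write-write conflict: someone committed one of our keys after our snapshot.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.snapshot_ts:
                raise ConflictError(key)
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((self.store.clock, value))
```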

Advanced Analytics

  • CEP Engine: Complex Event Processing with pattern matching
  • OLAP: CUBE, ROLLUP, window functions
  • Time-Series: Compression, retention policies, aggregates
  • Hybrid Search: BM25 + vector for RAG workflows
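The README does not say how ThemisDB combines BM25 and vector results. One widely used, score-free way to merge two ranked lists is Reciprocal Rank Fusion (RRF), sketched below purely as an illustration; the function name and the constant k=60 (a common default) are assumptions, not ThemisDB settings.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each list adds 1 / (k + rank) for every document it contains
    (rank starts at 1); documents are ordered by their summed score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Because RRF only uses ranks, it sidesteps the fact that BM25 scores and vector distances live on different scales.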

Enterprise Security

  • TLS 1.3 with mTLS support
  • Role-Based Access Control (RBAC)
  • Field-level encryption
  • Audit logging with SIEM integration
  • Certificate pinning for HSM/TSA
  • Secrets management (HashiCorp Vault)

Distributed Capabilities

  • Horizontal sharding with consistent hashing
  • Leader-follower and multi-master replication
  • RAID-like redundancy (MIRROR, STRIPE, PARITY)
  • Kubernetes operator with CRDs
  • Auto-rebalancing and cloud deployment
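Consistent hashing places shards and keys on the same hash ring, so adding a shard only moves the keys that fall into the new shard's ranges. Below is a minimal Python sketch with virtual nodes; the class name, the SHA-1-based hash, and the 64 virtual nodes per shard are illustrative assumptions, not ThemisDB's configuration.

```python
import bisect
import hashlib


def _hash(key):
    # Stable 64-bit hash (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hash ring with virtual nodes; a key belongs to the next ring point clockwise."""

    def __init__(self, shards, vnodes=64):
        self.vnodes = vnodes
        self.points = []  # sorted hash positions on the ring
        self.owners = {}  # hash position -> shard name
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard):
        # Several points per shard smooth out the key distribution.
        for i in range(self.vnodes):
            point = _hash(f"{shard}#{i}")
            self.owners[point] = shard
            bisect.insort(self.points, point)

    def shard_for(self, key):
        # First point strictly after the key's hash, wrapping around the ring.
        idx = bisect.bisect_right(self.points, _hash(key)) % len(self.points)
        return self.owners[self.points[idx]]
```

With N shards, adding one more relocates roughly 1/(N+1) of the keys, all of them to the new shard; a plain hash % N scheme would instead remap most keys.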

GPU Acceleration (Optional)

  • 10 backend options, including CUDA, Vulkan, HIP, OpenCL, DirectX, OneAPI, and ZLUDA
  • 10-50x speedup for vector search
  • Automatic platform detection and fallback

📚 Documentation

Getting Started
Core Concepts
Features
Operations
Development
Enterprise & Strategy

[!NOTE] Full Documentation: https://makr-code.github.io/ThemisDB/


🗺️ Roadmap

✅ Completed (v1.0 - v1.3)

Production-Ready Features
  • ✅ ACID transactions with MVCC
  • ✅ Multi-model support (relational, graph, vector, document)
  • ✅ Horizontal sharding and replication
  • ✅ GPU acceleration (10 backends)
  • ✅ Enterprise security features
  • ✅ Client SDKs (7 languages)
  • ✅ Kubernetes operator
  • ✅ Native LLM integration (optional)
  • ✅ Modern protocol support (HTTP/2, WebSocket, gRPC, MQTT, PostgreSQL Wire, MCP)

🚧 In Progress (v1.4 - Q1 2026)

  • 🚧 Query Optimizer - Advanced query optimization and execution plans
  • 🚧 Multi-Datacenter - Cross-region deployment support
  • 🚧 Advanced ML/GNN - Enhanced machine learning features
  • 🚧 Production Hardening - Additional stability and performance improvements

📋 Planned (v1.5+ - 2026)

  • 📋 Modular Architecture - Split monolithic core into 11 focused libraries
  • 📋 Real-Time Views - Materialized views with automatic updates
  • 📋 Cross-Region Replication - Global data distribution
  • 📋 Advanced Compliance - SOC 2, HIPAA certification
  • 📋 Cloud-Native Optimizations - Enhanced cloud provider integrations

📚 Detailed Planning:


⚡ Performance

Benchmark Results

Test Environment: Release build, Windows x64, 20 cores @ 3696 MHz

| Operation | Throughput | Latency (avg) | Notes |
|-----------|------------|---------------|-------|
| 📝 Entity PUT | 45,000 ops/s | 0.02 ms | Write throughput |
| 📖 Entity GET | 120,000 ops/s | 0.008 ms | Read throughput |
| 🔍 Indexed Query | 3.4M queries/s | 0.29 μs | AQL WHERE clause |
| 🕸️ Graph Traverse | 9.56M ops/s | 0.105 μs | BFS (depth=3) |
| 🎯 Vector Search (RGB) | 59.7M queries/s | 0.017 μs | Simple 3D vectors |
| 📊 Vector Insert (384D) | 411k vectors/s | 2.44 μs | Typical embeddings |
| 🧠 RAG Search (Top-50) | 7.17M queries/s | 0.14 μs | LLM retrieval |

[!IMPORTANT] Performance Disclaimer: Benchmarks represent optimal conditions. Actual performance varies based on:

  • Hardware configuration (CPU, RAM, storage)
  • Data size and complexity
  • Concurrent workload patterns
  • Build configuration and optimizations
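When reading the table, note that each latency value matches the reciprocal of its throughput to within rounding. That suggests the latencies are per-operation averages over a sequential run rather than measured percentile latencies; this is an inference from the numbers, not something the benchmark notes state. The short check below recomputes the implied latency from the table's own figures (the dictionary and function names are ours).

```python
# Throughput (ops/s) and reported average latency (seconds), copied from the table.
BENCHMARKS = {
    "Entity PUT": (45_000, 0.02e-3),
    "Entity GET": (120_000, 0.008e-3),
    "Indexed Query": (3.4e6, 0.29e-6),
    "Graph Traverse": (9.56e6, 0.105e-6),
    "Vector Search (RGB)": (59.7e6, 0.017e-6),
    "Vector Insert (384D)": (411e3, 2.44e-6),
    "RAG Search (Top-50)": (7.17e6, 0.14e-6),
}


def implied_latency(ops_per_second):
    """Average time per operation if operations run strictly one after another."""
    return 1.0 / ops_per_second


for name, (throughput, reported) in BENCHMARKS.items():
    implied = implied_latency(throughput)
    print(f"{name:22s} implied {implied * 1e6:8.3f} us   reported {reported * 1e6:8.3f} us")
```

Under concurrent load, measured latency would generally exceed this reciprocal, which is one more reason to benchmark your own workload.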

📊 Detailed Analysis:

| Resource | Description | Link |
|----------|-------------|------|
| 📚 Documentation | Complete guides and API reference | Docs Site |
| 🐛 Issues | Report bugs or request features | GitHub Issues |
| 💬 Discussions | Community Q&A and discussions | GitHub Discussions |
| 🤝 Contributing | How to contribute to ThemisDB | Contributing Guide |
| 🔒 Security | Responsible disclosure policy | Security Policy |

📄 License

License Information

Community Edition

ThemisDB Community Edition is released under the MIT License.

  • ✅ Free to use, modify, and distribute
  • ✅ Commercial use allowed
  • ✅ Full feature set for single-node deployments

Enterprise Edition

ThemisDB Enterprise Edition features (horizontal sharding, advanced analytics, HA/replication, etc.) are available under a commercial license.

Enterprise Inquiries: [email protected]

→ See Enterprise Features


🙏 Acknowledgments

ThemisDB builds upon and is inspired by these excellent projects:

Inspirations & Foundations
| Project | Influence | Area |
|---------|-----------|------|
| ArangoDB | Multi-model architecture | Design Philosophy |
| CozoDB | Hybrid relational-graph-vector | Data Models |
| Azure Cosmos DB | Multi-model with unified API | API Design |
| RocksDB | High-performance LSM-Tree storage | Storage Engine |
| FAISS | Efficient similarity search | Vector Search |

[!NOTE] For a complete list of third-party libraries and detailed feature attributions, see ATTRIBUTIONS.md.


Built with ❤️ for the database community

⭐ Star us on GitHub · 📖 Read the Docs · 🤝 Contribute
