feat(datafusion): tantivy-datafusion connector for SQL over tantivy splits by alexanderbianchi · Pull Request #1 · alexanderbianchi/quickwit

alexanderbianchi · 2026-04-11T04:01:48Z

Summary

Adds TantivyDataSource implementing QuickwitDataSource for SQL queries over tantivy-indexed log data
Full-text search via full_text() UDF pushed down to tantivy's inverted index
Fast field filter pushdown to tantivy range queries
Tantivy native aggregation pushdown (AggDataSource)
Distributed execution with TantivyCodec for plan serialization across workers
Async document retrieval via Searcher::doc_async() (per-block, no full-file preload)
REST POST /api/v1/_sql endpoint with pretty-printed table output
8 unit tests + 5 full cluster sandbox integration tests
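The fast field filter pushdown listed above can be pictured as a translation from a comparison predicate into a pair of range bounds. The following is a minimal sketch under stated assumptions: `CmpOp`, `Predicate`, `FieldRange`, and `to_range` are hypothetical names introduced for illustration, the field is assumed to be a `u64` fast field, and none of this is the connector's actual code.

```rust
use std::ops::Bound;

/// Hypothetical comparison operators a SQL filter might carry.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CmpOp {
    Lt,
    LtEq,
    Gt,
    GtEq,
    Eq,
}

/// Hypothetical `column <op> literal` predicate on a u64 fast field.
#[derive(Debug, Clone, PartialEq)]
pub struct Predicate {
    pub column: String,
    pub op: CmpOp,
    pub value: u64,
}

/// Hypothetical range description that a range query could be built from.
#[derive(Debug, Clone, PartialEq)]
pub struct FieldRange {
    pub column: String,
    pub lower: Bound<u64>,
    pub upper: Bound<u64>,
}

/// Translate one comparison into lower/upper bounds on the same column.
pub fn to_range(p: &Predicate) -> FieldRange {
    let (lower, upper) = match p.op {
        CmpOp::Lt => (Bound::Unbounded, Bound::Excluded(p.value)),
        CmpOp::LtEq => (Bound::Unbounded, Bound::Included(p.value)),
        CmpOp::Gt => (Bound::Excluded(p.value), Bound::Unbounded),
        CmpOp::GtEq => (Bound::Included(p.value), Bound::Unbounded),
        // Equality is the degenerate range [value, value].
        CmpOp::Eq => (Bound::Included(p.value), Bound::Included(p.value)),
    };
    FieldRange {
        column: p.column.clone(),
        lower,
        upper,
    }
}
```

Under these assumptions, a filter such as `timestamp >= 10` would become a range with an inclusive lower bound of 10 and no upper bound, which the index can evaluate without DataFusion scanning every row.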

Architecture

SQL query → DataFusionService::execute_sql
  → QuickwitSchemaProvider resolves index
  → TantivyDataSource::create_default_table_provider (opens 1 split for schema)
  → TantivyTableProvider::scan (lists all splits, creates QuickwitSplitOpener per split)
  → SingleTableProvider::from_splits (partition-per-segment parallelism)
  → DistributedExec (splits work across searcher nodes)
  → AggDataSource (native tantivy aggregation when applicable)
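The "splits work across searcher nodes" step can be sketched as a mapping from split IDs to nodes. The PR does not state which placement policy `DistributedExec` uses, so the round-robin below is purely an assumption for illustration, and `assign_splits` is a hypothetical helper rather than anything in the codebase.

```rust
use std::collections::BTreeMap;

/// Assign each split to a node in round-robin order (assumed policy).
/// Returns node -> list of split IDs; every node appears, even if idle.
pub fn assign_splits(split_ids: &[String], nodes: &[String]) -> BTreeMap<String, Vec<String>> {
    let mut plan: BTreeMap<String, Vec<String>> = BTreeMap::new();
    if nodes.is_empty() {
        // Nothing can be scheduled without at least one searcher node.
        return plan;
    }
    for node in nodes {
        plan.entry(node.clone()).or_default();
    }
    for (i, split) in split_ids.iter().enumerate() {
        let node = &nodes[i % nodes.len()];
        plan.entry(node.clone()).or_default().push(split.clone());
    }
    plan
}
```

A real placement policy would likely account for cache affinity or split size; round-robin is used here only because it keeps the sketch short.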

Depends on

feat(datafusion): DataFusion metrics query layer quickwit-oss/quickwit#6276 (bianchi/df-4-serve)
tantivy-datafusion (path dependency)

Test plan

cargo test -p quickwit-datafusion -- tantivy::tests (8 unit tests)
cargo test -p quickwit-integration-tests -- tantivy_datafusion_tests (5 integration tests)
Manual: distributed SQL queries with 4 searcher nodes
Manual: full-text search + document retrieval + aggregations via REST endpoint
Load test with hundreds of splits

🤖 Generated with Claude Code

Documents all changes made to tantivy-datafusion during the quickwit integration, upstream tantivy API requests, and architectural improvements needed in tantivy-datafusion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

alexanderbianchi · 2026-04-13T02:10:43Z

Working on some stuff locally. As of writing this, we partition by segment, which is information local to the split, but splits should not be opened until the plan reaches the worker node. Splits are like fragments, and this architecture currently uses the Tantivy library as the "reader" node (basically: bolt). So what I'm heading for now is effectively the "logs-event-store-api" or "quickwit" level being distributed over splits, while tantivy in the table provider does the "bolt" / "logs-event-store-reader" work. Long term, sure, we could do two-step planning and push DataFusion closer to the data, further decomposing tantivy, but I'm not sure we need that at first.
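To make the "don't open splits until the worker" idea concrete, here is a rough sketch. It assumes the coordinator plans one partition per split using only a plain descriptor, and that segment-level fan-out happens after the worker opens the split. All names (`SplitDescriptor`, `SplitOpener`, `plan_partitions`, `execute_partition`, `CountingOpener`) are hypothetical and are not the types used in this branch.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Plain data the coordinator can ship to a worker without opening anything.
#[derive(Debug, Clone, PartialEq)]
pub struct SplitDescriptor {
    pub split_id: String,
}

/// What a worker learns only after opening the split.
#[derive(Debug)]
pub struct OpenedSplit {
    pub split_id: String,
    pub segment_count: usize,
}

/// Hypothetical opener interface; only workers are expected to call it.
pub trait SplitOpener {
    fn open(&self, desc: &SplitDescriptor) -> OpenedSplit;
}

/// Coordinator side: one partition per split, no opener involved.
pub fn plan_partitions(split_ids: &[&str]) -> Vec<SplitDescriptor> {
    split_ids
        .iter()
        .map(|id| SplitDescriptor { split_id: id.to_string() })
        .collect()
}

/// Worker side: open the split, then fan out over its segments.
pub fn execute_partition(opener: &dyn SplitOpener, desc: &SplitDescriptor) -> Vec<String> {
    let opened = opener.open(desc);
    (0..opened.segment_count)
        .map(|seg| format!("{}/segment-{}", opened.split_id, seg))
        .collect()
}

/// In-memory opener that counts how often a split is opened (illustration only).
pub struct CountingOpener {
    pub segments_per_split: usize,
    pub opens: AtomicUsize,
}

impl SplitOpener for CountingOpener {
    fn open(&self, desc: &SplitDescriptor) -> OpenedSplit {
        self.opens.fetch_add(1, Ordering::SeqCst);
        OpenedSplit {
            split_id: desc.split_id.clone(),
            segment_count: self.segments_per_split,
        }
    }
}
```

The point of the counter is that planning leaves it at zero: segment information stays local to the worker, and the coordinator only ever sees split-level descriptors.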

alexanderbianchi · 2026-04-13T03:56:44Z

Full-text search is broken on the latest commit.

alexanderbianchi · 2026-04-13T04:17:02Z

Next up: clean up some of the IndexOpener stuff, then start looking into quickwit API behaviors like pagination or deterministic ordering and see if we can put them in the plan. Inter-split stuff is still done in tantivy.

fulmicoton · 2026-04-14T16:25:51Z

+
+**What changed**: Removed sync document store reads from the `spawn_blocking`
+batch generation path entirely. Document retrieval now happens asynchronously
+after each batch exits `spawn_blocking`:


I don't think there is any need for spawn_blocking.

Prefer using one of the thread pools in quickwit_common.

spawn_blocking is really just for blocking IO.
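For context on the distinction being drawn here: CPU-bound batch generation is usually better served by a small, fixed-size pool than by a blocking-IO pool that can grow to many threads. The sketch below illustrates that idea using only the standard library. It is not the `quickwit_common` API (which is not shown in this thread), and `CpuPool` is a hypothetical name.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical fixed-size pool for CPU-bound work.
pub struct CpuPool {
    sender: Option<mpsc::Sender<Job>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl CpuPool {
    pub fn new(num_threads: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..num_threads.max(1))
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock guard is dropped at the end of this statement,
                    // so the job itself runs without holding the lock.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // channel closed: pool is shutting down
                    }
                })
            })
            .collect();
        CpuPool { sender: Some(sender), workers }
    }

    /// Queue a CPU-bound task; the result arrives on the returned channel.
    pub fn run<T, F>(&self, task: F) -> mpsc::Receiver<T>
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // The caller may have dropped the receiver; ignore that case.
            let _ = tx.send(task());
        });
        self.sender
            .as_ref()
            .expect("pool is running")
            .send(job)
            .expect("worker threads are alive");
        rx
    }
}

impl Drop for CpuPool {
    fn drop(&mut self) {
        // Closing the channel makes every worker leave its loop.
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}
```

In async code the result would normally come back through an awaitable oneshot channel rather than a blocking `recv()`; the blocking receiver is used here only to keep the sketch dependency-free. A panicking job would also take its worker thread down, which a production pool would need to handle.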

alexanderbianchi and others added 2 commits April 11, 2026 00:57

quickwit tantivy datafusion

b2e4a60

scary diff - primarily planning behavior changes and formatting

db39b08

fix tests

917a633

fulmicoton reviewed Apr 14, 2026


alexanderbianchi added 2 commits April 14, 2026 17:24

reflect changes in tantivy-datafusion

577fabb

rayon pool no more blocking

f914f00

alexanderbianchi wants to merge 6 commits into bianchi/df-4-serve from bianchi/quickwit-tantivy-datafusion


2 participants
