Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

README.md

laurus-python

PyPI License: MIT

Python bindings for the Laurus search engine. Provides lexical search, vector search, and hybrid search from Python via a native Rust extension built with PyO3 and Maturin.

Features

  • Lexical Search -- Full-text search powered by an inverted index with BM25 scoring
  • Vector Search -- Approximate nearest neighbor (ANN) search using Flat, HNSW, or IVF indexes
  • Hybrid Search -- Combine lexical and vector results with fusion algorithms (RRF, WeightedSum)
  • Rich Query DSL -- Term, Phrase, Fuzzy, Wildcard, NumericRange, Geo, Boolean, Span queries
  • Text Analysis -- Tokenizers, filters, stemmers, and synonym expansion
  • Flexible Storage -- In-memory (ephemeral) or file-based (persistent) indexes
  • Pythonic API -- Clean, intuitive Python classes with full type information

Installation

pip install laurus

To build from source (requires Rust toolchain):

pip install maturin
maturin develop

Quick Start

import laurus

# Create an in-memory index
index = laurus.Index()

# Index documents
index.put_document("doc1", {"title": "Introduction to Rust", "body": "Systems programming language."})
index.put_document("doc2", {"title": "Python for Data Science", "body": "Data analysis with Python."})
index.commit()

# Search with a DSL string
results = index.search("title:rust", limit=5)
for r in results:
    print(f"[{r.id}] score={r.score:.4f}  {r.document['title']}")

# Search with a query object
results = index.search(laurus.TermQuery("body", "python"), limit=5)

Index Types

In-memory (ephemeral)

index = laurus.Index()

File-based (persistent)

schema = laurus.Schema()
schema.add_text_field("title")
schema.add_text_field("body")
schema.add_hnsw_field("embedding", dimension=384)

index = laurus.Index(path="./myindex", schema=schema)

Query Types

Query class Description
TermQuery(field, term) Exact term match
PhraseQuery(field, [terms]) Ordered phrase match
FuzzyQuery(field, term, max_edits) Approximate term match
WildcardQuery(field, pattern) Wildcard pattern match (*, ?)
NumericRangeQuery(field, min, max) Numeric range (int or float)
GeoQuery(field, lat, lon, radius_km) Geo-distance radius search
BooleanQuery(must, should, must_not) Compound boolean logic
SpanNearQuery(field, [terms], slop) Proximity / ordered span match
VectorQuery(field, vector) Pre-computed vector similarity
VectorTextQuery(field, text) Text-to-vector similarity (requires embedder)

Hybrid Search

request = laurus.SearchRequest(
    lexical_query=laurus.TermQuery("body", "rust"),
    vector_query=laurus.VectorQuery("embedding", query_vec),
    fusion=laurus.RRF(k=60.0),
    limit=10,
)
results = index.search(request)

Fusion algorithms

Class Description
RRF(k=60.0) Reciprocal Rank Fusion (rank-based, default for hybrid)
WeightedSum(lexical_weight=0.5, vector_weight=0.5) Score-normalised weighted sum

Text Analysis

syn_dict = laurus.SynonymDictionary()
syn_dict.add_synonym_group(["ml", "machine learning"])

tokenizer = laurus.WhitespaceTokenizer()
filt = laurus.SynonymGraphFilter(syn_dict, keep_original=True, boost=0.8)

tokens = tokenizer.tokenize("ml tutorial")
tokens = filt.apply(tokens)
for tok in tokens:
    print(tok.text, tok.position, tok.boost)

Examples

Usage examples are in the examples/ directory:

Example Description
quickstart.py Basic indexing and full-text search
lexical_search.py All query types (Term, Phrase, Boolean, Fuzzy, Wildcard, Range, Geo, Span)
vector_search.py Semantic similarity search with embeddings
hybrid_search.py Combining lexical and vector search with fusion
synonym_graph_filter.py Synonym expansion in the analysis pipeline
search_with_openai.py Cloud-based embeddings via OpenAI
multimodal_search.py Text-to-image and image-to-image search

Documentation

License

This project is licensed under the MIT License - see the LICENSE file for details.