Wire v1.37 tokenization config (textAnalyzer, stopwordPresets) through public types#429

Open
g-despot wants to merge 4 commits into main from tokenization-updates


g-despot commented Apr 30, 2026

Summary

Brings the TS client to parity with the Python client for Weaviate v1.37 tokenization config. Before this change, users had to cast with `as any` to set the per-property textAnalyzer, invertedIndex.stopwordPresets, and the /v1/tokenize stopwords / stopwordPresets fields.

Public surface:

  • TextAnalyzerConfig — new type used for both per-property textAnalyzer and tokenize.text({ analyzerConfig }). Ergonomic union: asciiFold: boolean | { ignore: string[] }.
  • InvertedIndexConfig.stopwordPresets — exposed on create / read / update, plus on the configure.invertedIndex(...) and reconfigure.invertedIndex(...) builders.
  • tokenize.text — now accepts stopwords (one-off block) and stopwordPresets (named catalog). Mutually exclusive — passing both rejects client-side with WeaviateInvalidInputError. Version-gated at >= 1.37.2.
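The mutual-exclusion and version-gating rules in the list above can be sketched as follows. This is a minimal illustration, not the client's code: `TokenizeTextOptions`, `StopwordsBlock`, `validateTokenizeOptions`, `isAtLeast`, and `InvalidInputError` (a local stand-in for WeaviateInvalidInputError) are hypothetical names, the `stopwordPresets` shape is assumed, and it is an assumption that the version gate applies only when one of the new stopword fields is supplied.

```typescript
// Hypothetical stand-ins for illustration; not the client's actual definitions.
type TextAnalyzerConfig = {
  asciiFold?: boolean | { ignore: string[] };
};

// Assumed shape of a one-off stopwords block (preset plus additions/removals).
type StopwordsBlock = {
  preset?: string;
  additions?: string[];
  removals?: string[];
};

type TokenizeTextOptions = {
  analyzerConfig?: TextAnalyzerConfig;
  stopwords?: StopwordsBlock;
  // Assumed: a named catalog mapping preset name to its word list.
  stopwordPresets?: Record<string, string[]>;
};

// Local stand-in for the client's WeaviateInvalidInputError.
class InvalidInputError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'InvalidInputError';
  }
}

// Returns true when `version` (major.minor.patch) is at least `minimum`.
function isAtLeast(version: string, minimum: string): boolean {
  const a = version.split('.').map(Number);
  const b = minimum.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true;
}

// Rejects unsupported or contradictory input before any request is sent.
function validateTokenizeOptions(opts: TokenizeTextOptions, serverVersion: string): void {
  const usesStopwordFields = opts.stopwords !== undefined || opts.stopwordPresets !== undefined;
  if (usesStopwordFields && !isAtLeast(serverVersion, '1.37.2')) {
    throw new Error('stopwords and stopwordPresets require Weaviate >= 1.37.2');
  }
  if (opts.stopwords !== undefined && opts.stopwordPresets !== undefined) {
    throw new InvalidInputError('stopwords and stopwordPresets are mutually exclusive');
  }
}
```

Checking on the client keeps the failure local and synchronous with the call site, so a contradictory request never reaches the server.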

Schema: tools/refresh_schema.sh v1.37.2 refreshed src/openapi/schema.ts so TokenizeRequest carries stopwords (top-level) and the flat stopwordPresets shape. CI matrix bumped to 1.37.2.

Test plan

  • WEAVIATE_VERSION=1.37.2 npm run test:unit — 323/323 pass
  • npm run build / npm run lint — clean
  • Integration tests against live Weaviate 1.37.2:
    • test/tokenize/integration.test.ts — covers analyzerConfig, stopwords (preset+additions / additions-only / removals-only), stopwordPresets (named ref / builtin override), mutex rejection. Inputs/outputs match the python integration suite.
    • test/collections/tokenization/integration.test.ts — round-trips textAnalyzer and stopwordPresets through collection.config.get().

🤖 Generated with Claude Code

g-despot requested a review from a team as a code owner April 30, 2026 11:41

orca-security-eu Bot left a comment


Orca Security Scan Summary

| Status | Check | High | Medium | Low | Info |
|---|---|---|---|---|---|
| Passed | Infrastructure as Code | 0 | 0 | 0 | 0 |
| Passed | SAST | 0 | 0 | 0 | 0 |
| Passed | Secrets | 0 | 0 | 0 | 0 |
| Passed | Vulnerabilities | 0 | 0 | 0 | 0 |

CI prettier flagged whitespace inside empty `() => { }` arrow bodies.
Strip to `() => {}` to match repo style.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

This PR updates the public TypeScript client types and (de)serialization to expose Weaviate v1.37’s per-property text-analysis configuration (textAnalyzer) and collection-level invertedIndex.stopwordPresets, and ensures the tokenize endpoint uses the same shared translation logic.

Changes:

  • Exposes TextAnalyzerConfig and wires it through collection property create/read types, with shared union↔wire translation helpers.
  • Exposes InvertedIndexConfig.stopwordPresets on schema create/read surfaces and maps it through config deserialization.
  • Updates tokenize endpoint typing/docs and CI matrix to target Weaviate 1.37.2, plus adds unit + integration coverage for round-tripping.
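The union↔wire translation mentioned in the first change can be illustrated with a small round-trip sketch. The wire shape here (`asciiFold` flag plus a separate `asciiFoldIgnore` list) is an assumption made for illustration, and `toWire` / `fromWire` are hypothetical stand-ins rather than the PR's textAnalyzerConfigToWire / textAnalyzerConfigFromWire helpers.

```typescript
// Public, ergonomic shape as described in the PR summary.
type TextAnalyzerConfig = { asciiFold?: boolean | { ignore: string[] } };

// Assumed wire shape: a flat flag plus a separate ignore list.
// The generated schema type in the real client may differ.
type TextAnalyzerWire = { asciiFold?: boolean; asciiFoldIgnore?: string[] };

// Flattens the public union into the assumed wire fields.
function toWire(config: TextAnalyzerConfig): TextAnalyzerWire {
  const asciiFold = config.asciiFold;
  if (asciiFold === undefined) return {};
  if (typeof asciiFold === 'boolean') return { asciiFold };
  // The object form implies folding is enabled, with exceptions.
  return { asciiFold: true, asciiFoldIgnore: asciiFold.ignore.slice() };
}

// Rebuilds the public union from the assumed wire fields.
function fromWire(wire: TextAnalyzerWire): TextAnalyzerConfig {
  if (wire.asciiFold === undefined) return {};
  if (wire.asciiFold && wire.asciiFoldIgnore && wire.asciiFoldIgnore.length > 0) {
    return { asciiFold: { ignore: wire.asciiFoldIgnore.slice() } };
  }
  return { asciiFold: wire.asciiFold };
}
```

In this sketch an empty ignore list collapses to plain `asciiFold: true` on the way back, so `{ ignore: [] }` does not round-trip exactly. Whether the real helpers preserve that case is not stated in the PR.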

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| test/collections/tokenization/integration.test.ts | Adds integration coverage for schema-config round-tripping of textAnalyzer and stopwordPresets. |
| src/tokenize/index.ts | Switches tokenize analyzerConfig serialization to the shared translator and updates stopword preset typing. |
| src/collections/tokenization/unit.test.ts | Adds type-level tests pinning the public tokenization surface across schema refreshes. |
| src/collections/configure/types/base.ts | Wires textAnalyzer and stopwordPresets into public “configure/create/update” types. |
| src/collections/config/utils.ts | Introduces shared textAnalyzerConfigToWire / textAnalyzerConfigFromWire and plugs into schema create + config.get mapping. |
| src/collections/config/types/index.ts | Adds public TextAnalyzerConfig and exposes stopwordPresets + PropertyConfig.textAnalyzer. |
| .github/workflows/main.yaml | Updates CI matrix Weaviate 1.37 entry to 1.37.2. |
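The "type-level tests pinning the public tokenization surface" listed for src/collections/tokenization/unit.test.ts could look roughly like the sketch below. The `Equals` and `Expect` helpers are a common TypeScript idiom, not necessarily what the PR uses, and the local `TextAnalyzerConfig` is a stand-in for the exported type.

```typescript
// Resolves to `true` only when A and B are assignable to each other.
// The tuple wrappers prevent distribution over union members.
type Equals<A, B> = [A] extends [B] ? ([B] extends [A] ? true : false) : false;

// Compile-time assertion: the type argument must be exactly `true`.
type Expect<T extends true> = T;

// Stand-in for the exported public type.
type TextAnalyzerConfig = { asciiFold?: boolean | { ignore: string[] } };

// If a schema refresh changed the asciiFold union, this line would stop
// compiling, which is what "pinning" the surface means here.
const asciiFoldShapePinned: Expect<
  Equals<NonNullable<TextAnalyzerConfig['asciiFold']>, boolean | { ignore: string[] }>
> = true;
```

The check costs nothing at runtime. The failure mode is a type error at build time rather than a failing assertion.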

Comment thread src/collections/config/types/index.ts Outdated
Comment thread src/collections/config/utils.ts Outdated
Comment thread src/tokenize/index.ts
Comment thread test/collections/tokenization/integration.test.ts
Comment thread src/collections/config/utils.ts Outdated

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Comment thread src/tokenize/index.ts
Comment thread src/collections/config/utils.ts

2 participants