Zero-dependency text folding for search index and query normalization.
fold() produces a diacritic- and case-insensitive form of a string, applied
identically at index time and query time so that a search index never
diverges from the queries run against it (divergence = silent search misses).
import { fold } from '@lde/text-normalization';
fold('Møhlmann'); // 'mohlmann'
fold('Coöperatieve'); // 'cooperatieve'
fold('Straße'); // 'strasse'

It combines Unicode NFKD decomposition + combining-mark stripping (which folds é, ö, å, ç, …) with an explicit transliteration map for letters that do not decompose under NFKD (ø, æ, œ, ß, ð, þ, ł, đ, …).
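To make the two stages concrete, here is a minimal sketch of the approach described above. It is not the package's actual source: the name `foldSketch`, the contents of `TRANSLITERATIONS`, and the order of the steps are assumptions chosen for illustration, and the map covers only the letters listed in the text.

```typescript
// Hypothetical sketch of the technique; NOT the implementation of fold().
// Letters that NFKD leaves intact need an explicit mapping (partial list).
const TRANSLITERATIONS: Record<string, string> = {
  'ø': 'o',
  'æ': 'ae',
  'œ': 'oe',
  'ß': 'ss',
  'ð': 'd',
  'þ': 'th',
  'ł': 'l',
  'đ': 'd',
};

function foldSketch(input: string): string {
  return input
    // 1. Decompose: 'ö' becomes 'o' followed by U+0308 (combining diaeresis).
    .normalize('NFKD')
    // 2. Strip every combining mark left behind by the decomposition.
    .replace(/\p{M}/gu, '')
    // 3. Fold case.
    .toLowerCase()
    // 4. Transliterate the letters NFKD cannot decompose.
    .replace(/[øæœßðþłđ]/g, (ch) => TRANSLITERATIONS[ch] ?? ch);
}

console.log(foldSketch('Møhlmann'));     // 'mohlmann'
console.log(foldSketch('Coöperatieve')); // 'cooperatieve'
console.log(foldSketch('Straße'));       // 'strasse'
```

The sketch maps each letter to an ASCII approximation; the real map may differ in coverage and in the replacements it chooses.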
A search engine on its default locale often folds case and diacritics for you –
Typesense v30 (verified) even folds the non-decomposing ø/æ/ß – so on the
default locale fold() is redundant for search. It becomes necessary when:
- Sorting – engines sort strings by raw code-point order with no collation,
  so a fold()-ed companion field is the only way to sort case- and
  diacritic-insensitively.
- Stemming – enabling a language’s stemmer requires a non-default locale,
  which switches the tokenizer (Typesense → ICU) to one that preserves
  diacritics; the default folding is lost, and fold() restores
  diacritic-insensitive matching.
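The sorting case can be illustrated with a small sketch that compares raw code-point order against order by a folded companion field. The document shape (`CompanyDoc`, `nameSort`) is hypothetical, and `foldStandIn` is a simplified stand-in so the block runs on its own; in practice the package's fold() would take its place.

```typescript
// Hypothetical document shape: `nameSort` is the folded companion field.
interface CompanyDoc {
  name: string;
  nameSort: string;
}

// Simplified stand-in for fold() (no transliteration map); assumption only.
const foldStandIn = (s: string): string =>
  s.normalize('NFKD').replace(/\p{M}/gu, '').toLowerCase();

function toDoc(name: string): CompanyDoc {
  return { name, nameSort: foldStandIn(name) };
}

// '\u00C9mile' is 'Émile' with a precomposed É.
const names = ['Zwart', '\u00C9mile', 'adams', 'Bakker'];

// Raw code-point order: uppercase before lowercase, accented letters last.
const rawOrder = [...names].sort();
// ['Bakker', 'Zwart', 'adams', 'Émile']

// Order by the folded key, compared the same raw way an engine would.
const foldedOrder = names
  .map(toDoc)
  .sort((a, b) => (a.nameSort < b.nameSort ? -1 : a.nameSort > b.nameSort ? 1 : 0))
  .map((doc) => doc.name);
// ['adams', 'Bakker', 'Émile', 'Zwart']
```

The original `name` stays available for display, while the engine sorts on `nameSort` only.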
fold() is idempotent (fold(fold(x)) === fold(x)). Punctuation and word
boundaries are preserved; tokenization is left to the search engine.
Because folded values are stored in the search index, the same fold() must be
used at index time and query time, and any change to it requires a full rebuild.
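A toy in-memory index shows why both sides must share the function. This is a sketch under assumptions: `ExactMatchIndex` and its methods are hypothetical, and `foldStandIn` is a simplified stand-in for the package's fold() so the block is self-contained.

```typescript
type Normalizer = (text: string) => string;

// Simplified stand-in for fold(); assumption for illustration only.
const foldStandIn: Normalizer = (s) =>
  s.normalize('NFKD').replace(/\p{M}/gu, '').toLowerCase();

class ExactMatchIndex {
  private readonly entries = new Map<string, string[]>();
  private readonly normalize: Normalizer;

  constructor(normalize: Normalizer) {
    this.normalize = normalize;
  }

  // Index time: store the original value under its folded key.
  add(value: string): void {
    const key = this.normalize(value);
    const bucket = this.entries.get(key) ?? [];
    bucket.push(value);
    this.entries.set(key, bucket);
  }

  // Query time: fold the query with the same function before the lookup.
  search(query: string): string[] {
    return this.entries.get(this.normalize(query)) ?? [];
  }

  // A diverging query path: the lookup skips folding and silently misses.
  searchRaw(query: string): string[] {
    return this.entries.get(query) ?? [];
  }
}

const index = new ExactMatchIndex(foldStandIn);
index.add('Co\u00F6peratieve'); // 'Coöperatieve'

const hit = index.search('COOPERATIEVE');     // ['Coöperatieve']
const miss = index.searchRaw('COOPERATIEVE'); // []
```

The stored keys are outputs of one specific normalizer, so swapping in a changed function without re-adding every value leaves old keys that new queries can no longer reach.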