A Lean 4 library for formal linguistics.
⚠️ This is an experiment in "AI for Linguistics" using recent advances in proof assistants. Please let us know if you identify any inaccuracies.
Decades of progress in formal linguistics live in prose scattered across hundreds of papers. Linglib is an attempt to gather the machinery in one place so a proof assistant can do the bookkeeping:
-
Detect breakage. If you change your semantics for attitude verbs, Lean tells you exactly which downstream theorems about conditionals, questions, or pragmatic inference no longer follow. No more discovering an inconsistency from a reviewer.
-
Check predictions. Theories are often stated in notation ambiguous enough to hide gaps between what is claimed and what actually follows from the definitions. Lean won't let a proof go through unless the prediction genuinely follows from the theory.
-
Compare theories. When two theories (RSA vs. exhaustification, Kratzer vs. Kripke modals) both claim to handle the same data, we can formally characterize where they agree and where they diverge — rather than arguing past each other with different formalisms.
Linglib separates phenomena (what we observe) from theories (what explains it).
Phenomena/ contains theory-neutral empirical data — acceptability judgments, experimental results, distributional patterns. Theories/ contains formal theories that make predictions about those phenomena. The connection between them is explicit: theories prove theorems that reference the data.
Other top-level directories: Core/ (shared infrastructure like propositions, intensions, accessibility relations), Fragments/ (lexical data for specific languages), Comparisons/ (cross-theory results).
lake exe cache get # Get mathlib cache
lake buildhttps://hawkrobe.github.io/linglib/
Apache 2.0