Docs: VuePress plugin to strip [V<n>] citation markers and §Sources footers at build #1679
Open
marcin-kordas-hoc wants to merge 8 commits into develop from feat/vuepress-strip-citation-markers
+706
−0
8 commits
238c475 Docs: VuePress plugin to strip [V<n>] citation markers and §Sources f… (marcin-kordas-hoc)
f6f0282 Docs: VuePress strip plugin — preserve markdown-it-footnote tokens pa… (marcin-kordas-hoc)
4f0d2e7 Docs: VuePress strip plugin — kill mutation-surviving gaps + lock in … (marcin-kordas-hoc)
410a8a2 Docs: VuePress strip plugin — normalize Sources heading before patter… (marcin-kordas-hoc)
01a3e9b Fix: use plain code fence instead of unsupported `text` language in… (marcin-kordas-hoc)
3e76890 fix(docs): strip current lowercase audit-harness markers, not only le… (marcin-kordas-hoc)
c133c43 fix(docs): use unique §AuditSources footer token so legitimate Source… (marcin-kordas-hoc)
084c293 test(docs): require both heading and body removed in Sources footer a… (marcin-kordas-hoc)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,175 @@ | ||
| /** | ||
| * markdown-it plugin: strip internal audit-harness annotations from rendered docs. | ||
| * | ||
| * Our internal authoring workflow uses the audit-harness convention: | ||
| * - Inline citation markers like `[vrf_1]`, `[dec_3]` (legacy `[V12]`) placed | ||
| * next to factual claims. | ||
| * - A trailing `§AuditSources` footer listing the sources. | ||
| * | ||
| * These markers exist so the audit-harness can re-verify every claim against | ||
| * its source before content is shipped. They are NEVER meant to be seen by | ||
| * end users. When any spec or note ends up published as docs, we strip them | ||
| * at build time so the rendered site stays clean. | ||
| * | ||
| * Stripping rules: | ||
| * - Inline marker: `[<prefix>_<digits>]` (e.g. `[vrf_1]`) or legacy | ||
| * `[V<digits>]`, NOT followed by `(` (so real markdown | ||
| * links `[vrf_1](url)` / `[V12](url)` are left untouched). | ||
| * - Footer section: a heading whose text is `§AuditSources` (matched | ||
| * case-insensitively; a unique token, so legitimate | ||
| * `Sources` headings are never clobbered), together with | ||
| * everything below it up to the next top-level (`#`) | ||
| * heading, the first `footnote_*` token, or end-of-file. | ||
| * - Fenced/inline code is left alone, so pages that document the | ||
| * audit-harness itself can still render the markers verbatim. | ||
| * | ||
| * Implementation: walks the markdown-it token stream after parsing. | ||
| */ | ||
|
|
||
| // Audit-harness citation markers. The current convention (post-2026-05-21) is a | ||
| // lowercase prefix + `_` + digits — `[vrf_1]`, `[dec_3]`, `[con_2]`, `[que_5]`, | ||
| // `[wrg_7]`, `[crf_4]` — matching the parser grammar `^\[[a-z][a-z0-9_]*\]$`. | ||
| // The legacy uppercase `[V<n>]` form is kept as an alternative for older notes. | ||
| // In both cases a trailing `(` is excluded so real markdown links like | ||
| // `[vrf_1](url)` / `[V12](url)` are left untouched. | ||
| const INLINE_CITATION_PATTERN = /\[(?:V\d+|[a-z][a-z0-9]*_\d+)\](?!\()/g; | ||
| // Footer marker. Deliberately a unique token (`§AuditSources`) rather than a | ||
| // bare `Sources` heading, so that legitimate docs sections titled "Sources" | ||
| // are never clobbered. The `§` is required. | ||
| const SOURCES_HEADING_PATTERN = /^\s*§\s*AuditSources\s*$/i; | ||
|
|
||
| /** | ||
| * Removes inline citation markers (`[vrf_1]`-style and legacy `[V<n>]`) from a string of text. | ||
| * | ||
| * @param {string} text - Raw text content from a markdown token. | ||
| * @returns {string} Text with citation markers removed and surrounding | ||
| * whitespace normalized. | ||
| */ | ||
| const stripInlineMarkers = (text) => | ||
| text | ||
| .replace(INLINE_CITATION_PATTERN, '') | ||
| // collapse stray double spaces left behind by removal | ||
| .replace(/[ \t]{2,}/g, ' ') | ||
| // tidy " ." / " ," / " ;" / " :" / " !" / " ?" / " )" | ||
| .replace(/ ([.,;:!?\)])/g, '$1'); | ||
|
|
||
| /** | ||
| * Recursively strips inline markers from children of an inline token. | ||
| * | ||
| * @param {Array} children - markdown-it inline children tokens. | ||
| */ | ||
| const stripChildren = (children) => { | ||
| if (!Array.isArray(children)) return; | ||
| children.forEach((child) => { | ||
| if (child.type === 'text' && typeof child.content === 'string') { | ||
| child.content = stripInlineMarkers(child.content); | ||
| } | ||
| if (child.children) { | ||
| stripChildren(child.children); | ||
| } | ||
| }); | ||
| }; | ||
|
|
||
| /** | ||
| * Detects whether a heading_open token (already located) introduces the | ||
| * `§AuditSources` footer. The heading's raw inline content is first | ||
| * normalized via `stripInlineMarkers` so that an authored heading like | ||
| * `§AuditSources [V1]` (markers next to the heading text) still matches the | ||
| * strict end-anchored pattern; without normalization the trailing `[V1]` | ||
| * would defeat the `\s*$` anchor and the footer would never be detected. | ||
| * | ||
| * @param {Array} tokens - Full token array. | ||
| * @param {number} headingOpenIdx - Index of the heading_open token. | ||
| * @returns {boolean} True when the heading text matches the `§AuditSources` footer. | ||
| */ | ||
| const isSourcesHeading = (tokens, headingOpenIdx) => { | ||
| const inline = tokens[headingOpenIdx + 1]; | ||
| if (!inline || inline.type !== 'inline') return false; | ||
| return SOURCES_HEADING_PATTERN.test(stripInlineMarkers(inline.content || '')); | ||
| }; | ||
|
|
||
| /** | ||
| * Returns the index after which the Sources footer ends. The footer extends | ||
| * from the Sources heading up to (but not including) the FIRST of: | ||
| * - the next top-level (`h1`) heading_open token, or | ||
| * - any `footnote_*` token (markdown-it-footnote appends `footnote_block` | ||
| * and friends at the END of the stream; they belong to the page body, | ||
| * not to the footer), or | ||
| * - end-of-stream. | ||
| * | ||
| * @param {Array} tokens - Full token array. | ||
| * @param {number} startIdx - Index of the Sources heading_open token. | ||
| * @returns {number} Exclusive end index of the footer. | ||
| */ | ||
| const findFooterEnd = (tokens, startIdx) => { | ||
| for (let i = startIdx + 1; i < tokens.length; i += 1) { | ||
| const t = tokens[i]; | ||
| if (t.type === 'heading_open' && t.tag === 'h1') { | ||
| return i; | ||
| } | ||
| if (typeof t.type === 'string' && t.type.startsWith('footnote_')) { | ||
| return i; | ||
| } | ||
| } | ||
| return tokens.length; | ||
| }; | ||
|
|
||
| /** | ||
| * Mutates the token array in place to remove the Sources footer (heading + | ||
| * everything below) and apply inline marker stripping to every text token. | ||
| * | ||
| * Footnote invariant: markdown-it-footnote (registered in `config.js`) | ||
| * appends `footnote_block` / `footnote_anchor` / `footnote_open` / | ||
| * `footnote_close` / `footnote_ref` tokens at the END of the token stream. | ||
| * The footer splice stops before any such token so footnotes on pages that | ||
| * also carry a `§AuditSources` footer are not silently swallowed. | ||
| * | ||
| * @param {Array} tokens - markdown-it token array. | ||
| * @returns {Array} The same token array (for chaining). | ||
| */ | ||
| const transformTokens = (tokens) => { | ||
| // 1. Find each `§AuditSources` heading and drop everything from it onward | ||
| // (up to the next h1 or the first `footnote_*` token, if any). | ||
| for (let i = 0; i < tokens.length; i += 1) { | ||
| const t = tokens[i]; | ||
| if (t.type === 'heading_open' && isSourcesHeading(tokens, i)) { | ||
| const end = findFooterEnd(tokens, i); | ||
| tokens.splice(i, end - i); | ||
| i -= 1; | ||
| } | ||
| } | ||
|
|
||
|
|
||
| // 2. Strip citation markers (`[vrf_1]`-style and legacy `[V<n>]`) from | ||
| // every remaining inline text token. Code tokens (`code_inline`, | ||
| // `code_block`, `fence`) are skipped so docs that illustrate the | ||
| // audit-harness syntax keep working. | ||
| tokens.forEach((token) => { | ||
| if (token.type === 'inline' && token.children) { | ||
| stripChildren(token.children); | ||
| } | ||
| }); | ||
|
|
||
| return tokens; | ||
| }; | ||
|
|
||
| /** | ||
| * markdown-it plugin entry point. Hooks into the core ruler so transforms | ||
| * run after parsing but before rendering. | ||
| * | ||
| * @param {object} md - markdown-it instance supplied by VuePress. | ||
| */ | ||
| const stripCitationMarkers = (md) => { | ||
| // Insert before `replacements` so that VuePress's heading-anchor logic | ||
| // (which runs later and slugifies heading text) also sees the cleaned | ||
| // text. Falls back to push() if the `replacements` rule cannot be located. | ||
| const insert = (state) => { | ||
| transformTokens(state.tokens); | ||
| }; | ||
| try { | ||
| md.core.ruler.before('replacements', 'strip-citation-markers', insert); | ||
| } catch (e) { | ||
| md.core.ruler.push('strip-citation-markers', insert); | ||
| } | ||
| }; | ||
|
|
||
| module.exports = stripCitationMarkers; | ||
| module.exports.transformTokens = transformTokens; | ||
| module.exports.stripInlineMarkers = stripInlineMarkers; | ||
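To make the footer-splice rule described in the file above concrete, here is a minimal sketch that runs the same boundary search over a hand-built token array. The token objects are simplified stand-ins carrying only `type`, `tag`, and `content`, not real markdown-it tokens, and `tok`, `findFooterEndSketch`, and `dropFooter` are hypothetical names that mirror `findFooterEnd` and the first pass of `transformTokens` rather than reuse them.

```javascript
// Simplified stand-ins for markdown-it tokens: only the fields the splice reads.
const tok = (type, extra = {}) => ({ type, ...extra });

// Same pattern as in the diff: `§`, optional whitespace, `AuditSources`.
const SOURCES_HEADING_PATTERN = /^\s*§\s*AuditSources\s*$/i;

// Mirrors findFooterEnd: stop at the next h1, the first footnote_* token,
// or the end of the stream, whichever comes first.
const findFooterEndSketch = (tokens, startIdx) => {
  for (let i = startIdx + 1; i < tokens.length; i += 1) {
    const t = tokens[i];
    if (t.type === 'heading_open' && t.tag === 'h1') return i;
    if (t.type.startsWith('footnote_')) return i;
  }
  return tokens.length;
};

// Hypothetical helper: removes the footer from a copy of the token array.
const dropFooter = (tokens) => {
  const out = tokens.slice();
  for (let i = 0; i < out.length; i += 1) {
    const next = out[i + 1];
    const isFooter =
      out[i].type === 'heading_open' &&
      next !== undefined &&
      next.type === 'inline' &&
      SOURCES_HEADING_PATTERN.test(next.content);
    if (isFooter) {
      out.splice(i, findFooterEndSketch(out, i) - i);
      i -= 1;
    }
  }
  return out;
};

const page = [
  tok('heading_open', { tag: 'h1' }),
  tok('inline', { content: 'Sample page' }),
  tok('heading_close', { tag: 'h1' }),
  tok('paragraph_open'),
  tok('inline', { content: 'Body text.' }),
  tok('paragraph_close'),
  tok('heading_open', { tag: 'h2' }),
  tok('inline', { content: '§AuditSources' }),
  tok('heading_close', { tag: 'h2' }),
  tok('paragraph_open'),
  tok('inline', { content: 'Footer entry' }),
  tok('paragraph_close'),
  tok('footnote_block_open'),
  tok('footnote_block_close'),
];

const keptTypes = dropFooter(page).map((t) => t.type);
console.log(keptTypes.join(' '));
```

In this sketch the footer heading and its paragraph are dropped while the two trailing footnote tokens are kept, which is the behavior the footnote invariant in the doc comments is meant to guarantee.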
28 changes: 28 additions & 0 deletions
docs/.vuepress/plugins/strip-citation-markers/test-fixture.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| # Sample page | ||
|
|
||
| This sentence has a citation marker [V1] right after a word, and another one [V42]. | ||
|
|
||
| Real markdown links such as [V12](https://example.com/v12) MUST remain intact because they are not bare citation markers. | ||
|
|
||
| A line with multiple markers [V3] [V4] should collapse trailing whitespace cleanly. | ||
|
|
||
| Inline code like `[V99]` must NOT be stripped because authors may need to discuss the audit-harness syntax itself. | ||
|
|
||
| ``` | ||
| fenced code [V7] stays as-is | ||
| ``` | ||
|
|
||
| ## A subsection [V8] | ||
|
|
||
| Body of a subsection [V9]. | ||
|
|
||
| ## §AuditSources | ||
|
|
||
| - [V1] https://example.com/source-1 | ||
| - [V3] https://example.com/source-3 | ||
| - [V4] https://example.com/source-4 | ||
| - [V8] https://example.com/source-8 | ||
| - [V9] https://example.com/source-9 | ||
| - [V42] https://example.com/source-42 | ||
|
|
||
| Trailing footer content that must also be removed. |
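As a rough illustration of what the inline rules should do to body lines like those in this fixture, the sketch below copies `INLINE_CITATION_PATTERN` and the clean-up replacements from the plugin diff and applies them to plain strings. It deliberately ignores tokenization, so it does not model the code-span exemption (the plugin only rewrites `text` tokens). The last sample sentence is an invented example of the lowercase marker style, not part of the fixture.

```javascript
// Copied from the plugin diff: lowercase `prefix_digits` markers or legacy
// `V<digits>`, not followed by `(` so real markdown links are left alone.
const INLINE_CITATION_PATTERN = /\[(?:V\d+|[a-z][a-z0-9]*_\d+)\](?!\()/g;

const stripInlineMarkers = (text) =>
  text
    .replace(INLINE_CITATION_PATTERN, '')
    // collapse runs of spaces/tabs left behind by removal
    .replace(/[ \t]{2,}/g, ' ')
    // remove a space that now sits directly before punctuation
    .replace(/ ([.,;:!?\)])/g, '$1');

const samples = [
  'This sentence has a citation marker [V1] right after a word, and another one [V42].',
  'Real markdown links such as [V12](https://example.com/v12) MUST remain intact.',
  'A line with multiple markers [V3] [V4] should collapse trailing whitespace cleanly.',
  // Invented sample (assumption): current lowercase marker style.
  'The claim holds [vrf_1], and so does the decision [dec_3].',
];

const stripped = samples.map(stripInlineMarkers);
stripped.forEach((line) => console.log(line));
```

The link sample passes through unchanged because of the `(?!\()` lookahead; the other three lose their markers and the leftover spacing.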
160 changes: 160 additions & 0 deletions
docs/.vuepress/plugins/strip-citation-markers/test-plugin-order.js
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,160 @@ | ||
| /** | ||
| * Plugin-order regression test for `strip-citation-markers`. | ||
| * | ||
| * Background: the strip plugin's footer splice intentionally stops before | ||
| * any `footnote_*` token because `markdown-it-footnote` appends those at | ||
| * the END of the token stream — they belong to the page body, not to the | ||
| * `§AuditSources` footer. | ||
| * | ||
| * The wiring contract in `docs/.vuepress/config.js` is: | ||
| * | ||
| * md.use(footnotePlugin) // registers footnote_tail | ||
| * md.use(includeCodeSnippet) | ||
| * md.use(stripCitationMarkers) // splices §AuditSources footer | ||
| * | ||
| * The ACTUAL ordering that makes footnotes survive is determined by where | ||
| * each plugin hooks into `core.ruler`: | ||
| * - `markdown-it-footnote`: `core.ruler.after('inline', 'footnote_tail')` | ||
| * - `strip-citation-markers`: `core.ruler.before('replacements', ...)` | ||
| * | ||
| * Because `inline` comes before `replacements` in markdown-it's default | ||
| * core rule chain, `footnote_tail` always runs before our strip rule — as | ||
| * long as BOTH plugins are registered. If a future refactor: | ||
| * (a) removes `markdown-it-footnote` (no footnote tokens ever exist), or | ||
| * (b) registers it in a way that moves `footnote_tail` AFTER our hook, | ||
| * then footnotes on any page with a `§AuditSources` footer will be silently | ||
| * swallowed by the splice. | ||
| * | ||
| * This test demonstrates both halves of the contract: | ||
| * | ||
| * 1. NEGATIVE CONTROL: build a markdown-it instance that DOES NOT carry | ||
| * `markdown-it-footnote`. Feed it a page with `[^note]` syntax + a | ||
| * `§AuditSources` footer. The `[^note]` literal text appears BEFORE the | ||
| * `§AuditSources` heading so it survives the splice, but no footnote | ||
| * anchor/section is produced (because no footnote plugin is loaded). | ||
| * This anchors the "footnote_tail must be registered upstream" half | ||
| * of the contract. | ||
| * | ||
| * 2. POSITIVE CONTROL: same source, plugins registered in the SAME order | ||
| * as `config.js`. Footnote anchor + body + section all survive AND | ||
| * the `§AuditSources` footer body is stripped AND inline `[V<n>]` markers | ||
| * are stripped. This is the contract `config.js` relies on. | ||
| * | ||
| * If either of these assertions ever flips, the strip plugin and the | ||
| * VuePress config are out of sync and footnotes will break in customer | ||
| * docs. | ||
| * | ||
| * Run with: `node docs/.vuepress/plugins/strip-citation-markers/test-plugin-order.js` | ||
| */ | ||
|
|
||
| const MarkdownIt = require('markdown-it'); | ||
| const footnotePlugin = require('markdown-it-footnote'); | ||
| const stripCitationMarkers = require('./index'); | ||
|
|
||
| const failures = []; | ||
|
|
||
| const assert = (cond, message) => { | ||
| if (!cond) failures.push(message); | ||
| }; | ||
|
|
||
| const source = [ | ||
| '# Footnote-aware page', | ||
| '', | ||
| 'Body text with a footnote ref.[^note] [V5]', | ||
| '', | ||
| '[^note]: Footnote body content.', | ||
| '', | ||
| '## §AuditSources', | ||
| '', | ||
| '- [V5] https://example.com/source-5', | ||
| '- Trailing footer entry that must be stripped.', | ||
| '', | ||
| ].join('\n'); | ||
|
|
||
| const hasFootnoteAnchor = (html) => | ||
| /class="footnote-ref"|class="footnotes"|<section[^>]*footnotes/i.test(html); | ||
| const hasFootnoteBody = (html) => /Footnote body content/.test(html); | ||
|
|
||
| // --- 1. NEGATIVE CONTROL: no markdown-it-footnote installed. | ||
| // The `[^note]` reference is just literal text; no `footnote_*` tokens | ||
| // are ever generated; the strip plugin behaves correctly on body text | ||
| // (markers stripped, §AuditSources footer dropped) but there is no footnote | ||
| // anchor/section in the output. This locks in the assumption that | ||
| // footnote tokens come from a SEPARATE plugin — if someone replaces | ||
| // `markdown-it-footnote` with a different mechanism, this test fails | ||
| // and forces a review of `findFooterEnd`'s footnote check. | ||
| const mdNoFootnote = new MarkdownIt({ html: true }); | ||
| mdNoFootnote.use(stripCitationMarkers); | ||
| const noFootnoteHtml = mdNoFootnote.render(source); | ||
|
|
||
| assert( | ||
| !hasFootnoteAnchor(noFootnoteHtml), | ||
| 'Negative control: without markdown-it-footnote, no footnote anchor/section should appear. If this fires, the strip plugin or markdown-it core gained an unexpected footnote rule and the wiring assumption changed.' | ||
| ); | ||
| assert( | ||
| !/Trailing footer entry/.test(noFootnoteHtml), | ||
| 'Negative control: §AuditSources footer body must still be stripped even without footnote plugin' | ||
| ); | ||
| assert( | ||
| !/\[V5\]/.test(noFootnoteHtml.replace(/<code[\s\S]*?<\/code>/g, '')), | ||
| 'Negative control: inline [V<n>] markers must still be stripped even without footnote plugin' | ||
| ); | ||
|
|
||
| // --- 2. POSITIVE CONTROL: plugins registered in the same order as | ||
| // `config.js`: footnote FIRST, strip LAST. This is the contract. | ||
| const mdConfig = new MarkdownIt({ html: true }); | ||
| mdConfig.use(footnotePlugin); | ||
| mdConfig.use(stripCitationMarkers); | ||
| const configOrderHtml = mdConfig.render(source); | ||
|
|
||
| assert( | ||
| hasFootnoteAnchor(configOrderHtml), | ||
| 'Positive control: config-order (footnote BEFORE strip) must render the footnote anchor/section' | ||
| ); | ||
| assert( | ||
| hasFootnoteBody(configOrderHtml), | ||
| 'Positive control: config-order must render the footnote body content' | ||
| ); | ||
| assert( | ||
| !/Trailing footer entry/.test(configOrderHtml), | ||
| 'Positive control: config-order must still strip the §AuditSources footer body' | ||
| ); | ||
| assert( | ||
| !/\[V5\]/.test(configOrderHtml.replace(/<code[\s\S]*?<\/code>/g, '')), | ||
| 'Positive control: config-order must still strip inline [V<n>] markers' | ||
| ); | ||
|
|
||
| // --- 3. RULE-CHAIN INVARIANT: assert that `footnote_tail` runs BEFORE the | ||
| // strip plugin's rule in the resulting `core.ruler` chain. This is the | ||
| // PRIMITIVE mechanism that makes the wiring work. If a future | ||
| // markdown-it-footnote version moves `footnote_tail` to a different | ||
| // ruler position, this assertion fires and points engineers at the | ||
| // root cause directly. | ||
| const ruleNames = mdConfig.core.ruler.__rules__.map((r) => r.name); | ||
| const footnoteIdx = ruleNames.indexOf('footnote_tail'); | ||
| const stripIdx = ruleNames.indexOf('strip-citation-markers'); | ||
| assert( | ||
| footnoteIdx !== -1, | ||
| 'Rule-chain invariant: expected `footnote_tail` rule to be registered by markdown-it-footnote' | ||
| ); | ||
| assert( | ||
| stripIdx !== -1, | ||
| 'Rule-chain invariant: expected `strip-citation-markers` rule to be registered' | ||
| ); | ||
| assert( | ||
| footnoteIdx < stripIdx, | ||
| 'Rule-chain invariant: expected `footnote_tail` to run BEFORE `strip-citation-markers` so footnote tokens exist when the splice runs (got footnote_tail=' + | ||
| footnoteIdx + ', strip=' + stripIdx + ')' | ||
| ); | ||
|
|
||
| if (failures.length > 0) { | ||
| console.error('FAIL strip-citation-markers/test-plugin-order'); | ||
| failures.forEach((f) => console.error(' - ' + f)); | ||
| console.error('\n--- no-footnote rendered output ---\n' + noFootnoteHtml); | ||
| console.error('\n--- config-order rendered output ---\n' + configOrderHtml); | ||
| process.exit(1); | ||
| } | ||
|
|
||
| console.log( | ||
| 'PASS strip-citation-markers/test-plugin-order (10 assertions: 3 negative + 4 positive + 3 rule-chain)' | ||
| ); |
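The rule-chain invariant checked above depends on where each plugin anchors itself in the core ruler, not on the order of the `md.use` calls. The following toy model is only a sketch of that positional logic: `ToyRuler` is a hypothetical stand-in, not markdown-it's `Ruler` class, and the starting rule list is an assumption about the default core chain in recent markdown-it versions (older versions lack `text_join`).

```javascript
// Toy stand-in for markdown-it's core ruler: an ordered list of rule names.
// Only the positional semantics of before/after/push are modeled here.
class ToyRuler {
  constructor(names) {
    this.names = names.slice();
  }

  indexOrThrow(name) {
    const idx = this.names.indexOf(name);
    if (idx === -1) throw new Error('Parser rule not found: ' + name);
    return idx;
  }

  // Insert `name` immediately before `target`.
  before(target, name) {
    this.names.splice(this.indexOrThrow(target), 0, name);
  }

  // Insert `name` immediately after `target`.
  after(target, name) {
    this.names.splice(this.indexOrThrow(target) + 1, 0, name);
  }

  // Append `name` at the end of the chain.
  push(name) {
    this.names.push(name);
  }
}

// Assumed default core chain (an assumption, see the note above).
const ruler = new ToyRuler([
  'normalize', 'block', 'inline', 'linkify', 'replacements', 'smartquotes', 'text_join',
]);

// Register the strip rule FIRST on purpose: the anchors decide the final order.
ruler.before('replacements', 'strip-citation-markers');
ruler.after('inline', 'footnote_tail');

const order = ruler.names;
const footnoteFirst =
  order.indexOf('footnote_tail') < order.indexOf('strip-citation-markers');
console.log(order.join(' -> '));
console.log('footnote_tail runs first:', footnoteFirst);
```

Under these assumptions `footnote_tail` lands right after `inline` and the strip rule right before `replacements`, so the footnote tokens already exist when the splice runs. That is why the test asserts on rule indices rather than on plugin registration order alone.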