Commit 55fd2c9

timsaucer and claude authored
docs: convert reStructuredText sources to MyST markdown (#1579)
* docs: convert restructuredText sources to MyST markdown

Phase 2 of the documentation-site refresh. Run `rst2myst convert` over every human-authored .rst file under docs/source/ and remove the originals. The result:

- 33 .rst files become 33 .md files (user guide, contributor guide, index, links).
- Headings, paragraphs, hyperlinks, code blocks, admonitions, and toctree directives all map cleanly to MyST syntax.
- Cross-reference anchors round-trip through MyST as `(label)=` blocks. The converter kebab-cased the labels (e.g. `(io-csv)=`), but every `{ref}` target in the corpus still uses the underscore form from the original RST (`{ref}\`CSV <io_csv>\``) and so do the Python docstrings that AutoAPI pulls in. Rewrite the anchors back to the underscore form so the existing references resolve.
- 86 `{eval-rst}` blocks remain — they all wrap `.. ipython::` directives, which have no first-class MyST equivalent. They render identically and don't block the build.

conf.py changes:

- Enable `colon_fence` and `deflist` MyST extensions (rst-to-myst emits these on a few files, particularly execution-metrics.md).
- Keep `.rst` in `source_suffix` even though no human-authored RST remains: sphinx-autoapi generates RST under autoapi/ at build time and Sphinx needs the suffix registered to parse it.

AGENTS.md: update the two .rst paths called out under "Aggregate and Window Function Documentation" to point at the .md equivalents.

Verified by building locally — `build succeeded`, no warnings, all internal cross-references resolve, the ipython examples on the landing page and basics page still execute.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: fix Apache license header format in converted markdown files

RST-to-MD conversion emitted MyST `%` comment syntax with blank line between each header line, which renders as visible text. Replace with canonical `<!--- ... -->` HTML comment block matching upstream apache/datafusion and this repo's existing markdown files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: fix broken cross-reference links in distributing-work

The RST -> MyST conversion left two intra-page links as undefined reference-style links, which CommonMark renders as literal bracketed text (no Sphinx warning, so the --fail-on-warning build still passed). Point both at the auto-generated heading anchors instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: execute examples via myst-nb; native tables and validated refs

Removes the last RST-syntax islands from the converted MyST markdown so the docs are markdown-native for both human and LLM authors.

Executable examples (A): replace IPython.sphinxext.ipython_directive with myst-nb. The 83 `{eval-rst}` + `.. ipython:: python` blocks become native `{code-cell} ipython3` blocks, and the 14 pages that carry them gain jupytext/kernelspec front matter so myst-nb runs them. conf.py routes .md through myst-nb with nb_execution_mode="force" and nb_execution_raise_on_error=True, so a failing example now fails the build.

myst-nb gives each page its own kernel instead of the IPython directive's single namespace shared across all documents in build order. That isolation surfaced expressions.md, which only ever worked by inheriting `col`/`lit` from an earlier-built page — it now imports them itself. It also changes the execution working directory to each page's own folder, so build.sh symlinks the example data next to every page that reads it by relative name and registers the python3 kernel; CI now calls build.sh so it matches local.

Tables (B): the 3 `.. list-table::` directives become GFM markdown tables.

Cross-references (C): the two intra-page links in distributing-work.md that the conversion left as undefined markdown references (and that built green while rendering literal brackets) become `{ref}` roles backed by explicit `(label)=` targets, so a future break fails the build instead of shipping silently.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: render DataFrame cell outputs as text, not the HTML widget

myst-nb prefers a cell's `_repr_html_` over its text repr. A datafusion DataFrame's HTML repr is a Jupyter-oriented widget — inline styles plus an injected <script> — that renders at the wrong width in the docs theme. Set nb_mime_priority_overrides so the html builder prefers text/plain. The 35 cells that end in a bare DataFrame now show the same readable ASCII table the old IPython directive produced, with no per-cell `.show()` edits and no dependence on the package-generated HTML staying theme-compatible.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(aggregations): use .alias() on grouping(), drop obsolete workaround

apache/datafusion#21411 is resolved — `.alias()` now works directly on a `grouping()` expression. Removed the note describing the limitation and the with_column_renamed workaround in the rollup and grouping_sets examples, aliasing the grouping columns inline instead. Verified on the current branch: the aliased aggregates execute and produce the named columns.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: use a dark-mode variant of the logo

The header logo was the same SVG in both color modes; the light-colored wordmark was hard to read on the dark theme. Point the theme's image_dark at a new original_dark.svg whose wordmark uses light strokes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: restore right-hand on-this-page TOC, collapsible

The theme refresh emptied secondary_sidebar_items, dropping the on-this-page table of contents that the previous site showed. Bring it back on the right, wrapped in a native <details> so readers can fold it away on the longer guide pages. Adds a custom page-toc-collapsible secondary-sidebar template and styles the <summary> toggle (no JS).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: let readers hide the right TOC sidebar for full-width content

Follow-up to restoring the on-this-page TOC: "collapsible" should hide the entire right-hand frame, not just fold the list. Replace the <details> wrapper with a floating toggle button (toc-toggle.js) that hides the whole secondary sidebar via a body class; the flex article container then reclaims the width (its 60em cap is lifted while hidden). The preference is remembered across pages in localStorage, and the button is suppressed below the theme's breakpoint where the sidebar is already collapsed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(deps): pin typing-extensions to one version so uv.lock parses in CI

Adding the myst-nb docs stack pulled a newer typing-extensions only on Python < 3.11, splitting it into two locked versions. Our own `typing-extensions; python_full_version < '3.13'` dependency then spanned that split, which uv recorded as a multi-version edge without a `version` field — a form older uv builds (the one in CI's pinned setup-uv) reject with "missing source field but has more than one matching package". Add a [tool.uv] constraint-dependencies pin of typing-extensions>=4.15.0 so it resolves to a single version across all supported Pythons, removing the fork and the under-specified edge. Relocked; uv lock --locked is clean and no multi-version package has a marker-only edge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(deps): drop pickleshare and explicit ipython from docs group

Both were only needed by the old IPython.sphinxext.ipython_directive, which myst-nb replaced. pickleshare (IPython %store, abandoned 2018) has no remaining consumer. ipython is now pulled transitively by ipykernel and myst-nb, so the explicit floor is redundant. Relocked.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Apply reviewer's suggestion to fix CI error

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
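
The anchor fix described in the first commit entry (rewriting kebab-cased `(label)=` targets back to the underscore form) could be scripted roughly as follows. This is a minimal sketch, not the command the author ran: the helper name `restore_underscore_anchors` and the regular expression are assumptions, and it only touches lines that consist solely of a `(label)=` target.

```python
import re

# Matches a MyST target such as "(io-csv)=" that sits alone on its line.
# Assumption: labels contain only letters, digits, hyphens and underscores.
_TARGET = re.compile(r"^\(([A-Za-z0-9_-]+)\)=[ \t]*$", re.MULTILINE)


def restore_underscore_anchors(text: str) -> str:
    """Rewrite kebab-cased MyST targets, e.g. (io-csv)= becomes (io_csv)=."""
    return _TARGET.sub(
        lambda m: "(" + m.group(1).replace("-", "_") + ")=", text
    )


sample = "(io-csv)=\n# CSV\n\nSee {ref}`CSV <io_csv>` for details.\n"
print(restore_underscore_anchors(sample))
```

Only the target line changes; the `{ref}` role and the surrounding prose are left alone, which matches the intent of keeping existing references valid.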
1 parent c0ac93b commit 55fd2c9

78 files changed

Lines changed: 7289 additions & 5927 deletions

.github/workflows/build.yml

Lines changed: 4 additions & 3 deletions
@@ -552,9 +552,10 @@ jobs:
   run: |
     set -x
     cd docs
-    curl -O https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv
-    curl -O https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet
-    uv run --no-project make html
+    # build.sh downloads the example data, registers the Jupyter kernel
+    # myst-nb needs, symlinks the data next to each executed page, and
+    # runs sphinx. Using it here keeps CI identical to a local build.
+    uv run --no-project bash ./build.sh
 
 - name: Copy & push the generated HTML
   if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref_type == 'tag')

AGENTS.md

Lines changed: 2 additions & 2 deletions
@@ -84,9 +84,9 @@ Every Python function must include a docstring with usage examples.
 When adding or updating an aggregate or window function, ensure the corresponding
 site documentation is kept in sync:
 
-- **Aggregations**: `docs/source/user-guide/common-operations/aggregations.rst`
+- **Aggregations**: `docs/source/user-guide/common-operations/aggregations.md`
   add new aggregate functions to the "Aggregate Functions" list and include usage
   examples if appropriate.
-- **Window functions**: `docs/source/user-guide/common-operations/windows.rst`
+- **Window functions**: `docs/source/user-guide/common-operations/windows.md`
   add new window functions to the "Available Functions" list and include usage
   examples if appropriate.

Cargo.lock

Lines changed: 6 additions & 5 deletions
Some generated files are not rendered by default.

docs/build.sh

Lines changed: 15 additions & 0 deletions
@@ -36,6 +36,21 @@ rm -rf build 2> /dev/null
 rm -rf temp 2> /dev/null
 mkdir temp
 cp -rf source/* temp/
+
+# myst-nb executes each page as a notebook from the directory that page
+# lives in, so the example data files must sit alongside every page that
+# loads them by relative name (e.g. `ctx.read_csv("pokemon.csv")`). Symlink
+# them into each directory that has such a page rather than copying the
+# 20 MB parquet repeatedly.
+for d in temp temp/user-guide temp/user-guide/common-operations; do
+    ln -sf "$script_dir/pokemon.csv" "$d/pokemon.csv"
+    ln -sf "$script_dir/yellow_tripdata_2021-01.parquet" "$d/yellow_tripdata_2021-01.parquet"
+done
+
+# myst-nb runs `{code-cell}` blocks against a Jupyter kernel named "python3".
+# Register the active environment's interpreter as that kernel (idempotent).
+python -m ipykernel install --sys-prefix --name python3 --display-name "Python 3"
+
 make SOURCEDIR=`pwd`/temp html
 
 cd "$original_dir" || exit
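
The symlink loop added to build.sh can be mirrored in Python, which may be easier to step through when a page cannot find its data file. This is a minimal sketch under stated assumptions: `link_example_data` is a hypothetical helper, the directory and file names are taken from the diff above, and the example runs against a temporary directory with placeholder files rather than the real docs tree.

```python
import tempfile
from pathlib import Path

DATA_FILES = ["pokemon.csv", "yellow_tripdata_2021-01.parquet"]
PAGE_DIRS = ["temp", "temp/user-guide", "temp/user-guide/common-operations"]


def link_example_data(script_dir: Path) -> list[Path]:
    """Symlink each data file into every page directory, like `ln -sf`."""
    links = []
    for rel in PAGE_DIRS:
        page_dir = script_dir / rel
        # build.sh relies on `cp -rf` having created these; the sketch
        # creates them itself so it can run standalone.
        page_dir.mkdir(parents=True, exist_ok=True)
        for name in DATA_FILES:
            link = page_dir / name
            # Replace an existing link so repeated calls succeed, as -f does.
            if link.is_symlink() or link.exists():
                link.unlink()
            link.symlink_to(script_dir / name)
            links.append(link)
    return links


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in DATA_FILES:
        (root / name).write_text("placeholder")  # stand-in for the real data
    created = link_example_data(root)
    print(len(created))
```

Linking rather than copying keeps a single copy of the parquet file on disk, which is the reason the shell comment gives for the approach.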
Lines changed: 31 additions & 0 deletions

docs/source/_static/theme_overrides.css

Lines changed: 45 additions & 0 deletions
@@ -63,3 +63,48 @@ html[data-theme="dark"] .table tbody tr:nth-of-type(odd) {
     white-space: normal !important;
   }
 }
+
+
+/* Hideable right-hand "On this page" sidebar.
+ * toc-toggle.js adds the button and toggles `pst-secondary-hidden` on <body>;
+ * hiding the sidebar lets the flex article container reclaim the width. */
+
+body.pst-secondary-hidden .bd-sidebar-secondary {
+  display: none;
+}
+
+/* Let the article use the freed space rather than just re-centering. */
+body.pst-secondary-hidden .bd-article-container {
+  max-width: none;
+}
+
+/* Floating toggle button, pinned to the top-right under the navbar. */
+#pst-secondary-toggle {
+  position: fixed;
+  top: 4.5rem;
+  right: 0.75rem;
+  z-index: 1020;
+  display: flex;
+  align-items: center;
+  justify-content: center;
+  width: 2rem;
+  height: 2rem;
+  padding: 0;
+  border: 1px solid var(--pst-color-border, #ccc);
+  border-radius: 0.25rem;
+  background-color: var(--pst-color-surface, #fff);
+  color: var(--pst-color-text-base, #333);
+  cursor: pointer;
+}
+
+#pst-secondary-toggle:hover {
+  color: rgb(var(--pst-color-link-hover));
+}
+
+/* The toggle is only meaningful where the sidebar is shown (wide screens);
+ * below the theme's lg breakpoint the sidebar is already collapsed away. */
+@media (max-width: 959.98px) {
+  #pst-secondary-toggle {
+    display: none;
+  }
+}

docs/source/_static/toc-toggle.js

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/* Adds a button that hides the right-hand "On this page" sidebar so the
+ * article can use the full page width. The choice is remembered across
+ * pages via localStorage. */
+(function () {
+  "use strict";
+  var KEY = "pst-secondary-hidden";
+
+  function apply(hidden, btn) {
+    document.body.classList.toggle("pst-secondary-hidden", hidden);
+    if (btn) {
+      btn.setAttribute("aria-pressed", String(hidden));
+      btn.title = hidden ? "Show page contents" : "Hide page contents";
+    }
+  }
+
+  document.addEventListener("DOMContentLoaded", function () {
+    // Only offer the toggle on pages that actually have the sidebar.
+    if (!document.querySelector(".bd-sidebar-secondary")) {
+      return;
+    }
+
+    var btn = document.createElement("button");
+    btn.id = "pst-secondary-toggle";
+    btn.type = "button";
+    btn.setAttribute("aria-label", "Toggle page contents sidebar");
+    btn.innerHTML = '<i class="fa-solid fa-list" aria-hidden="true"></i>';
+    document.body.appendChild(btn);
+
+    btn.addEventListener("click", function () {
+      var hidden = !document.body.classList.contains("pst-secondary-hidden");
+      try {
+        localStorage.setItem(KEY, hidden ? "1" : "0");
+      } catch (e) {
+        /* localStorage may be unavailable; toggle still works for this page. */
+      }
+      apply(hidden, btn);
+    });
+
+    var stored = null;
+    try {
+      stored = localStorage.getItem(KEY);
+    } catch (e) {
+      /* ignore */
+    }
+    apply(stored === "1", btn);
+  });
+})();

docs/source/conf.py

Lines changed: 42 additions & 7 deletions
@@ -48,16 +48,40 @@
 extensions = [
     "sphinx.ext.mathjax",
     "sphinx.ext.napoleon",
-    "myst_parser",
-    "IPython.sphinxext.ipython_directive",
+    # myst_nb is a superset of myst_parser: it provides the MyST markdown
+    # parser plus executable `{code-cell}` notebook directives. Do NOT also
+    # list "myst_parser" — myst_nb activates it internally and listing both
+    # raises an extension conflict.
+    "myst_nb",
     "autoapi.extension",
 ]
 
+# NOTE: .rst stays alongside .md because sphinx-autoapi generates RST
+# under autoapi/ and Sphinx needs the suffix to parse it. The human-
+# authored docs are all MyST .md now. ".md" is routed through myst-nb so
+# pages carrying jupytext/kernelspec front matter execute their
+# `{code-cell}` blocks; pages without that front matter render as plain
+# MyST markdown. The ".rst" entry is only for the autoapi build artifacts.
 source_suffix = {
     ".rst": "restructuredtext",
-    ".md": "markdown",
+    ".md": "myst-nb",
 }
 
+# Execute notebook code cells at build time and fail the build if any cell
+# raises — this replaces the old IPython sphinx directive, whose executed
+# examples are now `{code-cell}` blocks. "force" re-executes every build so
+# stale cached output can never ship.
+nb_execution_mode = "force"
+nb_execution_timeout = 120
+nb_execution_raise_on_error = True
+
+# Prefer the plain-text repr of a cell's last expression over its rich
+# `_repr_html_`. A DataFrame's HTML repr is a self-contained widget (inline
+# styles + an injected <script>) built for Jupyter; in the docs theme it
+# renders at the wrong width. The text repr is the readable table the old
+# IPython directive showed and is stable across datafusion versions.
+nb_mime_priority_overrides = [("html", "text/plain", 0)]
+
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ["_templates"]
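
The effect of `nb_mime_priority_overrides` in the hunk above can be pictured with a small model of how one output representation gets chosen per cell. This is a simplified illustration, not myst-nb's implementation: the `pick_mime` helper and the default priority table are assumptions made up for the example, and it assumes lower numbers win, which is consistent with an override of `0` making `text/plain` the preferred type. Only the `("html", "text/plain", 0)` tuple comes from the diff.

```python
# Hypothetical default ordering for the html builder: lower number wins.
DEFAULT_PRIORITY = {"text/html": 10, "image/png": 20, "text/plain": 30}


def pick_mime(available, overrides, builder="html"):
    """Return the available MIME type with the lowest priority number."""
    priority = dict(DEFAULT_PRIORITY)
    for name, mime, value in overrides:
        # An override applies to the named builder, or to all with "*".
        if name in (builder, "*"):
            priority[mime] = value
    # Types with no priority (unknown or set to None) are never chosen.
    candidates = [m for m in available if priority.get(m) is not None]
    return min(candidates, key=lambda m: priority[m]) if candidates else None


outputs = ["text/html", "text/plain"]  # what a DataFrame cell might offer
print(pick_mime(outputs, []))
print(pick_mime(outputs, [("html", "text/plain", 0)]))
```

In this model the first call selects the HTML representation and the second selects plain text, which is the switch the commit describes for cells ending in a bare DataFrame.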

@@ -120,7 +144,7 @@ def setup(sphinx) -> None:
     "show_toc_level": 2,
     "logo": {
         "image_light": "_static/images/original.svg",
-        "image_dark": "_static/images/original.svg",
+        "image_dark": "_static/images/original_dark.svg",
         "alt_text": "Apache DataFusion in Python",
     },
     "navbar_start": ["navbar-logo"],
@@ -138,7 +162,10 @@ def setup(sphinx) -> None:
             "icon": "fa-brands fa-rust",
         },
     ],
-    "secondary_sidebar_items": [],
+    # Right-hand "On this page" TOC. A toggle button (added by
+    # _static/toc-toggle.js) lets the reader hide the whole sidebar and give
+    # the article full width.
+    "secondary_sidebar_items": ["page-toc"],
     "collapse_navigation": True,
     "show_nav_level": 2,
 }
@@ -164,12 +191,20 @@ def setup(sphinx) -> None:
 
 html_css_files = ["theme_overrides.css"]
 
+# Adds a button that hides the right-hand "On this page" sidebar so the
+# article can use the full width (see _static/toc-toggle.js).
+html_js_files = ["toc-toggle.js"]
+
 html_sidebars = {
     "**": ["sidebar-globaltoc.html"],
 }
 
 # tell myst_parser to auto-generate anchor links for headers h1, h2, h3
 myst_heading_anchors = 3
 
-# enable nice rendering of checkboxes for the task lists
-myst_enable_extensions = ["tasklist"]
+# MyST extensions:
+# - tasklist: GitHub-style `- [x]` checkboxes
+# - colon_fence: `:::{directive}` blocks (needed by execution-metrics.md
+#   after the RST -> MyST conversion)
+# - deflist: definition lists (used in a couple of converted pages)
+myst_enable_extensions = ["tasklist", "colon_fence", "deflist"]
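
The `source_suffix` comment in this file says that only pages carrying jupytext/kernelspec front matter execute their `{code-cell}` blocks. A rough way to check which group a page falls in is to inspect its leading YAML block. This is a minimal sketch under stated assumptions: `has_kernelspec` is a hypothetical helper, it assumes the front matter starts on the first line of the file, it uses a plain text scan rather than a YAML parser, and it is not how myst-nb itself decides.

```python
def has_kernelspec(markdown_text: str) -> bool:
    """Return True if the page opens with front matter naming a kernelspec."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    for line in lines[1:]:
        if line.strip() == "---":  # end of front matter, nothing found
            return False
        if line.startswith("kernelspec:"):
            return True
    return False


executed = """---
jupytext:
  text_representation:
    format_name: myst
kernelspec:
  name: python3
---
# Basics
"""
plain = "# Links\n\nNo front matter here.\n"
print(has_kernelspec(executed), has_kernelspec(plain))
```

A check like this could help explain why a `{code-cell}` block on a page renders without output: if the page lacks the front matter, it is treated as plain MyST markdown.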
