docs: convert reStructuredText sources to MyST markdown #1579
timsaucer wants to merge 12 commits into
Phase 2 of the documentation-site refresh. Run `rst2myst convert` over
every human-authored .rst file under docs/source/ and remove the
originals. The result:
- 33 .rst files become 33 .md files (user guide, contributor guide,
index, links).
- Headings, paragraphs, hyperlinks, code blocks, admonitions, and
toctree directives all map cleanly to MyST syntax.
- Cross-reference anchors round-trip through MyST as `(label)=`
blocks. The converter kebab-cased the labels (e.g. `(io-csv)=`),
but every `{ref}` target in the corpus still uses the underscore
form from the original RST (`{ref}\`CSV <io_csv>\``) and so do the
Python docstrings that AutoAPI pulls in. Rewrite the anchors back
to the underscore form so the existing references resolve.
- 86 `{eval-rst}` blocks remain — they all wrap `.. ipython::`
directives, which have no first-class MyST equivalent. They render
identically and don't block the build.
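The anchor rewrite described above can be sketched as a small text transform. This is a minimal illustration, not the script used for this commit; the function name `restore_underscore_anchors` and the regular expression are assumptions, and it only touches lines that consist solely of a `(label)=` target.

```python
import re

# Matches a MyST target such as "(io-csv)=" that sits alone on a line.
TARGET_RE = re.compile(r"^\(([A-Za-z0-9_-]+)\)=[ \t]*$", re.MULTILINE)


def restore_underscore_anchors(markdown: str) -> str:
    """Rewrite kebab-case MyST targets back to the underscore form."""

    def fix(match: re.Match) -> str:
        # Only the label changes; the "(...)=" wrapper is rebuilt as-is.
        return "(" + match.group(1).replace("-", "_") + ")="

    return TARGET_RE.sub(fix, markdown)
```

Because only target lines are matched, hyphens in prose and in `{ref}` roles are left alone.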
conf.py changes:
- Enable `colon_fence` and `deflist` MyST extensions (rst-to-myst
emits these on a few files, particularly execution-metrics.md).
- Keep `.rst` in `source_suffix` even though no human-authored RST
remains: sphinx-autoapi generates RST under autoapi/ at build time
and Sphinx needs the suffix registered to parse it.
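For orientation, the conf.py settings named above might look roughly like the excerpt below. It is a sketch under assumptions: the `extensions` list and the dict form of `source_suffix` are illustrative and may not match the repository's actual conf.py.

```python
# Hypothetical excerpt of docs/source/conf.py; other settings are omitted.
extensions = [
    "myst_parser",        # parses the converted .md sources
    "autoapi.extension",  # sphinx-autoapi, which emits .rst at build time
]

# MyST syntax extensions that the converted files rely on.
myst_enable_extensions = ["colon_fence", "deflist"]

# .rst stays registered so the generated autoapi/ pages can be parsed.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```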
AGENTS.md: update the two .rst paths called out under "Aggregate and
Window Function Documentation" to point at the .md equivalents.
Verified by building locally — `build succeeded`, no warnings, all
internal cross-references resolve, the ipython examples on the
landing page and basics page still execute.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RST-to-MD conversion emitted MyST `%` comment syntax with a blank line between each header line, which renders as visible text. Replace with the canonical `<!--- ... -->` HTML comment block matching upstream apache/datafusion and this repo's existing markdown files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
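A header rewrite of this kind could be sketched as follows. The function name `convert_percent_header` and the two-space indent inside the comment are assumptions made for illustration; the sketch only handles `%` lines at the very top of a file.

```python
def convert_percent_header(text: str) -> str:
    """Fold leading '%' comment lines into one '<!--- ... -->' block."""
    lines = text.splitlines()
    header = []
    i = 0
    # Consume the leading run of '%' lines and the blank lines between them.
    while i < len(lines) and (lines[i].startswith("%") or not lines[i].strip()):
        if lines[i].startswith("%"):
            header.append(lines[i][1:].strip())
        i += 1
    if not header:
        return text  # nothing to convert
    comment_body = "\n".join(("  " + h) if h else "" for h in header)
    comment = "<!---\n" + comment_body + "\n-->"
    body = "\n".join(lines[i:])
    trailing_newline = "\n" if text.endswith("\n") else ""
    return comment + "\n\n" + body + trailing_newline
```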
The RST -> MyST conversion left two intra-page links as undefined reference-style links, which CommonMark renders as literal bracketed text (no Sphinx warning, so the --fail-on-warning build still passed). Point both at the auto-generated heading anchors instead.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
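Since this failure mode produces no warning, a separate check is one way to catch it. The sketch below is an assumption-laden illustration, not part of this PR: it only looks at full reference links of the form `[text][label]`, ignores the collapsed and shortcut forms, does not skip code blocks, and approximates CommonMark label matching by lowercasing.

```python
import re

# "[label]: target" definition lines (up to three leading spaces).
DEFINITION_RE = re.compile(r"^ {0,3}\[([^\]]+)\]:\s*\S+", re.MULTILINE)
# Full reference links of the form "[text][label]".
FULL_REF_RE = re.compile(r"\[[^\]]+\]\[([^\]]+)\]")


def undefined_reference_labels(markdown: str) -> list[str]:
    """Return labels used by [text][label] links that have no definition."""
    defined = {m.group(1).strip().lower() for m in DEFINITION_RE.finditer(markdown)}
    used = {m.group(1).strip().lower() for m in FULL_REF_RE.finditer(markdown)}
    return sorted(used - defined)
```

A non-empty result means at least one link would render as literal bracketed text.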
Removes the last RST-syntax islands from the converted MyST markdown so
the docs are markdown-native for both human and LLM authors.
Executable examples (A): replace IPython.sphinxext.ipython_directive with
myst-nb. The 83 `{eval-rst}` + `.. ipython:: python` blocks become native
`{code-cell} ipython3` blocks, and the 14 pages that carry them gain
jupytext/kernelspec front matter so myst-nb runs them. conf.py routes .md
through myst-nb with nb_execution_mode="force" and
nb_execution_raise_on_error=True, so a failing example now fails the build.
myst-nb gives each page its own kernel instead of the IPython directive's
single namespace shared across all documents in build order. That isolation
surfaced expressions.md, which only ever worked by inheriting `col`/`lit`
from an earlier-built page — it now imports them itself. It also changes the
execution working directory to each page's own folder, so build.sh symlinks
the example data next to every page that reads it by relative name and
registers the python3 kernel; CI now calls build.sh so it matches local.
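The namespace change can be illustrated without Sphinx at all. The sketch below is a toy model under assumptions: page names and cell contents are made up, and `exec` stands in for a kernel. It shows why a page that relied on names from an earlier page only fails once each page gets its own namespace.

```python
# Each "page" is a list of code-cell sources; contents are hypothetical.
pages = {
    "basics.md": ["col = 'col-helper'", "lit = 'lit-helper'"],
    "expressions.md": ["result = (col, lit)"],
}


def run_shared(pages: dict) -> dict:
    """Run every page in one namespace, like a single shared session."""
    namespace = {}
    for cells in pages.values():
        for source in cells:
            exec(source, namespace)
    return namespace


def run_isolated(pages: dict) -> dict:
    """Run each page in a fresh namespace and collect the pages that fail."""
    failures = {}
    for name, cells in pages.items():
        namespace = {}
        try:
            for source in cells:
                exec(source, namespace)
        except NameError as err:
            failures[name] = str(err)
    return failures
```

In the shared run the second page succeeds by accident; in the isolated run it raises `NameError`, which is the class of problem the explicit import in expressions.md addresses.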
Tables (B): the 3 `.. list-table::` directives become GFM markdown tables.
Cross-references (C): the two intra-page links in distributing-work.md that
the conversion left as undefined markdown references (and that built green
while rendering literal brackets) become `{ref}` roles backed by explicit
`(label)=` targets, so a future break fails the build instead of shipping
silently.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
myst-nb prefers a cell's `_repr_html_` over its text repr. A datafusion DataFrame's HTML repr is a Jupyter-oriented widget (inline styles plus an injected <script>) that renders at the wrong width in the docs theme. Set nb_mime_priority_overrides so the html builder prefers text/plain. The 35 cells that end in a bare DataFrame now show the same readable ASCII table the old IPython directive produced, with no per-cell `.show()` edits and no dependence on the package-generated HTML staying theme-compatible.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
apache/datafusion#21411 is resolved: `.alias()` now works directly on a `grouping()` expression. Removed the note describing the limitation and the with_column_renamed workaround in the rollup and grouping_sets examples, aliasing the grouping columns inline instead. Verified on the current branch: the aliased aggregates execute and produce the named columns.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The header logo was the same SVG in both color modes; the light-colored wordmark was hard to read on the dark theme. Point the theme's image_dark at a new original_dark.svg whose wordmark uses light strokes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The theme refresh emptied secondary_sidebar_items, dropping the on-this-page table of contents that the previous site showed. Bring it back on the right, wrapped in a native <details> so readers can fold it away on the longer guide pages. Adds a custom page-toc-collapsible secondary-sidebar template and styles the <summary> toggle (no JS).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up to restoring the on-this-page TOC: "collapsible" should hide the entire right-hand frame, not just fold the list. Replace the <details> wrapper with a floating toggle button (toc-toggle.js) that hides the whole secondary sidebar via a body class; the flex article container then reclaims the width (its 60em cap is lifted while hidden). The preference is remembered across pages in localStorage, and the button is suppressed below the theme's breakpoint where the sidebar is already collapsed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adding the myst-nb docs stack pulled a newer typing-extensions only on Python < 3.11, splitting it into two locked versions. Our own `typing-extensions; python_full_version < '3.13'` dependency then spanned that split, which uv recorded as a multi-version edge without a `version` field, a form that older uv builds (the one in CI's pinned setup-uv) reject with "missing source field but has more than one matching package". Add a [tool.uv] constraint-dependencies pin of typing-extensions>=4.15.0 so it resolves to a single version across all supported Pythons, removing the fork and the under-specified edge. Relocked; uv lock --locked is clean and no multi-version package has a marker-only edge.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
timsaucer left a comment
I've tried to review and mark every change that goes beyond the formatting and reference-style changes caused by switching to markdown.
| --- | ||
| jupytext: | ||
| text_representation: | ||
| extension: .md | ||
| format_name: myst | ||
| kernelspec: | ||
| name: python3 | ||
| display_name: Python 3 | ||
| --- |
These lines tell the build system that the page runs through the Jupyter extension, which executes the examples during the build.
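A page that contains code cells but lacks this front matter is easy to miss by eye. The check below is a rough sketch, not something this PR ships: the function name is hypothetical, the front matter is scanned as plain text instead of being parsed as YAML, and only a top-level `kernelspec:` key is looked for.

```python
FENCE = "`" * 3  # built this way to avoid a literal fence inside this block


def needs_notebook_front_matter(markdown: str) -> bool:
    """True if the page has {code-cell} blocks but no kernelspec front matter."""
    has_cells = (FENCE + "{code-cell}") in markdown

    front_matter = ""
    if markdown.startswith("---\n"):
        end = markdown.find("\n---", 3)  # closing delimiter of the front matter
        if end != -1:
            front_matter = markdown[4:end]

    has_kernelspec = any(
        line.startswith("kernelspec:") for line in front_matter.splitlines()
    )
    return has_cells and not has_kernelspec
```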
| nb_execution_mode = "force" | ||
| nb_execution_timeout = 120 | ||
| nb_execution_raise_on_error = True |
These settings make a failing example fail the documentation build.
| # styles + an injected <script>) built for Jupyter; in the docs theme it | ||
| # renders at the wrong width. The text repr is the readable table the old | ||
| # IPython directive showed and is stable across datafusion versions. | ||
| nb_mime_priority_overrides = [("html", "text/plain", 0)] |
This was necessary to enforce text output of the DataFrames instead of rendered HTML. It gives a more consistent experience IMO, especially as the HTML rendering code has had some changes in the past few releases.
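The effect of a MIME priority override can be shown with a toy model. The sketch below is not how myst-nb is implemented; the `Table` class, `mime_bundle`, `pick_output`, and both priority lists are hypothetical stand-ins that only illustrate "first available MIME type in priority order wins".

```python
class Table:
    """Stand-in for an object with both a text repr and an HTML repr."""

    def __repr__(self) -> str:
        return "+---+\n| a |\n+---+"

    def _repr_html_(self) -> str:
        return "<table><tr><td>a</td></tr></table>"


def mime_bundle(obj) -> dict:
    """Collect the representations an object offers, keyed by MIME type."""
    bundle = {"text/plain": repr(obj)}
    html = getattr(obj, "_repr_html_", None)
    if callable(html):
        bundle["text/html"] = html()
    return bundle


def pick_output(bundle: dict, priority: list) -> tuple:
    """Return the first (mime, data) pair available in priority order."""
    for mime in priority:
        if mime in bundle:
            return mime, bundle[mime]
    raise LookupError("no renderable MIME type in bundle")


# Assumed orderings for illustration: HTML first by default, text first
# once text/plain is moved to the front.
DEFAULT_PRIORITY = ["text/html", "text/plain"]
OVERRIDDEN_PRIORITY = ["text/plain", "text/html"]
```

With the overridden ordering the same object yields its ASCII table, with no change to the cell itself.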
| Apply `.alias()` to the `grouping()` expression to give the column a readable name: | ||
|
|
||
| ```{code-cell} ipython3 | ||
| result = df.aggregate( | ||
| [GroupingSet.rollup(col_type_1)], | ||
| [f.count(col_speed).alias("Count"), | ||
| f.avg(col_speed).alias("Avg Speed"), | ||
| f.grouping(col_type_1).alias("Is Total")] | ||
| ) | ||
| result.sort(col_type_1.sort(ascending=True, nulls_first=True)) | ||
| ``` |
This is a substantive difference from the prior work. The issue in apache/datafusion#21411 has been resolved and verified in 54.0.0 so I removed the old warning about grouping sets with aliases. You can see the old text in this section: https://datafusion.apache.org/python/user-guide/common-operations/aggregations.html#rollup
| /* Hideable right-hand "On this page" sidebar. | ||
| * toc-toggle.js adds the button and toggles `pst-secondary-hidden` on <body>; | ||
| * hiding the sidebar lets the flex article container reclaim the width. */ | ||
|
|
||
| body.pst-secondary-hidden .bd-sidebar-secondary { | ||
| display: none; | ||
| } |
The code added to this section lets you hide the right-hand table of contents so the site content gets more room. The corresponding logic is in toc-toggle.js.
| # build.sh downloads the example data, registers the Jupyter kernel | ||
| # myst-nb needs, symlinks the data next to each executed page, and | ||
| # runs sphinx. Using it here keeps CI identical to a local build. | ||
| uv run --no-project bash ./build.sh |
Use a single build path, both for local development and CI.
| for d in temp temp/user-guide temp/user-guide/common-operations; do | ||
| ln -sf "$script_dir/pokemon.csv" "$d/pokemon.csv" | ||
| ln -sf "$script_dir/yellow_tripdata_2021-01.parquet" "$d/yellow_tripdata_2021-01.parquet" | ||
| done |
The description above explains why these changes were added to the build steps.
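For readers more at home in Python than shell, the loop above corresponds roughly to the sketch below. It is an illustration only: the function name and parameters are hypothetical, and the `mkdir` call is an addition so the sketch is self-contained, which the shell loop may not need.

```python
import os
from pathlib import Path


def link_example_data(data_dir, page_dirs, filenames) -> None:
    """Symlink each data file into every page directory, replacing any
    existing link (similar in spirit to `ln -sf`)."""
    source_root = Path(data_dir).resolve()
    for page_dir in page_dirs:
        page_dir = Path(page_dir)
        page_dir.mkdir(parents=True, exist_ok=True)  # added for the sketch
        for name in filenames:
            link = page_dir / name
            if link.is_symlink() or link.exists():
                link.unlink()  # force-replace so reruns stay idempotent
            os.symlink(source_root / name, link)
```

Each executed page can then open the data by its bare file name, because the link sits in that page's working directory.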
# Conflicts: # docs/source/user-guide/common-operations/functions.rst
Both were only needed by the old IPython.sphinxext.ipython_directive, which myst-nb replaced. pickleshare (IPython %store, abandoned 2018) has no remaining consumer. ipython is now pulled transitively by ipykernel and myst-nb, so the explicit floor is redundant. Relocked.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


Which issue does this PR close?
There is no open issue, but this continues the work done in #1578.
Rationale for this change
Phase 2 of the documentation-site refresh started in #1578. With the modern pydata-sphinx-theme + navigation in place, this PR moves the content format off `.rst` and onto MyST `.md`. The motivation:
- `datafusion-comet` sibling project completed the same migration recently and reported smoother contributor onboarding.
What changes are included in this PR?
- `rst-to-myst`).
- `AGENTS.md` is updated so the two `.rst` paths called out under "Aggregate and Window Function Documentation" point at the new `.md` equivalents.
- `myst-parser` to `myst-nb` so that we can do markdown parsing PLUS code execution to render our examples.
Are there any user-facing changes?
No behavioral change to the `datafusion` package, only to the source format of the published documentation. Readers of the rendered site will not notice the migration; the HTML output is slightly updated but still shows all of the relevant content, including running code.
Follow-ups (out of scope for this PR)
- `asf-site` publishing workflow.