Demo: train an MNIST MLP classifier in SQL by alxmrs · Pull Request #196 · xqlsystems/xarray-sql

alxmrs · 2026-06-28T15:08:35Z

Stacked on the ERA5/gradient-descent demo branch (#195). Adds benchmarks/mnist_mlp.py.

What this is

A one-hidden-layer MLP (196 → 32 tanh → 10 softmax, on 2×2-pooled 14×14 MNIST) trained by gradient descent where every gradient is computed in SQL over data registered as xarray. The optimisation loop is plain Python; all the math is relational.

Reverse-mode autodiff expressed as relational algebra:

matmul = join + GROUP BY SUM — a layer's pre-activation is SUM(input · weight) grouped by (sample, unit).
local derivatives = grad() — the hidden activation's Jacobian is grad(tanh(z), z), the autograd feature doing the calculus per (sample, unit).
cotangent propagation = join; parameter gradients = join + GROUP BY AVG.
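The first identity above (matmul = join + GROUP BY SUM) can be sketched without the library at all. The example below is only an illustration: it uses Python's built-in sqlite3 rather than DataFusion, and the table names `x` and `w` and their columns are hypothetical, not the ones in benchmarks/mnist_mlp.py.

```python
import sqlite3

# Hypothetical long-form tables: x(sample, i, val) and w(i, unit, val).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (sample INTEGER, i INTEGER, val REAL);
    CREATE TABLE w (i INTEGER, unit INTEGER, val REAL);
""")

X = [[1.0, 2.0], [3.0, 4.0]]             # 2 samples x 2 inputs
W = [[0.5, -1.0, 0.0], [1.5, 2.0, 1.0]]  # 2 inputs x 3 units

# Store each matrix as one row per cell (the relational "long form").
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?)",
    [(s, i, v) for s, row in enumerate(X) for i, v in enumerate(row)],
)
conn.executemany(
    "INSERT INTO w VALUES (?, ?, ?)",
    [(i, u, v) for i, row in enumerate(W) for u, v in enumerate(row)],
)

# z[sample, unit] = SUM over i of x[sample, i] * w[i, unit]
rows = conn.execute("""
    SELECT x.sample, w.unit, SUM(x.val * w.val) AS z
    FROM x JOIN w ON x.i = w.i
    GROUP BY x.sample, w.unit
""").fetchall()

z_sql = {(s, u): z for s, u, z in rows}

# Reference result computed directly in Python for comparison.
z_ref = {
    (s, u): sum(X[s][i] * W[i][u] for i in range(2))
    for s in range(2)
    for u in range(3)
}
```

The join key is the contracted index `i`; the GROUP BY keys are the indices that survive.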

The only hand-written gradient is that of softmax + cross-entropy: delta = softmax - onehot. Softmax couples classes through a per-sample normaliser, which is an aggregate, and grad() does not differentiate across aggregates, so writing this one gradient by hand stays faithful to SQL.
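The identity delta = softmax - onehot is easy to check numerically. The sketch below is plain Python, independent of the PR's code; the logits and label are made-up values, and the check is a central finite difference of the cross-entropy loss with respect to each logit.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


logits = [0.2, -1.3, 0.7]  # made-up example values
label = 2

# Closed form: gradient of the loss w.r.t. the logits is softmax - onehot.
p = softmax(logits)
delta = [p[k] - (1.0 if k == label else 0.0) for k in range(len(logits))]

# Central finite differences as an independent check.
eps = 1e-6
numeric = []
for k in range(len(logits)):
    hi = list(logits)
    lo = list(logits)
    hi[k] += eps
    lo[k] -= eps
    numeric.append(
        (cross_entropy(hi, label) - cross_entropy(lo, label)) / (2 * eps)
    )
```

Because the softmax probabilities sum to 1 and the one-hot vector sums to 1, the entries of `delta` sum to zero.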

Reaches ~83% test accuracy in ~45s; downloads MNIST on first run. Uses PEP 723 inline deps; run with uv run benchmarks/mnist_mlp.py.

🤖 Generated with Claude Code

Stacked demo branch (on the autograd feature) holding the runnable benchmark scripts, kept out of the core branch so it stays reviewable.

* grad_era5.py: symbolic grad over real ARCO-ERA5 data (wind-speed sensitivity checked exactly; saturation vapour pressure checked against the closed-form Clausius-Clapeyron slope). The queries ORDER BY latitude DESC, longitude to match ERA5's native order, so results line up with the xarray reference with no sorting on either side (single partition, so the order survives to_dataset).
* grad_descent.py: gradient descent as ONE declarative recursive-CTE query. differentiate_sql compiles the per-row update rule to SQL once; a recursive CTE then iterates it. No Python loop. Fit matches numpy least-squares.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017mDoFJgsm9kS7SicGoCVF6
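For readers unfamiliar with the recursive-CTE trick, here is a minimal sketch of gradient descent as a single query. It is not grad_descent.py: it runs on Python's built-in sqlite3 instead of DataFusion, the gradient is written by hand rather than compiled by differentiate_sql, and the table `data`, the CTE names `stats` and `gd`, the learning rate, and the step count are all assumptions made for the illustration. The model is y = w * x with mean-squared-error loss, so the gradient is 2 * (w * AVG(x*x) - AVG(x*y)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (x REAL, y REAL)")
# Made-up data lying exactly on y = 2x.
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
)

w_fit = conn.execute("""
    WITH RECURSIVE
    -- Aggregates are computed once, outside the recursion.
    stats AS (
        SELECT AVG(x * x) AS sxx, AVG(x * y) AS sxy FROM data
    ),
    -- Each recursive row is one gradient-descent step on w.
    gd(step, w) AS (
        SELECT 0, 0.0
        UNION ALL
        SELECT gd.step + 1,
               gd.w - 0.05 * 2.0 * (gd.w * stats.sxx - stats.sxy)
        FROM gd, stats
        WHERE gd.step < 100
    )
    SELECT w FROM gd ORDER BY step DESC LIMIT 1
""").fetchone()[0]

# Closed-form least-squares slope through the origin, for comparison.
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w_closed = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```

The recursive relation `gd` is referenced exactly once in the recursive branch, which is why a single-parameter fit works as one CTE.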

A one-hidden-layer MLP (196->32 tanh->10 softmax, on 2x2-pooled 14x14 MNIST) trained by gradient descent with every gradient computed in SQL. The images are registered as xarray (the library's core); the model weights and per-step intermediates are DataFusion in-memory tables (register_record_batches), so a matmul is a join over them and there's no xarray pivot per step.

Reverse-mode autodiff as relational algebra: matmul = join + GROUP BY SUM; the hidden activation's local Jacobian = grad(tanh(z), z); cotangent propagation = join; parameter gradients = join + GROUP BY AVG. The only hand-written gradient is softmax + cross-entropy's delta = softmax - onehot.

~83% test accuracy in ~20s. Adds a benchmarks README entry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017mDoFJgsm9kS7SicGoCVF6
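The "parameter gradients = join + GROUP BY AVG" step can be sketched the same way as the forward matmul, except that the join key is the sample and the average runs over the batch. This is an illustration on Python's built-in sqlite3, not the PR's DataFusion code; the tables `act` (layer input) and `delta` (upstream cotangent) and their values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE act (sample INTEGER, i INTEGER, val REAL);
    CREATE TABLE delta (sample INTEGER, unit INTEGER, val REAL);
""")

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]     # 3 samples x 2 inputs
D = [[0.1, -0.2], [0.3, 0.0], [-0.4, 0.5]]   # 3 samples x 2 units

conn.executemany(
    "INSERT INTO act VALUES (?, ?, ?)",
    [(s, i, v) for s, row in enumerate(A) for i, v in enumerate(row)],
)
conn.executemany(
    "INSERT INTO delta VALUES (?, ?, ?)",
    [(s, u, v) for s, row in enumerate(D) for u, v in enumerate(row)],
)

# dW[i, unit] = AVG over samples of act[sample, i] * delta[sample, unit]
rows = conn.execute("""
    SELECT act.i, delta.unit, AVG(act.val * delta.val) AS g
    FROM act JOIN delta ON act.sample = delta.sample
    GROUP BY act.i, delta.unit
""").fetchall()

g_sql = {(i, u): g for i, u, g in rows}

# Reference: batch-mean outer product computed directly in Python.
g_ref = {
    (i, u): sum(A[s][i] * D[s][u] for s in range(3)) / 3
    for i in range(2)
    for u in range(2)
}
```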

Rewrite mnist_mlp.py so the whole model and its entire training history live in a single append-only table model(step, layer, i, j, val): every parameter is a row tagged by generation, and a training step appends the next generation's rows rather than mutating anything. Each step is a single SQL statement (forward, grad(tanh(z),z) backprop, parameter update); evaluation is SQL too (a forward pass with ROW_NUMBER() for the argmax). Python no longer holds the weights or computes any gradients; it only sequences the steps.

A 2-layer net can't be one recursive CTE (the recursive relation may be referenced only once, but W1/W2 are used several times per step) and unrolling the steps as non-recursive CTEs blows up exponentially (DataFusion inlines CTEs; no MATERIALIZED). Materialising between steps is therefore host-driven; the thin loop does exactly that.

Reaches ~83% test accuracy over 60 steps.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
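The append-only layout can be illustrated with a toy version. The sketch below keeps the model(step, layer, i, j, val) schema named in the commit message but is otherwise an assumption: it uses Python's built-in sqlite3, and a fixed, hypothetical `grads` table stands in for the gradient that the real script computes inside the same statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE model (step INTEGER, layer INTEGER,
                        i INTEGER, j INTEGER, val REAL);
    -- Hypothetical stand-in for gradients computed in SQL.
    CREATE TABLE grads (layer INTEGER, i INTEGER, j INTEGER, g REAL);
    INSERT INTO model VALUES (0, 1, 0, 0, 1.0), (0, 1, 0, 1, -2.0);
    INSERT INTO grads VALUES (1, 0, 0, 0.5), (1, 0, 1, -1.0);
""")

LR = 0.1  # assumed learning rate


def train_step(conn):
    # Append generation step+1; earlier generations are never modified.
    conn.execute(
        """
        INSERT INTO model (step, layer, i, j, val)
        SELECT m.step + 1, m.layer, m.i, m.j, m.val - ? * g.g
        FROM model AS m
        JOIN grads AS g
          ON g.layer = m.layer AND g.i = m.i AND g.j = m.j
        WHERE m.step = (SELECT MAX(step) FROM model)
        """,
        (LR,),
    )


# The host loop only sequences the steps.
for _ in range(2):
    train_step(conn)

history = conn.execute(
    "SELECT step, val FROM model "
    "WHERE layer = 1 AND i = 0 AND j = 0 ORDER BY step"
).fetchall()
n_rows = conn.execute("SELECT COUNT(*) FROM model").fetchone()[0]
```

Because nothing is overwritten, `history` holds the full trajectory of one weight, and any earlier generation can be queried by filtering on `step`.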

Make the architecture itself data. The whole model is one xr.Dataset: each layer's weight is a data_var w{L} over its boundary dims (u{L}, u{L+1}), sharing the dims that connect adjacent layers (the join keys). The dim sizes are the layer widths and the number of weights is the depth, so differing neuron counts are just differing dim sizes, with no padding, because the relational long form is naturally ragged. from_dataset splits the one Dataset into a table per weight; changing WIDTHS trains a different network with the same code.

One generic contract()-based loop trains a net of any depth: forward contracts each layer, backward is the same contraction transposed (VJP of a contraction is a contraction) with grad(tanh(z), z) for the local derivative. Validated exact against numpy at depth 3.

Training metrics are a relation too: each logged step appends a (step, loss, train_acc, test_acc) row to a metrics table rather than a Python list. The trained model, predictions, and metrics all come back out as xarray via to_dataset. ~83% test accuracy in ~13s.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
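The claim that the VJP of a contraction is itself a contraction can be checked on a small example. The sketch below is plain Python and does not use the PR's contract(); `contract2d` is a hypothetical toy helper that contracts two nested lists over their shared index. For Z = X · W with upstream cotangent G, the backward pass is dX = G · Wᵀ (contract over the output unit) and dW = Xᵀ · G (contract over the sample). It sums over samples, whereas the PR averages parameter gradients over the batch.

```python
def contract2d(a, b):
    """Contract a[m][k] with b[k][n] over the shared index k."""
    return [
        [sum(a[r][k] * b[k][c] for k in range(len(b)))
         for c in range(len(b[0]))]
        for r in range(len(a))
    ]


def transpose(a):
    return [list(col) for col in zip(*a)]


X = [[1.0, -2.0], [0.5, 3.0]]              # (sample, u0)
W = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.6]]   # (u0, u1)
G = [[1.0, 0.0, -1.0], [0.5, 2.0, 0.25]]   # cotangent of Z, (sample, u1)

# Backward pass: the same contraction with the roles transposed.
dX = contract2d(G, transpose(W))   # contract over u1
dW = contract2d(transpose(X), G)   # contract over sample


def loss(x, w):
    # Scalar whose gradient w.r.t. Z is exactly G.
    z = contract2d(x, w)
    return sum(G[s][u] * z[s][u]
               for s in range(len(z)) for u in range(len(z[0])))


def numeric_grad(wrt, r, c, eps=1e-6):
    # Central finite difference w.r.t. one entry of X or W.
    hi_x, lo_x = [row[:] for row in X], [row[:] for row in X]
    hi_w, lo_w = [row[:] for row in W], [row[:] for row in W]
    if wrt == "X":
        hi_x[r][c] += eps
        lo_x[r][c] -= eps
    else:
        hi_w[r][c] += eps
        lo_w[r][c] -= eps
    return (loss(hi_x, hi_w) - loss(lo_x, lo_w)) / (2 * eps)
```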

Two simplifications collapse the model to a single relation:

- Bias folded into the weights (an nn.Linear): each layer's bias is the weight of a constant-1 input, kept as the row inp=width of the same weight array, so a layer is one matrix.
- A layer dimension: every layer's weight lives in one weight(layer, inp, out) array, so forward/backward filter on the layer COLUMN instead of referencing a table per layer.

The model is one xr.Dataset with a layer dim (NaN-padded for the ragged pyramid, dropped on seed); from_dataset registers it; the update is one query over the whole weight relation. A single contract() and a generic loop train a net of any depth (validated exact against numpy at depth 3). Tensors.put now unifies batch nullability so UNION results register cleanly.

Faster too (~6s vs ~13s) at the same ~83% test accuracy; model and metrics still round-trip to xarray.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
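The bias-folding trick is standard and can be shown in a few lines of plain Python. This is a sketch with made-up numbers, not the PR's code: appending a constant 1 to every input row and storing the bias as an extra weight row at index inp = width gives the same pre-activations as a separate bias add.

```python
def matmul(x, w):
    """Plain nested-list matrix product x[s][k] * w[k][u]."""
    return [
        [sum(x[s][k] * w[k][u] for k in range(len(w)))
         for u in range(len(w[0]))]
        for s in range(len(x))
    ]


X = [[1.0, 2.0], [3.0, 4.0]]    # 2 samples x 2 inputs
W = [[0.5, -1.0], [1.5, 2.0]]   # 2 inputs x 2 units
b = [0.1, -0.2]                 # one bias per unit

# Separate bias: z = x @ W + b
separate = [[z + b[u] for u, z in enumerate(row)] for row in matmul(X, W)]

# Folded bias: the extra weight row sits at index inp == width,
# and every input row gains a constant-1 column to multiply it.
width = len(W)
X1 = [row + [1.0] for row in X]
W1 = W + [b]
folded = matmul(X1, W1)
```

With the bias stored as one more row of the weight matrix, the forward pass, backward pass, and update all operate on a single relation per layer.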

alxmrs force-pushed the claude/xarray-sql-era5-demo branch from afb1036 to fdb17fb Compare June 28, 2026 15:17

alxmrs force-pushed the claude/xarray-sql-mnist-demo branch from 0810348 to 29b49dc Compare June 28, 2026 15:21

alxmrs force-pushed the claude/xarray-sql-era5-demo branch from fdb17fb to 8f97173 Compare June 28, 2026 15:47

alxmrs force-pushed the claude/xarray-sql-mnist-demo branch from 29b49dc to f2126da Compare June 28, 2026 15:47

alxmrs force-pushed the claude/xarray-sql-era5-demo branch from 8f97173 to 27c02d4 Compare June 28, 2026 15:58

alxmrs force-pushed the claude/xarray-sql-mnist-demo branch from f2126da to d9728c3 Compare June 28, 2026 15:58

Base automatically changed from claude/xarray-sql-era5-demo to claude/xarray-sql-autograd-73ovqq June 30, 2026 13:31

alxmrs force-pushed the claude/xarray-sql-autograd-73ovqq branch from a4fc101 to 7b1e530 Compare June 30, 2026 13:34

claude added 2 commits June 30, 2026 16:40

alxmrs force-pushed the claude/xarray-sql-mnist-demo branch from d9728c3 to b8d3e83 Compare June 30, 2026 13:40

alxmrs and others added 3 commits June 30, 2026 17:09

alxmrs wants to merge 5 commits into claude/xarray-sql-autograd-73ovqq from claude/xarray-sql-mnist-demo
