
Demo: train an MNIST MLP classifier in SQL#196

Open
alxmrs wants to merge 5 commits into claude/xarray-sql-autograd-73ovqq from claude/xarray-sql-mnist-demo

@alxmrs alxmrs commented Jun 28, 2026

Collaborator

Stacked on the ERA5/gradient-descent demo branch (#195). Adds benchmarks/mnist_mlp.py.

What this is

A one-hidden-layer MLP (196 → 32 tanh → 10 softmax, on 2×2-pooled 14×14 MNIST) trained by gradient descent where every gradient is computed in SQL over data registered as xarray. The optimisation loop is plain Python; all the math is relational.

Reverse-mode autodiff expressed as relational algebra:

  • matmul = join + GROUP BY SUM — a layer's pre-activation is SUM(input · weight) grouped by (sample, unit).
  • local derivatives = grad() — the hidden activation's Jacobian is grad(tanh(z), z), the autograd feature doing the calculus per (sample, unit).
  • cotangent propagation = join; parameter gradients = join + GROUP BY AVG.
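
As a minimal sketch of the first bullet, the query below computes a layer's pre-activation as a join plus GROUP BY SUM. It runs on the standard-library sqlite3 module rather than the DataFusion/xarray stack the benchmark uses, and the table names x and w and their columns are illustrative assumptions, not the schema in mnist_mlp.py.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE x (sample INTEGER, i INTEGER, val REAL);  -- inputs, long form
    CREATE TABLE w (i INTEGER, unit INTEGER, val REAL);    -- weights, long form
""")

# X = [[1, 2], [3, 4]]: two samples, two input features each.
con.executemany(
    "INSERT INTO x VALUES (?, ?, ?)",
    [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)],
)
# W = [[0.5, -1.0], [0.25, 2.0]]: two inputs feeding two units.
con.executemany(
    "INSERT INTO w VALUES (?, ?, ?)",
    [(0, 0, 0.5), (0, 1, -1.0), (1, 0, 0.25), (1, 1, 2.0)],
)

# Z = X @ W: join on the shared index i, then sum the products per (sample, unit).
rows = con.execute("""
    SELECT x.sample, w.unit, SUM(x.val * w.val) AS z
    FROM x JOIN w ON x.i = w.i
    GROUP BY x.sample, w.unit
    ORDER BY x.sample, w.unit
""").fetchall()

for sample, unit, z in rows:
    print(sample, unit, z)
```

The join key plays the role of the contracted index, and the GROUP BY columns are the indices that survive in the result.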

The only hand-written gradient is softmax + cross-entropy's delta = softmax - onehot. Softmax couples classes through a per-sample normaliser, which is an aggregate, and grad does not differentiate across aggregates, so writing this one delta by hand stays faithful to SQL.
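
The delta = softmax - onehot identity can be checked numerically. The sketch below is plain Python, independent of the benchmark code; the helper names softmax and cross_entropy and the example logits are made up for illustration. It compares the closed form against a central finite difference of the loss.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


logits, label = [2.0, -1.0, 0.5], 2

# Closed form: d(loss)/d(logit_k) = softmax_k - onehot_k.
probs = softmax(logits)
delta = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]

# Central finite difference of the loss with respect to each logit.
eps = 1e-6
numeric = []
for k in range(len(logits)):
    up = list(logits)
    up[k] += eps
    down = list(logits)
    down[k] -= eps
    numeric.append((cross_entropy(up, label) - cross_entropy(down, label)) / (2 * eps))

max_err = max(abs(a - b) for a, b in zip(delta, numeric))
print(max_err)
```

The entries of delta sum to zero because the softmax probabilities sum to one, which is the per-sample coupling mentioned above.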

Reaches ~83% test accuracy in ~45s; downloads MNIST on first run. Dependencies are declared inline per PEP 723; run with uv run benchmarks/mnist_mlp.py.

🤖 Generated with Claude Code


@alxmrs alxmrs force-pushed the claude/xarray-sql-era5-demo branch from afb1036 to fdb17fb Compare June 28, 2026 15:17
@alxmrs alxmrs force-pushed the claude/xarray-sql-mnist-demo branch from 0810348 to 29b49dc Compare June 28, 2026 15:21
@alxmrs alxmrs force-pushed the claude/xarray-sql-era5-demo branch from fdb17fb to 8f97173 Compare June 28, 2026 15:47
@alxmrs alxmrs force-pushed the claude/xarray-sql-mnist-demo branch from 29b49dc to f2126da Compare June 28, 2026 15:47
@alxmrs alxmrs force-pushed the claude/xarray-sql-era5-demo branch from 8f97173 to 27c02d4 Compare June 28, 2026 15:58
@alxmrs alxmrs force-pushed the claude/xarray-sql-mnist-demo branch from f2126da to d9728c3 Compare June 28, 2026 15:58
Base automatically changed from claude/xarray-sql-era5-demo to claude/xarray-sql-autograd-73ovqq June 30, 2026 13:31
@alxmrs alxmrs force-pushed the claude/xarray-sql-autograd-73ovqq branch from a4fc101 to 7b1e530 Compare June 30, 2026 13:34
claude added 2 commits June 30, 2026 16:40
Stacked demo branch (on the autograd feature) holding the runnable benchmark
scripts, kept out of the core branch so it stays reviewable.

* grad_era5.py: symbolic grad over real ARCO-ERA5 data (wind-speed sensitivity
  checked exactly; saturation vapour pressure checked against the closed-form
  Clausius-Clapeyron slope). The queries ORDER BY latitude DESC, longitude to
  match ERA5's native order, so results line up with the xarray reference with
  no sorting on either side (single partition, so the order survives to_dataset).
* grad_descent.py: gradient descent as ONE declarative recursive-CTE query.
  differentiate_sql compiles the per-row update rule to SQL once; a recursive
  CTE then iterates it. No Python loop. Fit matches numpy least-squares.
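
A rough illustration of the recursive-CTE idea, under stated assumptions: it uses sqlite3 instead of DataFusion, a hand-derived gradient instead of differentiate_sql, and a made-up one-parameter model y ≈ w·x with a table named data. The loss is the mean squared error, whose gradient in w is 2·(w·AVG(x²) − AVG(x·y)).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (x REAL, y REAL)")
con.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)],  # y = 3x exactly
)

(w_fit,) = con.execute("""
    WITH RECURSIVE
    -- Sufficient statistics of the data, computed once.
    stats(sxx, sxy) AS (
        SELECT AVG(x * x), AVG(x * y) FROM data
    ),
    -- Each recursive row is one gradient-descent step on w.
    gd(step, w) AS (
        SELECT 0, 0.0
        UNION ALL
        SELECT step + 1, w - 0.05 * 2.0 * (w * sxx - sxy)
        FROM gd, stats
        WHERE step < 50
    )
    SELECT w FROM gd ORDER BY step DESC LIMIT 1
""").fetchone()

print(w_fit)
```

The recursive relation gd appears exactly once in the recursive term, which is the constraint that makes a single-parameter fit easy to express this way.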

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017mDoFJgsm9kS7SicGoCVF6

A one-hidden-layer MLP (196->32 tanh->10 softmax, on 2x2-pooled 14x14 MNIST)
trained by gradient descent with every gradient computed in SQL. The images are
registered as xarray (the library's core); the model weights and per-step
intermediates are DataFusion in-memory tables (register_record_batches), so a
matmul is a join over them and there's no xarray pivot per step.

Reverse-mode autodiff as relational algebra: matmul = join + GROUP BY SUM; the
hidden activation's local Jacobian = grad(tanh(z), z); cotangent propagation =
join; parameter gradients = join + GROUP BY AVG. The only hand-written gradient
is softmax + cross-entropy's delta = softmax - onehot. ~83% test accuracy in
~20s. Adds a benchmarks README entry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017mDoFJgsm9kS7SicGoCVF6
@alxmrs alxmrs force-pushed the claude/xarray-sql-mnist-demo branch from d9728c3 to b8d3e83 Compare June 30, 2026 13:40
alxmrs and others added 3 commits June 30, 2026 17:09
Rewrite mnist_mlp.py so the whole model and its entire training history live in a
single append-only table model(step, layer, i, j, val): every parameter is a row
tagged by generation, and a training step appends the next generation's rows
rather than mutating anything. Each step is a single SQL statement (forward,
grad(tanh(z),z) backprop, parameter update); evaluation is SQL too (a forward
pass with ROW_NUMBER() for the argmax). Python no longer holds the weights or
computes any gradients — it only sequences the steps.
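
The ROW_NUMBER() argmax can be sketched as follows. This is an illustration on sqlite3 (window functions need SQLite 3.25 or newer), with a hypothetical logits(sample, cls, val) table standing in for the output of the forward pass; it is not the query from mnist_mlp.py.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE logits (sample INTEGER, cls INTEGER, val REAL)")
con.executemany(
    "INSERT INTO logits VALUES (?, ?, ?)",
    [
        (0, 0, 0.1), (0, 1, 2.3), (0, 2, -0.4),
        (1, 0, 1.7), (1, 1, 0.2), (1, 2, 0.9),
    ],
)

# Rank the classes within each sample by score; rank 1 is the argmax.
preds = con.execute("""
    SELECT sample, cls
    FROM (
        SELECT sample, cls,
               ROW_NUMBER() OVER (PARTITION BY sample ORDER BY val DESC) AS rn
        FROM logits
    )
    WHERE rn = 1
    ORDER BY sample
""").fetchall()

print(preds)
```

Joining preds against a label table and averaging the matches would then give accuracy, still without leaving SQL.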

A 2-layer net can't be one recursive CTE: the recursive relation may be
referenced only once, but W1/W2 are each used several times per step. Unrolling
the steps as non-recursive CTEs blows up exponentially, because DataFusion
inlines CTEs and offers no MATERIALIZED hint. Materialising between steps is
therefore host-driven, which is all the thin Python loop does. Reaches ~83% test
accuracy over 60 steps.
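
A toy version of the append-only pattern with a host-driven loop, assuming sqlite3 in place of DataFusion. The model(step, layer, i, j, val) schema is taken from the message above; the target table and the quadratic loss (val - target)² are invented here so that the update rule fits in one line.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE model  (step INTEGER, layer INTEGER, i INTEGER, j INTEGER, val REAL);
    CREATE TABLE target (layer INTEGER, i INTEGER, j INTEGER, val REAL);
    INSERT INTO model  VALUES (0, 1, 0, 0, 0.0), (0, 1, 1, 0, 0.0);  -- generation 0
    INSERT INTO target VALUES (1, 0, 0, 1.0), (1, 1, 0, -2.0);
""")

STEPS, LR = 20, 0.25
for step in range(STEPS):
    # One SQL statement per step: read generation `step`, append generation
    # `step + 1`. Nothing already in the table is updated or deleted.
    con.execute("""
        INSERT INTO model
        SELECT m.step + 1, m.layer, m.i, m.j,
               m.val - ? * 2.0 * (m.val - t.val)
        FROM model AS m
        JOIN target AS t
          ON m.layer = t.layer AND m.i = t.i AND m.j = t.j
        WHERE m.step = ?
    """, (LR, step))

history_rows = con.execute("SELECT COUNT(*) FROM model").fetchone()[0]
final = con.execute(
    "SELECT val FROM model WHERE step = ? ORDER BY i", (STEPS,)
).fetchall()
print(history_rows, final)
```

Python only supplies the step counter; each INSERT materialises one generation, which is the role the loop plays between steps.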

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Make the architecture itself data. The whole model is one xr.Dataset: each
layer's weight is a data_var w{L} over its boundary dims (u{L}, u{L+1}), sharing
the dims that connect adjacent layers (the join keys). The dim sizes are the
layer widths and the number of weights is the depth, so differing neuron counts
are just differing dim sizes — no padding, because the relational long form is
naturally ragged. from_dataset splits the one Dataset into a table per weight;
changing WIDTHS trains a different network with the same code.

One generic contract()-based loop trains a net of any depth: forward contracts
each layer, backward is the same contraction transposed (VJP of a contraction is
a contraction) with grad(tanh(z), z) for the local derivative. Validated exact
against numpy at depth 3.
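
For reference, the "VJP of a contraction is a contraction" remark can be written out for one tanh layer. The symbols are chosen here for illustration: a is the layer input, W the weight, z the pre-activation, h the activation, bars denote cotangents, and N is the number of samples, assuming the loss is a mean of per-sample losses (which is where GROUP BY AVG comes from).

```latex
\begin{aligned}
z_{s,j} &= \sum_i a_{s,i}\, W_{i,j}, \qquad h_{s,j} = \tanh(z_{s,j}) \\
\bar z_{s,j} &= \bar h_{s,j}\,\bigl(1 - \tanh^2(z_{s,j})\bigr) \\
\bar a_{s,i} &= \sum_j \bar z_{s,j}\, W_{i,j} \\
\bar W_{i,j} &= \frac{1}{N} \sum_s a_{s,i}\, \bar z_{s,j}
\end{aligned}
```

The forward pass sums over i, the input cotangent sums over j, and the weight gradient averages over s: each is the same join with a different index grouped away, and the factor 1 - tanh² is what grad(tanh(z), z) evaluates to.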

Training metrics are a relation too: each logged step appends a
(step, loss, train_acc, test_acc) row to a metrics table rather than a Python
list. The trained model, predictions, and metrics all come back out as xarray
via to_dataset. ~83% test accuracy in ~13s.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Two simplifications collapse the model to a single relation:

- Bias folded into the weights (an nn.Linear): each layer's bias is the weight of
  a constant-1 input, kept as the row inp=width of the same weight array, so a
  layer is one matrix.
- A layer dimension: every layer's weight lives in one weight(layer, inp, out)
  array, so forward/backward filter on the layer COLUMN instead of referencing a
  table per layer. The model is one xr.Dataset with a layer dim (NaN-padded for
  the ragged pyramid, dropped on seed); from_dataset registers it; the update is
  one query over the whole weight relation.
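
The bias fold in the first bullet can be checked in a few lines of plain Python. This is a sketch with made-up numbers; linear and linear_folded are hypothetical helpers, not functions from the benchmark.

```python
def linear(x, weight, bias):
    # Conventional layer: x @ weight + bias.
    return [
        sum(xi * weight[i][j] for i, xi in enumerate(x)) + bias[j]
        for j in range(len(bias))
    ]


def linear_folded(x, folded):
    # Same layer as a single matrix: append a constant-1 input whose
    # weights are the bias row.
    x1 = x + [1.0]
    return [
        sum(xi * folded[i][j] for i, xi in enumerate(x1))
        for j in range(len(folded[0]))
    ]


weight = [[0.5, -1.0], [0.25, 2.0]]  # 2 inputs -> 2 outputs
bias = [0.125, -0.5]
folded = weight + [bias]             # bias stored as row inp = width (index 2)

x = [1.0, 2.0]
separate = linear(x, weight, bias)
merged = linear_folded(x, folded)
print(separate, merged)
```

In the long form, the extra row is just more (inp, out, val) tuples in the same weight relation, so the forward join needs no special case for the bias.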

A single contract() and a generic loop train a net of any depth (validated exact
against numpy at depth 3). Tensors.put now unifies batch nullability so UNION
results register cleanly. Faster too (~6s vs ~13s) at the same ~83% test
accuracy; model and metrics still round-trip to xarray.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>