Batch lazy IRI resolution (prefetch / identity map / DataLoader) to avoid N+1 resolves #92

## Summary

Lazy IRI resolution currently happens synchronously, per field, per object. Touching a relation
(`entity.some_relation`) triggers its own `_resolve(...)` call. When code walks many relations across many
entities, this becomes the classic N+1 problem: O(nodes) backend round trips, executed sequentially.

This issue proposes a feature to automatically bundle lazy resolutions into a small number of batched
backend calls.

## Background

In `oold/model/v1/__init__.py`, `__getattribute__` resolves a relation field on access when the stored
value is still unresolved:

```python
if name in self.__iris__ and len(self.__iris__[name]) > 0:
    if self.__dict__[name] is None or (isinstance(..., list) and len(...) == 0):
        node_dict = self._resolve(iris)   # one call, for one field, of one object
        ...
```

The primitives needed for batching already exist:

- `_resolve(iris: list)` is already batch capable (takes a list, returns `{iri: node}`).
- `get_iri_ref(field)` / `get_raw(field)` read links without resolving.

So the gap is purely about driving `_resolve` with all pending IRIs at once instead of one field at a time.
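
To make the gap concrete, here is a minimal, self-contained sketch that contrasts per field resolution with one batched call. `CountingBackend` and the `links` dictionaries are hypothetical stand-ins, not oold APIs; only the `{iri: node}` return shape mirrors `_resolve`.

```python
class CountingBackend:
    """Hypothetical stand-in for a resolver backend; counts round trips."""

    def __init__(self, store):
        self.store = store
        self.calls = 0

    def resolve(self, iris):
        # Same shape as `_resolve`: takes a list of IRIs, returns {iri: node}.
        self.calls += 1
        return {iri: self.store[iri] for iri in iris if iri in self.store}


store = {f"Item:{i}": {"name": f"node {i}"} for i in range(6)}

# Three entities, each holding unresolved links for two relation fields.
links = [
    {"input": ["Item:0"], "output": ["Item:1"]},
    {"input": ["Item:2"], "output": ["Item:3"]},
    {"input": ["Item:4"], "output": ["Item:5"]},
]

# Per field resolution: one backend call per (entity, field).
per_field = CountingBackend(store)
for entity_links in links:
    for iris in entity_links.values():
        per_field.resolve(iris)

# Batched resolution: collect every pending IRI first, then resolve once.
batched = CountingBackend(store)
pending = sorted({iri for entity_links in links
                  for iris in entity_links.values()
                  for iri in iris})
nodes = batched.resolve(pending)

print(per_field.calls, batched.calls)  # 6 1
```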

## The core constraint

`__getattribute__` is synchronous: when you touch `entity.relation`, the value must exist before the
expression returns. You cannot transparently defer a sync attribute access without a proxy object. So the
design is about where to place the batch boundary. Three viable strategies follow.

## Option A: explicit batched prefetch (recommended primary)

Equivalent to ORM `selectinload` / Django `prefetch_related` / GraphQL look ahead. The caller declares
relation paths; oold does a level order (BFS) traversal and batches one `_resolve` per depth level, so
backend calls become O(depth) instead of O(nodes).

```python
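def prefetch(roots, paths):
    # paths: ["input", "output", "tool", "output.sample"]
    # _paths_to_tree, _iri_refs and backend are placeholders in this sketch.
    tree = _paths_to_tree(paths)   # nested dict, e.g. {"output": {"sample": {}}, "tool": {}}
    frontier = [(e, tree) for e in roots]
    while frontier:
        # Carry the subtree along: `sub` is not visible outside this comprehension.
        pending = [(e, f, iri, sub[f]) for e, sub in frontier
                                       for f in sub
                                       for iri in _iri_refs(e, f)]
        if not pending:
            break   # nothing left to resolve, avoid an empty backend call
        nodes = backend.resolve(sorted({iri for _, _, iri, _ in pending}))   # one call per level
        # (writing the resolved nodes back onto e.<f> is omitted here)
        frontier = [(nodes[iri], subtree) for (e, f, iri, subtree) in pending
                    if nodes.get(iri) and subtree]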
```

Suggested public API:

```python
OSW.load_entity(LoadEntityParam(titles=..., prefetch=["output", "tool", "input"]))
entity.resolve(["output.sample"])
resolve_all(list_of_entities, ["output", "tool", "input"])
```

This approach is deterministic, needs no proxy magic, and fits synchronous pydantic models. It is also the
natural place to make resolution tolerant of partial failures.
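
A runnable version of the level order idea, as a sketch under stated assumptions: `Node`, `Backend` and `paths_to_tree` are hypothetical stand-ins (a real implementation would read links via `get_iri_ref` and write into the pydantic fields), and missing links are simply skipped. Frontier entries are deduplicated so a node shared by several parents is expanded once.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical entity: raw IRI links plus resolved relations."""
    iri: str
    links: dict = field(default_factory=dict)      # field name -> [iri, ...]
    resolved: dict = field(default_factory=dict)   # field name -> [Node, ...]


class Backend:
    """Hypothetical backend with a batch capable resolve()."""

    def __init__(self, nodes):
        self.store = {n.iri: n for n in nodes}
        self.calls = 0

    def resolve(self, iris):
        self.calls += 1
        return {i: self.store[i] for i in iris if i in self.store}


def paths_to_tree(paths):
    # ["output", "output.sample"] -> {"output": {"sample": {}}}
    tree = {}
    for path in paths:
        level = tree
        for part in path.split("."):
            level = level.setdefault(part, {})
    return tree


def prefetch(roots, paths, backend):
    tree = paths_to_tree(paths)
    frontier = [(e, tree) for e in roots]
    while frontier:
        pending = [(e, f, iri, sub[f]) for e, sub in frontier
                   for f in sub for iri in e.links.get(f, [])]
        if not pending:
            break
        nodes = backend.resolve(sorted({iri for _, _, iri, _ in pending}))
        next_frontier = {}
        for e, f, iri, subtree in pending:
            node = nodes.get(iri)
            if node is None:
                continue                      # missing link: skipped in this sketch
            e.resolved.setdefault(f, []).append(node)
            if subtree:
                # Key by (iri, subtree) so a shared node is expanded only once.
                next_frontier[(iri, id(subtree))] = (node, subtree)
        frontier = list(next_frontier.values())


sample = Node("Item:sample")
dataset = Node("Item:dataset", links={"sample": ["Item:sample"]})
tool = Node("Item:tool")
procs = [Node(f"Item:proc{i}",
              links={"output": ["Item:dataset"], "tool": ["Item:tool"]})
         for i in range(3)]
backend = Backend([sample, dataset, tool])

prefetch(procs, ["output", "tool", "output.sample"], backend)
print(backend.calls)  # 2
```

Three processes with two relations each, plus one nested relation, cost two backend calls (one per depth level) instead of seven.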

## Option B: DataLoader plus identity map session (auto batching)

The DataLoader pattern: a per session loader coalesces `load(iri)` calls within a window, dispatches one
batched fetch, and caches by IRI (unit of work / identity map, so the same IRI is fetched once and resolves
to one object).

```python
class IriLoader:
    def __init__(self, backend):
        self._backend, self._cache, self._queue = backend, {}, []
    def load(self, iri): ...      # returns a Future
    def flush(self):
        todo = [i for i in self._queue if i not in self._cache]
        self._cache.update(self._backend.resolve(list(dict.fromkeys(todo))))   # one call, deduped in order
        # resolve queued futures from cache (None or Error allowed per key)
```
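
To fill in the outline above, a minimal runnable sketch of a sync loader with an explicit `flush()`. The `load` body and `DictBackend` are assumptions for illustration, not oold's API. It uses `concurrent.futures.Future` purely as a result holder, and a missing key resolves to `None`.

```python
from concurrent.futures import Future


class IriLoader:
    """Sketch of a sync DataLoader with an identity map."""

    def __init__(self, backend):
        self._backend = backend
        self._cache = {}   # iri -> node (identity map)
        self._queue = {}   # iri -> Future; dict keeps order and dedupes

    def load(self, iri):
        if iri in self._queue:
            return self._queue[iri]            # already queued in this window
        fut = Future()
        if iri in self._cache:
            fut.set_result(self._cache[iri])   # cache hit: no backend call
        else:
            self._queue[iri] = fut
        return fut

    def flush(self):
        queue, self._queue = self._queue, {}
        if not queue:
            return
        self._cache.update(self._backend.resolve(list(queue)))   # one call
        for iri, fut in queue.items():
            fut.set_result(self._cache.get(iri))   # None if the IRI was not found


class DictBackend:
    """Hypothetical backend; counts round trips."""

    def __init__(self, store):
        self.store, self.calls = store, 0

    def resolve(self, iris):
        self.calls += 1
        return {i: self.store[i] for i in iris if i in self.store}


backend = DictBackend({"Item:a": {"name": "a"}, "Item:b": {"name": "b"}})
loader = IriLoader(backend)
futures = [loader.load(i) for i in ["Item:a", "Item:b", "Item:a", "Item:missing"]]
loader.flush()
results = [f.result() for f in futures]

print(backend.calls)             # 1
print(results[0] is results[2])  # True: same IRI, same object
```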

Transparent batching needs a deferral point:

- Async resolvers (`await entity.relation`): the event loop tick is the batch window. This is where
  DataLoader shines, and is probably the cleanest long term direction if oold goes async.
- A sync collect context:

```python
with oold.batch_resolution():
    for proc in procs:
        proc.input
        proc.output
# on exit: one batched resolve, then values are populated
```

The identity map is valuable on its own (dedupe plus stable object identity), independent of batching.
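
For the async variant, a sketch of how the event loop tick can act as the batch window. Everything here is hypothetical (`AsyncIriLoader`, `fake_fetch`, `relation_iris`); it only assumes an async batch function that maps a list of IRIs to `{iri: node}`. The first cache miss schedules one dispatch task, and every `load()` issued before that task runs lands in the same batch.

```python
import asyncio


class AsyncIriLoader:
    def __init__(self, batch_fetch):
        self._batch_fetch = batch_fetch   # async callable: [iri, ...] -> {iri: node}
        self._cache = {}                  # iri -> asyncio.Future (identity map)
        self._queue = []
        self._task = None                 # keep a reference to the dispatch task

    def load(self, iri):
        if iri not in self._cache:
            loop = asyncio.get_running_loop()
            self._cache[iri] = loop.create_future()
            if not self._queue:
                # First miss in this window: schedule exactly one dispatch.
                self._task = loop.create_task(self._dispatch())
            self._queue.append(iri)
        return self._cache[iri]

    async def _dispatch(self):
        iris, self._queue = self._queue, []
        try:
            nodes = await self._batch_fetch(iris)
        except Exception as exc:
            for iri in iris:
                # Drop failed keys from the cache so they can be retried.
                self._cache.pop(iri).set_exception(exc)
            return
        for iri in iris:
            self._cache[iri].set_result(nodes.get(iri))


calls = []


async def fake_fetch(iris):
    calls.append(list(iris))
    return {iri: {"iri": iri} for iri in iris}


async def relation_iris(loader, iris):
    nodes = await asyncio.gather(*(loader.load(i) for i in iris))
    return [n["iri"] for n in nodes]


async def main():
    loader = AsyncIriLoader(fake_fetch)
    return await asyncio.gather(
        relation_iris(loader, ["Item:a", "Item:b"]),
        relation_iris(loader, ["Item:b", "Item:c"]),
    )


results = asyncio.run(main())
print(calls)  # [['Item:a', 'Item:b', 'Item:c']]
```

Two independent coroutines asking for overlapping IRIs produce a single deduplicated backend call, without either of them knowing about the other.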

## Option C: lazy reference proxies (transparent, sharp edges)

`__getattribute__` returns a `LazyRef(iri, batch)` instead of resolving. The first real use of any proxy
flushes the shared batch, then forwards. This makes plain loops auto batch, but proxies leak into
`isinstance`, equality, pydantic validation and serialization. Probably not worth the footguns unless full
transparency is a hard requirement.
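
A small sketch of both the appeal and the leak. `Batch`, `LazyRef` and `Sample` are hypothetical names used only for illustration; attribute forwarding is done with `__getattr__`, which is one possible way to build such a proxy.

```python
class Batch:
    """Shared batch: collects IRIs and resolves them all on first real use."""

    def __init__(self, resolve):
        self._resolve = resolve
        self.pending = set()
        self.nodes = {}
        self.calls = 0

    def flush(self):
        if self.pending:
            self.calls += 1
            self.nodes.update(self._resolve(sorted(self.pending)))
            self.pending.clear()


class LazyRef:
    def __init__(self, iri, batch):
        self._iri, self._batch = iri, batch
        batch.pending.add(iri)

    def __getattr__(self, name):
        # Only called for attributes that are not found on the proxy itself.
        self._batch.flush()
        return getattr(self._batch.nodes[self._iri], name)


class Sample:
    def __init__(self, name):
        self.name = name


store = {"Item:s1": Sample("s1"), "Item:s2": Sample("s2")}
batch = Batch(lambda iris: {i: store[i] for i in iris})
refs = [LazyRef("Item:s1", batch), LazyRef("Item:s2", batch)]

names = [r.name for r in refs]        # first access flushes both IRIs at once
print(names, batch.calls)             # ['s1', 's2'] 1
print(isinstance(refs[0], Sample))    # False: the proxy is not a Sample
print(refs[0] == store["Item:s1"])    # False: equality is not forwarded
```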

## Recommendation

1. Identity map resolution session (`with oold.session():`): caches `iri -> node`, dedupes, stabilizes
   identity. Foundation for the rest.
2. `prefetch` / `selectinload` style API (Option A) as the ergonomic front door. Covers most cases,
   deterministic, sync friendly.
3. Partial failure semantics in the batch resolver: per key `node | Error` with a policy flag
   (`skip` / `raise` / `collect_errors`). On real wikis full of half valid datasets this turns "one bad
   linked entity kills the traversal" into "you get the good ones plus a list of errors". This is the part
   that matters most in practice.
4. DataLoader (Option B) later, naturally, if and when resolvers become async.
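
The partial failure semantics from point 3 could look roughly like this. The function name, its parameters and the `(nodes, failed)` return shape are assumptions for illustration; only the three policy names come from this proposal. The sketch fetches raw payloads in one call and validates per key, so one invalid entity does not take the others down with it.

```python
def resolve_batch(iris, fetch_raw, validate, errors="raise"):
    """Resolve many IRIs in one call; `errors` is 'raise', 'skip' or 'collect_errors'."""
    if errors not in ("raise", "skip", "collect_errors"):
        raise ValueError(f"unknown errors policy: {errors!r}")
    raw = fetch_raw(list(dict.fromkeys(iris)))   # one backend call, deduped
    nodes, failed = {}, {}
    for iri, payload in raw.items():
        try:
            nodes[iri] = validate(payload)
        except Exception as exc:
            if errors == "raise":
                raise                            # first bad entity aborts the batch
            if errors == "collect_errors":
                failed[iri] = exc                # keep the error next to its key
            # errors == "skip": drop the entity silently
    return nodes, failed


ALLOWED_UNITS = {"m", "kg", "s"}


def validate_dataset(payload):
    # Stand-in for model validation, e.g. an enum constraint on the unit.
    if payload["unit"] not in ALLOWED_UNITS:
        raise ValueError(f"unit {payload['unit']!r} is not in the enum")
    return payload


raw_store = {
    "Item:d1": {"unit": "m"},
    "Item:d2": {"unit": "furlong"},   # invalid on purpose
    "Item:d3": {"unit": "kg"},
}


def fetch(iris):
    return {i: raw_store[i] for i in iris}


nodes, failed = resolve_batch(list(raw_store), fetch, validate_dataset,
                              errors="collect_errors")
print(sorted(nodes), sorted(failed))  # ['Item:d1', 'Item:d3'] ['Item:d2']
```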

## Design notes

- Keep the dict shaped resolver (`{iri: node}`): it is order independent and dedupes. Let the prefetch
  layer own re-association back to `(entity, field, position)` rather than returning aligned lists that
  break on dedupe.
- A good litmus test: a hand rolled `get_iri_ref` plus batched load workaround should collapse into
  `resolve_all(entities, ["output", "tool", "input"], errors="skip")`.
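
As an illustration of the first note, a sketch of re-association from a deduplicated `{iri: node}` result back to `(entity, field, position)` slots. The `links` data and the stand-in `nodes` dict are hypothetical; in practice `nodes` would come from the batched resolver.

```python
links = {
    "proc1": {"input": ["Item:s1", "Item:s2"], "tool": ["Item:t1"]},
    "proc2": {"input": ["Item:s2"], "tool": ["Item:t1"]},
}

# Flatten to slots; positions preserve the order within each field.
slots = [(entity, field, pos, iri)
         for entity, fields in links.items()
         for field, iris in fields.items()
         for pos, iri in enumerate(iris)]

unique_iris = sorted({iri for *_, iri in slots})      # 3 IRIs for 5 slots
nodes = {iri: {"iri": iri} for iri in unique_iris}    # stand-in for the resolver result

# Re-associate: every slot looks up its node by IRI, so dedupe cannot misalign it.
resolved = {}
for entity, field, pos, iri in slots:
    values = resolved.setdefault(entity, {}).setdefault(
        field, [None] * len(links[entity][field]))
    values[pos] = nodes.get(iri)                      # None marks a missing node

print(len(slots), len(unique_iris))  # 5 3
print(resolved["proc1"]["tool"][0] is resolved["proc2"]["tool"][0])  # True
```

An aligned list response would need five entries in slot order here; the dict needs three, and shared links end up pointing at the same object.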

## Motivation / real world case

Building a dashboard over measurement data, we walk `ProcessDocumentation -> input (Sample)`,
`-> output (Dataset)`, `-> tool (MeasurementUnit)` across many process docs. Lazy per field resolution
caused:

1. Sequential round trips (one linked page at a time), very slow even for a handful of processes.
2. A single invalid linked dataset (for example a `Dataset` with an out-of-enum unit) raising
   mid-traversal and aborting the whole walk.

The workaround was to bypass resolution entirely via `get_iri_ref` and batch load the linked entities in one
parallel call, tolerating per entity validation failures. A first class prefetch plus identity map plus
partial failure feature would make that the default rather than a manual pattern.

