⚠️ Scope amendment (2026-07-02): stage 8 is emission + CPG only — slicing & taint move to the SDK
Family-wide boundary now standard (cldk-forge PR #7, dataflow-graphs.md § provider/client boundary and dataflow-construction.md § Stage 8; reference codellm-devkit/codeanalyzer-java#171): the analyzer emits the graph substrate; client analyses are frontend SDK queries.
Superseded here and filed as codellm-devkit/python-sdk#229 (the shared, language-agnostic client-analysis engine; the Rust model pack lands with the consuming SDK): the Slicing task, the Taint task, the taint_flows output, and the client gates (interprocedural slice set / source→sink + sanitizer / witness paths). Those are now SDK-side gates.
Stays in this task: the Emission bullet — -a 3 + --graphs cfg,dfg,pdg,sdg (strict validation) + --graph-field-depth, the program_graphs section, and recording the shared vocabulary in SCHEMA_DECISIONS.md — plus the CPG projection if in scope. The SUMMARY edges those queries rely on are built in the stages 5–7 rung (#24) and remain analyzer-side. Keep a source→sink fixture so the SDK can assert taint over it. Retitled: stage 8 is now emission/CPG, not slicing/taint.
Part of #9. Phase 3 — native dataflow.
Learning goals
API design for analyses-as-queries; models-as-data (serde again, now with JSON-Schema validation
of user input); keeping a public contract stable while internals churn.
Task
- Emission: -a 3 (implies -a 2) + --graphs cfg,dfg,pdg,sdg (strict validation: an
unknown graph name exits non-zero) + --graph-field-depth (default 3). A program_graphs
section in analysis.json, schema-versioned, keyed by canonical (signature, node_id) — same
signatures as the symbol table. Record the shared node/edge vocabulary in SCHEMA_DECISIONS.md;
the cross-language parity clause binds: Rust-specific additions are additive.
- Slicing: context-sensitive backward slice via the two-phase HRB traversal over the SDG
(phase 1 ascends PARAM_IN/CALL, phase 2 descends PARAM_OUT — SUMMARY edges carry you across
calls without re-descending).
- Taint: labeled reachability over the SDG; sanitizers block propagation; sources/sinks/
sanitizers/library models supplied AS DATA (built-in pack < config file < flags) with JSON
Schema validation; taint_flows output section with lazily-reconstructed witness paths and
the model id that justified each hop (explainability).
Wall-clock time for -a 1 / -a 2 must stay unaffected (CI-check a timing budget on the fixture).
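The two-phase traversal named in the Slicing bullet can be sketched as below. This is a minimal illustration, not the analyzer's or the SDK's API: the `Sdg` type, the `EdgeKind` enum, the `reach_back` helper, and the integer node ids are all hypothetical, and the toy graph (one callee invoked from two call sites) is invented for the example. Phase 1 walks backward while skipping PARAM_OUT edges, so it crosses calls only through SUMMARY edges; phase 2 restarts from everything phase 1 reached and skips PARAM_IN and CALL edges, so it descends into callees but never climbs back out into an unrelated caller.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

// Hypothetical edge vocabulary for the sketch; the real one lives in SCHEMA_DECISIONS.md.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum EdgeKind { Control, Data, Call, ParamIn, ParamOut, Summary }

struct Sdg {
    // incoming[n] lists (predecessor, kind) for every edge predecessor -> n.
    incoming: HashMap<u32, Vec<(u32, EdgeKind)>>,
}

impl Sdg {
    fn new() -> Self { Sdg { incoming: HashMap::new() } }

    fn add_edge(&mut self, from: u32, to: u32, kind: EdgeKind) {
        self.incoming.entry(to).or_default().push((from, kind));
    }

    // Backward reachability from `seeds`, ignoring edges whose kind is in `skip`.
    fn reach_back(&self, seeds: &BTreeSet<u32>, skip: &[EdgeKind]) -> BTreeSet<u32> {
        let mut seen = seeds.clone();
        let mut work: VecDeque<u32> = seeds.iter().copied().collect();
        while let Some(n) = work.pop_front() {
            if let Some(preds) = self.incoming.get(&n) {
                for &(pred, kind) in preds {
                    if !skip.contains(&kind) && seen.insert(pred) {
                        work.push_back(pred);
                    }
                }
            }
        }
        seen
    }

    // Two-phase backward slice: phase 1 never descends, phase 2 never ascends.
    fn backward_slice(&self, criterion: u32) -> BTreeSet<u32> {
        let seeds = BTreeSet::from([criterion]);
        let phase1 = self.reach_back(&seeds, &[EdgeKind::ParamOut]);
        self.reach_back(&phase1, &[EdgeKind::ParamIn, EdgeKind::Call])
    }
}

// Toy SDG: f(x) { return x } called as r1 = f(a) and r2 = f(b); criterion is "use r1".
// Nodes: 1 a=.., 2 b=.., 5/8 call sites, 3/6 actual-in, 4/7 actual-out,
//        12 f entry, 10 formal-in, 11 formal-out, 9 use of r1.
fn demo_sdg() -> Sdg {
    use EdgeKind::*;
    let mut g = Sdg::new();
    for (from, to, kind) in [
        (1, 3, Data), (2, 6, Data),
        (5, 3, Control), (5, 4, Control), (8, 6, Control), (8, 7, Control),
        (5, 12, Call), (8, 12, Call),
        (3, 10, ParamIn), (6, 10, ParamIn),
        (12, 10, Control), (12, 11, Control), (10, 11, Data),
        (11, 4, ParamOut), (11, 7, ParamOut),
        (3, 4, Summary), (6, 7, Summary),
        (4, 9, Data),
    ] {
        g.add_edge(from, to, kind);
    }
    g
}

fn main() {
    let g = demo_sdg();
    // The slice on node 9 keeps the first call site and the callee body,
    // but not node 2 / 6 / 8 from the second, unrelated call site.
    println!("{:?}", g.backward_slice(9));
}
```

Running plain backward reachability over the same graph (no edge kinds skipped) would also pull in nodes 2, 6, and 8, which is the kind of context leak the SUMMARY edges and the two-phase split are meant to rule out.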
Gate (client gates)
- Interprocedural slice through the a → b → c chain matches the hand-computed set (SUMMARY edges
proven used: the slice must NOT contain callee-internal nodes phase 1 would leak).
- The fixture's source→sink taint flow is found; the SAME flow with a sanitizer interposed is
NOT. Witness path names every hop.
- The -a 3 analysis.json validates; --graphs bogus exits non-zero.
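The taint gate above (flow found, then not found once a sanitizer is interposed, with a witness path naming every hop) can be illustrated with a small sketch. Everything here is an assumption made for the example: the `TaintGraph` and `Hop` types, the node names, and the model ids such as `builtin:concat` are hypothetical and do not come from any real model pack. The sketch treats taint as forward reachability that stops at sanitizer nodes, and rebuilds the witness path from parent pointers only when the sink is actually reached.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// One hop of a witness path, with the (hypothetical) model id that justified it.
#[derive(Debug, Clone, PartialEq)]
struct Hop { from: &'static str, to: &'static str, model: &'static str }

struct TaintGraph {
    // succ[n] lists (successor, model id) for every propagation edge n -> successor.
    succ: HashMap<&'static str, Vec<(&'static str, &'static str)>>,
    sanitizers: HashSet<&'static str>,
}

impl TaintGraph {
    fn add_edge(&mut self, from: &'static str, to: &'static str, model: &'static str) {
        self.succ.entry(from).or_default().push((to, model));
    }

    // Breadth-first reachability from `source`; taint does not leave a sanitizer node.
    fn find_flow(&self, source: &'static str, sink: &'static str) -> Option<Vec<Hop>> {
        let mut parent: HashMap<&'static str, (&'static str, &'static str)> = HashMap::new();
        let mut seen = HashSet::from([source]);
        let mut work = VecDeque::from([source]);
        while let Some(n) = work.pop_front() {
            if n == sink {
                // Lazy witness reconstruction: walk parent pointers back to the source.
                let mut path = Vec::new();
                let mut cur = sink;
                while let Some(&(prev, model)) = parent.get(cur) {
                    path.push(Hop { from: prev, to: cur, model });
                    cur = prev;
                }
                path.reverse();
                return Some(path);
            }
            if self.sanitizers.contains(n) {
                continue; // sanitizer blocks propagation
            }
            if let Some(nexts) = self.succ.get(n) {
                for &(next, model) in nexts {
                    if seen.insert(next) {
                        parent.insert(next, (n, model));
                        work.push_back(next);
                    }
                }
            }
        }
        None
    }
}

// Fixture-like graph: read_input -> concat -> exec_sql, optionally routed through `escape`.
fn demo_graph(with_sanitizer: bool) -> TaintGraph {
    let mut g = TaintGraph { succ: HashMap::new(), sanitizers: HashSet::new() };
    if with_sanitizer {
        g.add_edge("read_input", "escape", "builtin:escape");
        g.add_edge("escape", "concat", "builtin:concat");
        g.sanitizers.insert("escape");
    } else {
        g.add_edge("read_input", "concat", "builtin:concat");
    }
    g.add_edge("concat", "exec_sql", "builtin:sql-sink");
    g
}

fn main() {
    println!("{:?}", demo_graph(false).find_flow("read_input", "exec_sql"));
    println!("{:?}", demo_graph(true).find_flow("read_input", "exec_sql"));
}
```

Storing only parent pointers during the search and materialising hops on a hit keeps the cost of explainability off the common path where no flow exists.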