
16 — Level 3, stage 8: emission (-a 3) & CPG; slicing/taint move to the SDK #25

Description

@rahlk

⚠️ Scope amendment (2026-07-02): stage 8 is emission + CPG only — slicing & taint move to the SDK

The family-wide boundary is now standard (cldk-forge PR #7; dataflow-graphs.md § provider/client boundary and dataflow-construction.md § Stage 8; reference: codellm-devkit/codeanalyzer-java#171). The analyzer emits the graph substrate, and client analyses are frontend SDK queries over it.
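
Here is a minimal sketch of that boundary, assuming a Rust analyzer. The analyzer's side is plain data: a schema-versioned program_graphs substrate keyed by canonical (signature, node_id). A client query is then just a read-only function over that data. The type and function names (NodeKey, EdgeKind, ProgramGraphs, preds_of_kind) are hypothetical, and so is the edge-kind list; neither is the vocabulary recorded in SCHEMA_DECISIONS.md. Serde derives are left out to keep the sketch std-only.

```rust
use std::collections::BTreeMap;

/// Canonical node key: a symbol-table signature plus a node id within it.
/// Deriving Ord makes BTreeMap iteration deterministic (signature, then node_id).
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct NodeKey {
    pub signature: String,
    pub node_id: u32,
}

/// Hypothetical subset of a shared edge vocabulary (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdgeKind {
    ControlFlow,
    DataDep,
    ControlDep,
    Call,
    ParamIn,
    ParamOut,
    Summary,
}

#[derive(Debug, Clone)]
pub struct Edge {
    pub from: NodeKey,
    pub to: NodeKey,
    pub kind: EdgeKind,
}

/// Analyzer side: the emitted substrate is data only; no client analysis lives here.
#[derive(Debug)]
pub struct ProgramGraphs {
    pub schema_version: String,
    pub nodes: BTreeMap<NodeKey, String>, // canonical key -> node label
    pub edges: Vec<Edge>,
}

/// Client (SDK) side: a query is a read-only function over the substrate.
/// Returns the sources of all `kind` edges that end at `to`.
pub fn preds_of_kind<'a>(g: &'a ProgramGraphs, to: &NodeKey, kind: EdgeKind) -> Vec<&'a NodeKey> {
    g.edges
        .iter()
        .filter(|e| e.kind == kind && &e.to == to)
        .map(|e| &e.from)
        .collect()
}
```

Keying by the symbol table's own signatures lets a client join graph nodes back to symbols without a second id space.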

Superseded here and filed as codellm-devkit/python-sdk#229 (the shared, language-agnostic client-analysis engine; the Rust model pack lands with the consuming SDK): the Slicing task, the Taint task, the taint_flows output, and the client gates (interprocedural slice set, source→sink with a sanitizer, witness paths). Those gates are now SDK-side.

Stays in this task: the Emission bullet (-a 3, --graphs cfg,dfg,pdg,sdg with strict validation, --graph-field-depth, the program_graphs section, and recording the shared vocabulary in SCHEMA_DECISIONS.md), plus the CPG projection if it is in scope. The SUMMARY edges that the SDK queries rely on are built in the stages 5–7 rung (#24) and remain analyzer-side. Keep a source→sink fixture so the SDK can assert taint over it. Retitled: stage 8 is now emission/CPG, not slicing/taint.
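
Strict --graphs validation could look roughly like the hand-rolled sketch below. A real CLI would more likely lean on its argument parser's value validation. The names parse_graphs, GraphKind, and levels_to_run are hypothetical. Unknown names, including an empty list, are errors rather than being silently dropped. The cumulative-levels rule is an assumption extrapolated from "-a 3 implies -a 2".

```rust
/// Graph kinds accepted by --graphs (the four names the issue lists).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GraphKind {
    Cfg,
    Dfg,
    Pdg,
    Sdg,
}

/// Default for --graph-field-depth, per the task text.
pub const DEFAULT_GRAPH_FIELD_DEPTH: u32 = 3;

/// Strict parsing: any unknown name is an error instead of being skipped.
/// Duplicates are collapsed, preserving first-seen order.
pub fn parse_graphs(arg: &str) -> Result<Vec<GraphKind>, String> {
    let mut out = Vec::new();
    for name in arg.split(',').map(str::trim) {
        let kind = match name {
            "cfg" => GraphKind::Cfg,
            "dfg" => GraphKind::Dfg,
            "pdg" => GraphKind::Pdg,
            "sdg" => GraphKind::Sdg,
            other => return Err(format!("unknown graph name: {other:?}")),
        };
        if !out.contains(&kind) {
            out.push(kind);
        }
    }
    Ok(out)
}

/// Assumption: analysis levels are cumulative, so -a 3 also runs levels 1 and 2.
pub fn levels_to_run(a: u8) -> Vec<u8> {
    (1..=a).collect()
}
```

A caller would print the Err and exit non-zero (for example via std::process::exit). The issue does not fix a specific exit code.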


Part of #9. Phase 3 — native dataflow.

Learning goals

API design for analyses-as-queries; models-as-data (serde again, now with JSON-Schema validation
of user input); keeping a public contract stable while internals churn.
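
Models-as-data with layered precedence (built-in pack < config file < flags) can be sketched as a keyed merge in which later layers override earlier ones by model id. This is a std-only illustration under stated assumptions:

- Model, ModelKind, and merge_layers are hypothetical names.
- The model id is assumed to be the override key.
- validate is only a stand-in for real JSON Schema validation of the user-supplied layers.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelKind {
    Source,
    Sink,
    Sanitizer,
    Library,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Model {
    pub id: String,
    pub kind: ModelKind,
    pub pattern: String, // hypothetical: a path pattern the model matches
}

/// Stand-in for schema validation: reject empty ids and patterns.
fn validate(m: &Model) -> Result<(), String> {
    if m.id.trim().is_empty() {
        return Err("model id must be non-empty".into());
    }
    if m.pattern.trim().is_empty() {
        return Err(format!("model {:?} has an empty pattern", m.id));
    }
    Ok(())
}

/// Merge layers lowest-precedence first (built-in, then config, then flags).
/// A later layer replaces an earlier model with the same id.
pub fn merge_layers(layers: &[Vec<Model>]) -> Result<BTreeMap<String, Model>, String> {
    let mut merged = BTreeMap::new();
    for layer in layers {
        for m in layer {
            validate(m)?;
            merged.insert(m.id.clone(), m.clone());
        }
    }
    Ok(merged)
}
```

Merging by id keeps the contract stable: user files only restate the entries they change, and the built-in pack can grow underneath them.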

Task

  • Emission: -a 3 (implies -a 2) + --graphs cfg,dfg,pdg,sdg (strict validation — an
    unknown graph name exits non-zero) + --graph-field-depth (default 3). A program_graphs
    section in analysis.json, schema-versioned, keyed by canonical (signature, node_id) — same
    signatures as the symbol table. Record the shared node/edge vocabulary in SCHEMA_DECISIONS.md;
    the cross-language parity clause binds: Rust-specific additions are additive.
  • Slicing: context-sensitive backward slice via the two-phase HRB traversal over the SDG
    (phase 1 ascends PARAM_IN/CALL, phase 2 descends PARAM_OUT — SUMMARY edges carry you across
    calls without re-descending).
  • Taint: labeled reachability over the SDG; sanitizers block propagation; sources/sinks/
    sanitizers/library models supplied AS DATA (built-in pack < config file < flags) with JSON
    Schema validation; taint_flows output section with lazily-reconstructed witness paths and
    the model id that justified each hop (explainability).
  • The wall-clock time of -a 1 / -a 2 must stay unaffected (check a timing budget on the fixture in CI).
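
For reference, the two-phase HRB traversal from the Slicing bullet (now SDK-side per the amendment) can be sketched over an integer-indexed SDG. This is a simplified illustration, not the SDK's implementation, and Kind, hrb_backward_slice, and the edge encoding are hypothetical. The two phases work as follows:

- Phase 1 follows every backward edge except PARAM_OUT. It therefore ascends into callers and crosses call sites only via SUMMARY.
- Phase 2 starts from everything phase 1 reached and follows every backward edge except PARAM_IN and CALL. It therefore descends into callees but never climbs back out to another call site.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical SDG edge kinds; `Intra` lumps together intraprocedural data/control dependences.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Intra,
    Call,
    ParamIn,
    ParamOut,
    Summary,
}

/// Backward worklist reachability from `start`, following only edges accepted by `follow`.
fn backward(
    preds: &HashMap<usize, Vec<(usize, Kind)>>,
    start: &BTreeSet<usize>,
    follow: impl Fn(Kind) -> bool,
) -> BTreeSet<usize> {
    let mut seen = start.clone();
    let mut work: Vec<usize> = start.iter().copied().collect();
    while let Some(n) = work.pop() {
        if let Some(ps) = preds.get(&n) {
            for &(p, k) in ps {
                if follow(k) && seen.insert(p) {
                    work.push(p);
                }
            }
        }
    }
    seen
}

/// Context-sensitive backward slice (HRB two-phase) from node `criterion`.
pub fn hrb_backward_slice(edges: &[(usize, usize, Kind)], criterion: usize) -> BTreeSet<usize> {
    // Index edges by their target so they can be walked backward.
    let mut preds: HashMap<usize, Vec<(usize, Kind)>> = HashMap::new();
    for &(from, to, kind) in edges {
        preds.entry(to).or_default().push((from, kind));
    }
    // Phase 1: ascend to callers (PARAM_IN, CALL), cross calls via SUMMARY, never descend.
    let phase1 = backward(&preds, &BTreeSet::from([criterion]), |k| k != Kind::ParamOut);
    // Phase 2: descend into callees via PARAM_OUT, never ascend again.
    backward(&preds, &phase1, |k| k != Kind::ParamIn && k != Kind::Call)
}
```

Consider a callee invoked from two call sites. A slice from the first site's actual-out reaches the callee's formal-in and formal-out, but not the second site's actual-in. Context-insensitive backward reachability would pull that node in through the shared formal-in.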

Gate (client gates)

  • Interprocedural slice through the a → b → c chain matches the hand-computed set (proving the SUMMARY
    edges are used: the slice must NOT contain the callee-internal nodes that phase 1 would leak).
  • The fixture's source→sink taint flow is found; the SAME flow with a sanitizer interposed is
    NOT. Witness path names every hop.
  • -a 3 analysis.json validates; --graphs bogus exits non-zero.
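
The taint gate can be illustrated with a per-source forward search. Each flow is labeled with the source it started from. The search lets a sanitizer node receive taint but not pass it on. It stores only parent pointers and rebuilds a witness path when a sink is reached. This is a hypothetical, simplified sketch: find_flows and TaintFlow are illustrative names, and it does not attach the justifying model id to each hop as the Taint bullet asks. The real check now lives SDK-side.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

#[derive(Debug)]
pub struct TaintFlow {
    pub source: usize,
    pub sink: usize,
    pub witness: Vec<usize>, // every node on the path, source first
}

pub fn find_flows(
    edges: &[(usize, usize)],
    sources: &[usize],
    sinks: &HashSet<usize>,
    sanitizers: &HashSet<usize>,
) -> Vec<TaintFlow> {
    let mut succs: HashMap<usize, Vec<usize>> = HashMap::new();
    for &(from, to) in edges {
        succs.entry(from).or_default().push(to);
    }
    let mut flows = Vec::new();
    for &src in sources {
        // Only parent pointers are kept; witness paths are rebuilt on demand.
        let mut parent: HashMap<usize, usize> = HashMap::new();
        let mut seen = HashSet::from([src]);
        let mut queue = VecDeque::from([src]);
        while let Some(n) = queue.pop_front() {
            if sinks.contains(&n) {
                let mut witness = vec![n];
                let mut cur = n;
                while let Some(&p) = parent.get(&cur) {
                    witness.push(p);
                    cur = p;
                }
                witness.reverse();
                flows.push(TaintFlow { source: src, sink: n, witness });
            }
            // A sanitizer receives taint but does not propagate it further.
            if sanitizers.contains(&n) {
                continue;
            }
            if let Some(next) = succs.get(&n) {
                for &m in next {
                    if seen.insert(m) {
                        parent.insert(m, n);
                        queue.push_back(m);
                    }
                }
            }
        }
    }
    flows
}
```

Running the same fixture twice, once with an empty sanitizer set and once with an interposed sanitizer node, mirrors the "found / NOT found" pair the gate describes.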

Metadata

Labels

learning-ladder (The escalating-complexity curriculum issues)
level-3 (Native dataflow: CFG/PDG/SDG)
