design: analyzer is a pure graph provider — taint/slicing move to the frontend SDK #7
Open
rahlk wants to merge 1 commit into
… frontend SDK

Establishes the provider/client boundary across the dataflow skillset:

- codeanalyzer-backend emits the graph substrate only (program_graphs (CFG/PDG/SDG) with transitive HRB SUMMARY edges) and never a taint_flows section, never a sources/sinks/sanitizers policy, never a slice. SUMMARY edges are the exception that proves the rule: keyed on data dependence (not any taint policy), they are reusable substrate and stay analyzer-side, and are what make frontend queries context-sensitive.
- cldk-sdk-frontend owns slicing and taint as reachability queries over the emitted graph: the SDK holds the model packs, produces taint_flows/slice results, and carries the Slice/Taint gates (sdk-testing.md § 3b).

Rationale: a taint result is keyed on a policy that changes far faster than the graph; keeping the query in the SDK means a policy edit re-runs a cheap traversal instead of re-emitting the universal graph. Mirrors Joern's factoring (CPG stores the substrate; reachableBy is a query, not materialized all-pairs taint edges).

First instantiated in codeanalyzer-java (SDG at -a 3; SUMMARY-edge substrate as the next rung); the sibling level-3 epics are amended to match.

Touches: dataflow-graphs.md (contract + boundary + gates split), dataflow-construction.md (Stage 8), dataflow-issue-template.md (goals, PART 3, PR ladder, DoD, title), backend SKILL.md (level-3 section), frontend SKILL.md (new Client analyses section) and sdk-testing.md (Slice/Taint gates).
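To illustrate why SUMMARY edges make a frontend query context-sensitive, here is a minimal sketch of the two-phase Horwitz-Reps-Binkley backward slice over a toy SDG. The edge-kind names (`DATA`, `PARAM_IN`, `PARAM_OUT`, `SUMMARY`, `CALL`), the node names, and the `(src, dst, kind)` tuple layout are assumptions made for this example; they are not the schema the analyzer emits.

```python
from collections import deque


def backward_reach(edges, starts, skip_kinds):
    """Backward reachability over (src, dst, kind) edges, ignoring skip_kinds."""
    preds = {}
    for src, dst, kind in edges:
        if kind not in skip_kinds:
            preds.setdefault(dst, []).append(src)
    seen = set(starts)
    work = deque(seen)
    while work:
        node = work.popleft()
        for pred in preds.get(node, ()):
            if pred not in seen:
                seen.add(pred)
                work.append(pred)
    return seen


def hrb_backward_slice(edges, criterion):
    """Two-phase HRB slice: SUMMARY edges stand in for callee bodies in phase 1."""
    # Phase 1: stay in the criterion's procedure and its callers (no descent).
    phase1 = backward_reach(edges, {criterion}, {"PARAM_OUT"})
    # Phase 2: descend into callees, but never climb back out to other call sites.
    return backward_reach(edges, phase1, {"CALL", "PARAM_IN"})


# Toy SDG (hypothetical): a = id(src); b = id(clean); sink(b)
edges = [
    ("src", "ain1", "DATA"), ("ain1", "fin", "PARAM_IN"),
    ("clean", "ain2", "DATA"), ("ain2", "fin", "PARAM_IN"),
    ("fin", "fout", "DATA"),
    ("fout", "aout1", "PARAM_OUT"), ("fout", "aout2", "PARAM_OUT"),
    ("ain1", "aout1", "SUMMARY"), ("ain2", "aout2", "SUMMARY"),
    ("aout2", "sink_b", "DATA"),
]

naive = backward_reach(edges, {"sink_b"}, set())  # context-insensitive
sliced = hrb_backward_slice(edges, "sink_b")      # context-sensitive
```

In this toy graph the naive traversal leaks through the shared callee body and pulls in `src` from the other call site, while the two-phase slice keeps only the `clean` call context.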
Establishes a design principle across the dataflow skillset, first instantiated in `codeanalyzer-java` (SDG at `-a 3`; `SUMMARY`-edge substrate as the next rung) and now amended onto the sibling level-3 epics (python#67, typescript#2, clang#2, go#3, rust#25).

The principle
The analyzer is a pure graph provider. Level 3 emits the dependence-graph substrate, `program_graphs` (CFG/PDG/SDG) with transitive HRB `SUMMARY` edges, and stops. Client analyses (taint, slicing, reachability) are frontend SDK queries over that graph, not analyzer features. The analyzer emits no `taint_flows` section, ingests no sources/sinks/sanitizers policy, and runs no slice. `SUMMARY` edges are the exception that proves the rule: keyed on data dependence (not any taint policy), they're reusable across every config, so they stay analyzer-side, and they're exactly what make the frontend's queries context-sensitive.

Why: a taint result is keyed on a policy that changes far faster than the graph. Keeping the query (and its `taint_flows` output + model packs) in the SDK means a policy edit re-runs a cheap traversal instead of re-emitting the universal graph. This is Joern's factoring: the CPG stores the dependence substrate; `reachableBy` is a query, not materialized all-pairs taint edges.

Changes
`codeanalyzer-backend` (builds the graph only):

- `dataflow-graphs.md`: new provider/client boundary; drop `taint_flows` from the emitted JSON; reframe "Client analyses" as frontend-owned; split the verification gates (SDG/SUMMARY stay; Slice/Taint become frontend gates).
- `dataflow-construction.md`: Stage 8 is now CPG-only; slicing/taint explicitly not an analyzer stage.
- `dataflow-issue-template.md`: retitled (drop "and taint analysis"), goals/PART 3/PR ladder/DoD rescoped to substrate + `SUMMARY` edges; taint/slicing called out as a separate SDK ladder.
- `SKILL.md`: level-3 section states the boundary.

`cldk-sdk-frontend` (owns the queries):

- `SKILL.md`: new "Client analyses are the SDK's job" section: slicing/taint over `program_graphs`, model packs as data, `TaintFlow`/slice-result models, facade methods, over-approximation surfacing.
- `sdk-testing.md`: § 3b Slice/Taint (+ context-sensitivity) gates.

Reference instantiation: codellm-devkit/codeanalyzer-java#171 (decision #11) and #173 (the `SUMMARY`-edge substrate rung).
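
The rationale (policy changes faster than the graph, so taint is a query rather than emitted data) can be sketched as follows. This is a simplified illustration, not the SDK's API: `ModelPack`, `taint_flows`, the node names, and the plain `(src, dst)` edge list are all hypothetical, and the traversal is plain forward reachability that ignores edge kinds and context-sensitivity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPack:
    """Hypothetical policy-as-data: lives in the SDK, never in the analyzer."""
    sources: frozenset
    sinks: frozenset
    sanitizers: frozenset = frozenset()


def taint_flows(edges, pack):
    """Return sorted (source, sink) pairs reachable without crossing a sanitizer."""
    succs = {}
    for src, dst in edges:
        succs.setdefault(src, []).append(dst)
    flows = []
    for source in pack.sources:
        seen = {source}
        work = deque([source])
        while work:
            node = work.popleft()
            if node in pack.sinks:
                flows.append((source, node))
            for nxt in succs.get(node, ()):
                # Sanitizer nodes cut the traversal; the graph itself is unchanged.
                if nxt not in seen and nxt not in pack.sanitizers:
                    seen.add(nxt)
                    work.append(nxt)
    return sorted(flows)


# The emitted graph is built once (toy stand-in for program_graphs).
graph = [
    ("request.param", "escape"),
    ("escape", "db.query"),
    ("request.param", "log.write"),
]

pack_v1 = ModelPack(
    sources=frozenset({"request.param"}),
    sinks=frozenset({"db.query", "log.write"}),
)
# Policy edit: declare `escape` a sanitizer. Only the traversal re-runs.
pack_v2 = ModelPack(pack_v1.sources, pack_v1.sinks, frozenset({"escape"}))

flows_v1 = taint_flows(graph, pack_v1)
flows_v2 = taint_flows(graph, pack_v2)
```

Swapping `pack_v1` for `pack_v2` changes the reported flows without touching `graph`, which is the cost argument made under "Why" above.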