Shared: Provenance-based filtering of flow summaries #21051

hvitved · 2025-12-16T13:40:36Z

This PR aligns the logic across languages for how flow summaries are prioritized based on provenance and exactness (that is, whether a model is defined directly for a function or for a function that is implemented/overridden).

A flow summary is considered relevant if:

1. It is a manual exact model, or
2. It is a manual inexact model and there is no exact manual (neutral) model, or
3. It is a generated model and (a) there is no source code available for the modeled callable, (b) there is no manual (neutral) model, and (c) if the model is inexact, there is no generated exact (neutral) model.

Note that for dynamic languages we currently pretend that no source code is available for functions with flow summaries, so 3.(a) holds vacuously.
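The relevance rules can be modeled outside QL as a small Python sketch. This is only an illustration, not the shared library's implementation: the `Model` record, its field names, and the `from_source` set are hypothetical, and 3.(c) is read the way the `FlowSummaryImpl.qll` diff excerpt in this thread applies it (an inexact model loses to an exact model with the same kind of provenance).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    target: str            # hypothetical identifier of the modeled callable
    generated: bool        # False means manual provenance
    exact: bool            # True if defined directly on the callable
    neutral: bool = False  # True for a neutral model rather than a summary


def is_relevant(m: Model, all_models: list[Model], from_source: set[str]) -> bool:
    """Return True if flow summary `m` should be applied, following rules 1-3."""
    others = [o for o in all_models if o.target == m.target]
    if not m.generated and m.exact:
        return True   # rule 1: manual exact models always apply
    if m.generated and m.target in from_source:
        return False  # 3.(a): source code is available for the callable
    if m.generated and any(not o.generated for o in others):
        return False  # 3.(b): a manual summary or neutral model exists
    if not m.exact and any(o.exact and o.generated == m.generated for o in others):
        return False  # rule 2 and 3.(c): an exact model of the same provenance kind exists
    return True
```

For example, with a manual exact, a manual inexact, and a generated model on the same callable, only the manual exact model is relevant under this sketch; a lone generated model is relevant only when the callable is not in `from_source`.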

Points 2 and 3.c represent a change for e.g. Java, where we would previously union exact and inexact models, which meant that it was not possible to overrule inexact models. As a consequence, some inexact manual models have been replicated. DCA for Java reports some lost java/sensitive-log results on apache_solr, but looking at those results, they all have flow paths of length > 150, so they are almost certainly false positives, and most likely a consequence of 3.c.

In order for the logic to be defined in the shared flow summary library, I had to move provenance and exactness information into the propagatesFlow predicate, which is a breaking change.

Lastly, I have applied the ::Range pattern to the SummarizedCallable class for all languages except C++, which currently does not expose this class. This means that SummarizedCallable::Range will contain all flow summaries, whereas SummarizedCallable will only contain relevant summaries.

rust/ql/lib/codeql/rust/dataflow/FlowSummary.qll

shared/dataflow/codeql/dataflow/internal/FlowSummaryImpl.qll

java/ql/lib/semmle/code/java/dataflow/internal/DataFlowDispatch.qll

rust/ql/test/library-tests/dataflow/models/models.qlref

owen-mc

Go LGTM. It's great to get this logic shared - it was a bit worrying that different implementations were drifting apart. And it's one less thing to have to think about when supporting a new language.

michaelnebel · 2026-01-16T12:48:24Z

shared/dataflow/codeql/dataflow/internal/FlowSummaryImpl.qll

+      if p.isGenerated() or isExact = false
+      then
+        // Only apply generated models to functions in library code
+        not (p.isGenerated() and callableFromSource(c)) and
+        // Only apply generated or inexact models when no strictly better model exists
+        not exists(Provenance other, boolean isExactOther |
+          c.propagatesFlow(_, _, _, other, isExactOther, _)
+          or
+          neutralElement(c, "summary", other, isExactOther)
+        |
+          p.isGenerated() and other.isManual()
+          or
+          p.getVerification() = other.getVerification() and
+          isExact = false and
+          isExactOther = true
+        )
+      else any()


Maybe consider something like:

      p.isManual() and isExact = true
      or
      // Only apply generated models to functions in library code
      not (p.isGenerated() and callableFromSource(c)) and
      // Only apply generated or inexact models when no strictly better model exists
      not exists(Provenance other, boolean isExactOther |
        c.propagatesFlow(_, _, _, other, isExactOther, _)
        or
        neutralElement(c, "summary", other, isExactOther)
      |
        p.isGenerated() and other.isManual()
        or
        p.getVerification() = other.getVerification() and
        isExact = false and
        isExactOther = true
      )

Or, if the above suggestion doesn't perform well, invert the condition (so that it is more aligned with the comment explaining the logic).

if p.isManual() and isExact = true then any() else ...
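To see that the suggested disjunction and the original `if ... then ... else any()` accept the same cases, the two shapes can be compared over all truth assignments. The sketch below is plain Python, not QL; `x` stands for the whole "no strictly better model exists" condition, and it assumes that a provenance is either manual or generated (so `isManual` is the negation of `isGenerated`).

```python
from itertools import product


def original(is_generated: bool, is_exact: bool, x: bool) -> bool:
    # if p.isGenerated() or isExact = false then X else any()
    return x if (is_generated or not is_exact) else True


def suggested(is_generated: bool, is_exact: bool, x: bool) -> bool:
    # p.isManual() and isExact = true  or  X
    is_manual = not is_generated  # assumption: manual and generated are complementary
    return (is_manual and is_exact) or x


def forms_agree() -> bool:
    """Check both formulations on all eight combinations of the three inputs."""
    return all(
        original(g, e, x) == suggested(g, e, x)
        for g, e, x in product([False, True], repeat=3)
    )
```

Under that assumption the two forms agree everywhere, so the choice between them comes down to readability and evaluation performance rather than semantics.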

michaelnebel · 2026-01-16T13:03:31Z

@hvitved : This appears to break the model generator idempotency (at least for C#). I tried generating the C# Runtime models from scratch (by first deleting the existing generated models) and then regenerating them, which changed the models further.

This is a very nice change!

geoffw0

Rust, Swift changes and DCA LGTM.

hvitved · 2026-01-19T08:48:23Z

> @hvitved : This appears to break the model generator idempotency (at least for C#). I tried generating the C# Runtime models from scratch (by first deleting the existing generated models) and then regenerating them, which changed the models further.

Oh no... Thanks for checking.

yoff

One question; otherwise both the code changes and the DCA run look good for Python.

yoff · 2026-01-20T13:08:12Z

python/ql/lib/semmle/python/dataflow/new/FlowSummary.qll

+      string model
+    ) {
+      this.propagatesFlow(input, output, preservesValue) and
+      p = "manual" and


Should only manual summaries propagate flow? (Or am I misunderstanding the code?)
I think we only have manual summaries so far, but if we ever get to generate some, I think we will not see their effect?

hvitved · 2026-01-20

It means that one can implement `propagatesFlow(string input, string output, boolean preservesValue)` instead of `propagatesFlow(string input, string output, boolean preservesValue, Provenance p, boolean isExact, string model)`, and then get the default values here.
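The defaulting pattern described in this reply can be sketched in Python as follows. The class and method names are hypothetical and only mirror the discussion; the `"manual"` default comes from the diff excerpt above, while the defaults chosen here for exactness and the model string are assumptions made for illustration.

```python
class SummarizedCallable:
    """Hypothetical base class; not the library's actual API."""

    def propagates_flow_simple(self):
        """Yield (input, output, preserves_value) triples; none by default."""
        return iter(())

    def propagates_flow(self):
        """Yield full 6-tuples, filling in defaults for simple summaries."""
        for inp, out, preserves_value in self.propagates_flow_simple():
            # "manual" mirrors the diff; exact=True and the model name are assumed defaults.
            yield (inp, out, preserves_value, "manual", True, type(self).__name__)


class ListAppend(SummarizedCallable):
    """A summary author only supplies the three-column form."""

    def propagates_flow_simple(self):
        yield ("Argument[0]", "Argument[self].ListElement", True)
```

In this sketch, a summary with a different provenance would override `propagates_flow` directly instead of relying on the defaults, which is how generated summaries could still take effect.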

Missing manual models were added using the following code added to `FlowSummaryImpl.qll`:

```ql
private predicate testsummaryElement(
  Input::SummarizedCallableBase c, string namespace, string type, boolean subtypes, string name,
  string signature, string ext, string originalInput, string originalOutput, string kind,
  string provenance, string model, boolean isExact
) {
  exists(string input, string output, Callable baseCallable |
    summaryModel(namespace, type, subtypes, name, signature, ext, originalInput, originalOutput,
      kind, provenance, model) and
    baseCallable = interpretElement(namespace, type, subtypes, name, signature, ext, isExact) and
    (
      c.asCallable() = baseCallable and input = originalInput and output = originalOutput
      or
      correspondingKotlinParameterDefaultsArgSpec(baseCallable, c.asCallable(), originalInput,
        input) and
      correspondingKotlinParameterDefaultsArgSpec(baseCallable, c.asCallable(), originalOutput,
        output)
    )
  )
}

private predicate testsummaryElement2(
  string namespace, string type, boolean subtypes, string name, string signature, string ext,
  string originalInput, string originalOutput, string kind, string provenance, string model
) {
  exists(Input::SummarizedCallableBase c |
    testsummaryElement(c, _, _, _, _, _, _, originalInput, originalOutput, kind, provenance,
      model, false) and
    testsummaryElement(c, namespace, type, subtypes, name, signature, ext, _, _, _, provenance, _,
      true) and
    not testsummaryElement(c, _, _, _, _, _, _, originalInput, originalOutput, kind, provenance,
      _, true)
  )
}

private string getAMissingManualModel() {
  exists(
    string namespace, string type, boolean subtypes, string name, string signature, string ext,
    string originalInput, string originalOutput, string kind, string provenance, string model
  |
    testsummaryElement2(namespace, type, subtypes, name, signature, ext, originalInput,
      originalOutput, kind, provenance, model) and
    result =
      "- [\"" + namespace + "\", \"" + type + "\", True, \"" + name + "\", \"" + signature +
        "\", \"\", \"" + originalInput + "\", \"" + originalOutput + "\", \"" + kind + "\", \"" +
        provenance + "\"]"
  )
}
```
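What the helper predicates compute can be restated as a small Python sketch over plain rows. It is a simplification under stated assumptions: the `Row` record and its fields are hypothetical, and name, signature, and the Kotlin default-argument handling are left out. The idea it mirrors is that an inexact model for a callable needs replicating when the callable also has an exact model with the same provenance but no exact model with the same input, output, and kind.

```python
from typing import NamedTuple


class Row(NamedTuple):
    target: str       # callable the model applies to (hypothetical identifier)
    declared_on: str  # type the model row was written for
    inp: str
    out: str
    kind: str
    provenance: str
    exact: bool       # False if the row reaches `target` via a supertype


def missing_manual_models(rows: list[Row]) -> set[tuple[str, str, str, str, str]]:
    """Return (type, input, output, kind, provenance) rows that would need adding."""
    missing = set()
    for inexact in (r for r in rows if not r.exact):
        exact_rows = [
            r for r in rows
            if r.exact and r.target == inexact.target and r.provenance == inexact.provenance
        ]
        covered = any(
            (r.inp, r.out, r.kind) == (inexact.inp, inexact.out, inexact.kind)
            for r in exact_rows
        )
        if not covered:
            # Replicate the inexact flow onto the type that carries the exact models.
            for r in exact_rows:
                missing.add(
                    (r.declared_on, inexact.inp, inexact.out, inexact.kind, inexact.provenance)
                )
    return missing
```

A callable that has only inexact models yields nothing here, since there is no exact model that would shadow them.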

hvitved · 2026-01-21T11:57:35Z

> @hvitved : This appears to break the model generator idempotency (at least for C#). I tried generating the C# Runtime models from scratch (by first deleting the existing generated models) and then regenerating them, which changed the models further.

@michaelnebel : I have pushed a revert that appears to fix this: when I run `python3 generate_mad.py --language csharp --with-summaries <path to dotnet_runtime_db>` twice, the second invocation produces no changes.
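The "run twice, expect no changes" check can be automated with a generic sketch like the one below. It is not part of the repository's tooling: `generate` is a hypothetical zero-argument callable wrapping whatever regenerates the models, and `root` is the directory holding the generated model files.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def is_idempotent(generate, root: Path) -> bool:
    """Run `generate` twice and report whether the second run changed nothing."""
    generate()
    first = snapshot(root)
    generate()
    return snapshot(root) == first
```

Comparing content hashes rather than timestamps means a rerun that rewrites identical files still counts as unchanged.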

michaelnebel · 2026-01-22T09:59:12Z

csharp/ql/lib/semmle/code/csharp/dataflow/internal/FlowSummaryImpl.qll

+    c.fromSource() and
+    not c.getFile().isStub() and
+    not (
+      c.getFile().extractedQlTest() and


Maybe this deserves a comment (that QL test files where the body is just a throw are considered stub-like and thus not a part of the source code).

michaelnebel

Really nice work @hvitved !
Only a couple of minor questions/remarks.

github-actions bot added the C#, C++, Java, Python, Go, Ruby, Rust, Swift, and DataFlow Library labels Dec 16, 2025

hvitved force-pushed the shared/flow-summary-provenance-filtering branch 3 times, most recently from a3e585d to eb48820 December 17, 2025 19:45

github-actions bot added the JS label Dec 18, 2025

hvitved force-pushed the shared/flow-summary-provenance-filtering branch from 1e946f8 to 30a0791 December 18, 2025 10:06

This was referenced Jan 5, 2026

C#: Narrow provenance printing in tests. #21094

Closed

Rust: Refactor MaD provenance-based filtering #21072

Merged

hvitved force-pushed the shared/flow-summary-provenance-filtering branch 3 times, most recently from 0fbea88 to 5a2881d January 13, 2026 10:08

github-advanced-security bot found potential problems Jan 13, 2026

rust/ql/lib/codeql/rust/dataflow/FlowSummary.qll: fixed

hvitved force-pushed the shared/flow-summary-provenance-filtering branch from 5a2881d to a941f4a January 13, 2026 10:59

github-advanced-security bot found potential problems Jan 13, 2026

shared/dataflow/codeql/dataflow/internal/FlowSummaryImpl.qll: fixed

hvitved force-pushed the shared/flow-summary-provenance-filtering branch 2 times, most recently from bf632b3 to c6383ff January 13, 2026 13:36

github-advanced-security bot found potential problems Jan 13, 2026

hvitved force-pushed the shared/flow-summary-provenance-filtering branch 2 times, most recently from 9f81377 to 0057ae3 January 13, 2026 14:43

github-advanced-security bot found potential problems Jan 13, 2026

rust/ql/test/library-tests/dataflow/models/models.qlref: fixed

hvitved force-pushed the shared/flow-summary-provenance-filtering branch 2 times, most recently from 1933d1c to 72dfe9c January 14, 2026 08:30

owen-mc previously approved these changes Jan 16, 2026

michaelnebel reviewed Jan 16, 2026

geoffw0 previously approved these changes Jan 16, 2026

yoff reviewed Jan 20, 2026

hvitved added 11 commits January 21, 2026 11:08

Shared: Provenance-based filtering of flow summaries

f8dbc39

C#: Adapt to changes in FlowSummaryImpl

82b200d

Rust: Adapt to changes in FlowSummaryImpl

7413269

Ruby: Adapt to changes in FlowSummaryImpl

0da5282

Swift: Adapt to changes in FlowSummaryImpl

73cd0e8

Go: Adapt to changes in FlowSummaryImpl

f6d1621

Python: Adapt to changes in FlowSummaryImpl

86454e7

C++: Adapt to changes in FlowSummaryImpl

4c0c899

JS: Adapt to changes in FlowSummaryImpl

cfe1072

Add change notes

4a90f84

hvitved dismissed stale reviews from geoffw0 and owen-mc via 4a90f84 January 21, 2026 10:14

hvitved force-pushed the shared/flow-summary-provenance-filtering branch from 117690f to 4a90f84 January 21, 2026 10:14

C#: Revert change to getASummarizedCallableTarget

e53b4c1

Shared: Shadow hasManualModel in RelevantSummarizedCallable

27c102a

hvitved force-pushed the shared/flow-summary-provenance-filtering branch from 5d74edd to 27c102a January 21, 2026 13:00

hvitved requested review from geoffw0, michaelnebel, owen-mc and yoff January 22, 2026 09:28

michaelnebel reviewed Jan 22, 2026
