Skip to content

Conversation

@hvitved
Copy link
Contributor

@hvitved hvitved commented Dec 16, 2025

This PR aligns the logic across languages for how flow summaries are prioritized based on provenance and exactness (that is, whether a model is defined directly for a function or for a function that is implemented/overridden).

A flow summary is considered relevant if:

  1. It is manual exact model, or
  2. It is a manual inexact model and there is no exact manual (neutral) model, or
  3. It is a generated model and (a) there is no source code available for the modeled callable, (b) there is no manual (neutral) model, and (c) the model is inexact and there is no generated exact (neutral) model.

Note that for dynamic languages we currently pretend that no source code is available for functions with flow summaries, so 3.(a) holds vacuously.

Points 2 and 3.c represent a change for e.g. Java, where we would previously union exact and inexact models, which meant that it was not possible to overrule inexact models. As a consequence, some inexact manual have been replicated. DCA for Java reports some lost java/sensitive-log results on apache_solr, but looking at those results, they all have flow paths of length > 150, so they are almost certainly false positives, and most likely a consequence of 3.c.

In order for the logic to be defined in the shared flow summary library, I had to move provenance and exactness information into the propagatesFlow predicate, which is a breaking change.

Lastly, I have applied the ::Range pattern to the SummarizedCallable class for all languages except C++, which currently does not expose this class. This means that SummarizedCallable::Range will contain all flow summaries, whereas SummarizedCallable will only contain relevant summaries.

@hvitved hvitved force-pushed the shared/flow-summary-provenance-filtering branch 3 times, most recently from a3e585d to eb48820 Compare December 17, 2025 19:45
@github-actions github-actions bot added the JS label Dec 18, 2025
@hvitved hvitved force-pushed the shared/flow-summary-provenance-filtering branch from 1e946f8 to 30a0791 Compare December 18, 2025 10:06
@hvitved hvitved force-pushed the shared/flow-summary-provenance-filtering branch 3 times, most recently from 0fbea88 to 5a2881d Compare January 13, 2026 10:08
@hvitved hvitved force-pushed the shared/flow-summary-provenance-filtering branch from 5a2881d to a941f4a Compare January 13, 2026 10:59
@hvitved hvitved force-pushed the shared/flow-summary-provenance-filtering branch 2 times, most recently from bf632b3 to c6383ff Compare January 13, 2026 13:36
@hvitved hvitved force-pushed the shared/flow-summary-provenance-filtering branch 2 times, most recently from 9f81377 to 0057ae3 Compare January 13, 2026 14:43
@hvitved hvitved force-pushed the shared/flow-summary-provenance-filtering branch 2 times, most recently from 1933d1c to 72dfe9c Compare January 14, 2026 08:30
owen-mc
owen-mc previously approved these changes Jan 16, 2026
Copy link
Contributor

@owen-mc owen-mc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go LGTM. It's great to get this logic shared - it was a bit worrying that different implementations were drifting apart. And it's one less thing to have to think about when supporting a new language.

Comment on lines +489 to +505
if p.isGenerated() or isExact = false
then
// Only apply generated models to functions in library code
not (p.isGenerated() and callableFromSource(c)) and
// Only apply generated or inexact models when no strictly better model exists
not exists(Provenance other, boolean isExactOther |
c.propagatesFlow(_, _, _, other, isExactOther, _)
or
neutralElement(c, "summary", other, isExactOther)
|
p.isGenerated() and other.isManual()
or
p.getVerification() = other.getVerification() and
isExact = false and
isExactOther = true
)
else any()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe consider something like:

      p.isManual() and isExact = true
      or
      // Only apply generated models to functions in library code
      not (p.isGenerated() and callableFromSource(c)) and
      // Only apply generated or inexact models when no strictly better model exists
      not exists(Provenance other, boolean isExactOther |
        c.propagatesFlow(_, _, _, other, isExactOther, _)
        or
        neutralElement(c, "summary", other, isExactOther)
      |
        p.isGenerated() and other.isManual()
        or
        p.getVerification() = other.getVerification() and
        isExact = false and
        isExactOther = true
      )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if the above suggestion doesn't perform, then invert the condition (then it is more aligned with the comment explaining the logic).

if p.isManual() and isExact = true
then any()
else
 ...

@michaelnebel
Copy link
Contributor

michaelnebel commented Jan 16, 2026

@hvitved : This appears to break the model generator idempotency (at least for C#). I tried generating C# Runtime models from scratch (by first deleting the existing generated models) and then re-generate the model after this (which further changed the models).

This is a very nice change!

geoffw0
geoffw0 previously approved these changes Jan 16, 2026
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust, Swift changes and DCA LGTM.

@hvitved
Copy link
Contributor Author

hvitved commented Jan 19, 2026

@hvitved : This appears to break the model generator idempotency (at least for C#). I tried generating C# Runtime models from scratch (by first deleting the existing generated models) and then re-generate the model after this (which further changed the models).

Oh no... Thanks for checking.

Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question, otherwise both the code changes and DCA run looks good for Python.

string model
) {
this.propagatesFlow(input, output, preservesValue) and
p = "manual" and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should only manual summaries propagate flow? (Or am I misunderstanding the code?)
I think we only have manual summaries so far, but if we ever get to generate some, I think we will not see their effect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that one can implement propagatesFlow(string input, string output, boolean preservesValue) instead of propagatesFlow(string input, string output, boolean preservesValue, Provenance p, boolean isExact, string model), and then get the default values here.

Missing manual models were added using the following code added to `FlowSummaryImpl.qll`:

```ql
    private predicate testsummaryElement(
      Input::SummarizedCallableBase c, string namespace, string type, boolean subtypes, string name,
      string signature, string ext, string originalInput, string originalOutput, string kind,
      string provenance, string model, boolean isExact
    ) {
      exists(string input, string output, Callable baseCallable |
        summaryModel(namespace, type, subtypes, name, signature, ext, originalInput, originalOutput,
          kind, provenance, model) and
        baseCallable = interpretElement(namespace, type, subtypes, name, signature, ext, isExact) and
        (
          c.asCallable() = baseCallable and input = originalInput and output = originalOutput
          or
          correspondingKotlinParameterDefaultsArgSpec(baseCallable, c.asCallable(), originalInput,
            input) and
          correspondingKotlinParameterDefaultsArgSpec(baseCallable, c.asCallable(), originalOutput,
            output)
        )
      )
    }

    private predicate testsummaryElement2(
      string namespace, string type, boolean subtypes, string name, string signature, string ext,
      string originalInput, string originalOutput, string kind, string provenance, string model
    ) {
      exists(Input::SummarizedCallableBase c |
        testsummaryElement(c, _, _, _, _, _, _, originalInput, originalOutput, kind, provenance,
          model, false) and
        testsummaryElement(c, namespace, type, subtypes, name, signature, ext, _, _, _, provenance,
          _, true) and
        not testsummaryElement(c, _, _, _, _, _, _, originalInput, originalOutput, kind, provenance,
          _, true)
      )
    }

    private string getAMissingManualModel() {
      exists(
        string namespace, string type, boolean subtypes, string name, string signature, string ext,
        string originalInput, string originalOutput, string kind, string provenance, string model
      |
        testsummaryElement2(namespace, type, subtypes, name, signature, ext, originalInput,
          originalOutput, kind, provenance, model) and
        result =
          "- [\"" + namespace + "\", \"" + type + "\", True, \"" + name + "\", \"" + signature +
            "\", \"\", \"" + originalInput + "\", \"" + originalOutput + "\", \"" + kind + "\", \"" +
            provenance + "\"]"
      )
    }
```
@hvitved hvitved dismissed stale reviews from geoffw0 and owen-mc via 4a90f84 January 21, 2026 10:14
@hvitved hvitved force-pushed the shared/flow-summary-provenance-filtering branch from 117690f to 4a90f84 Compare January 21, 2026 10:14
@hvitved
Copy link
Contributor Author

hvitved commented Jan 21, 2026

@hvitved : This appears to break the model generator idempotency (at least for C#). I tried generating C# Runtime models from scratch (by first deleting the existing generated models) and then re-generate the model after this (which further changed the models).

@michaelnebel : I have pushed a revert change that appears to fix this: When I run python3 generate_mad.py --language csharp --with-summaries <path to dotnet_runtime_db> twice, I get no changes with the second invocation.

@hvitved hvitved force-pushed the shared/flow-summary-provenance-filtering branch from 5d74edd to 27c102a Compare January 21, 2026 13:00
c.fromSource() and
not c.getFile().isStub() and
not (
c.getFile().extractedQlTest() and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this deserves a comment (that ql test files where the body is just a throw are considered stub like and thus not a part of the source code).

Copy link
Contributor

@michaelnebel michaelnebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice work @hvitved !
Only a couple of minor questions/remarks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants