Skip to content

Go: Make models-as-data source models for variadic parameters work #18275

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Dec 15, 2024

Conversation

owen-mc
Copy link
Contributor

@owen-mc owen-mc commented Dec 12, 2024

Suppose we have a function f(x ...T). When there is a call to f which doesn't have a single argument which is an array/slice of Ts, like f(x0, x1), then we treat it as f([]T{x0, x1}). In other words there is an implicit slice literal. This is needed to make data flow work.

Suppose we want to model arguments which map to its first parameter as a source. It is not currently possible to use Argument[0].ArrayElement because it is a technical limitation of the data flow library that source and sink models can't have access paths. The best we can do is to model Argument[0] as the source. This PR makes this work. The way it does this is to add taint flow whenever there is a container read step, i.e. a step that would remove ArrayElement, Element, MapKey or MapValue from the access path. One example of this is the step from the post-update node of the implicit slice literal to the post-update nodes of its elements.

@github-actions github-actions bot added the Go label Dec 12, 2024
Comment on lines +40 to +43
DataFlowPrivate::readStep(src, c, sink) or
DataFlowPrivate::storeStep(src, c, sink) or
FlowSummaryImpl::Private::Steps::summaryGetterStep(src, c, sink, _) or
FlowSummaryImpl::Private::Steps::summarySetterStep(src, c, sink, _)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't include store steps, just the reads.

Suggested change
DataFlowPrivate::readStep(src, c, sink) or
DataFlowPrivate::storeStep(src, c, sink) or
FlowSummaryImpl::Private::Steps::summaryGetterStep(src, c, sink, _) or
FlowSummaryImpl::Private::Steps::summarySetterStep(src, c, sink, _)
DataFlowPrivate::readStep(src, c, sink) or
FlowSummaryImpl::Private::Steps::summaryGetterStep(src, c, sink, _)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, I guess this isn't the relation that gets passed to global data flow? Then it's fine.

stringConcatStep(pred, succ) or
referenceStep(pred, succ)
or
elementWriteStep(pred, succ)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this step doing here? I realise it isn't added in this PR, but it looks wrong: store steps shouldn't be taint steps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The QLDoc says "Holds if there is an assignment of the form succ[idx] = pred, meaning that pred may taint succ." So when you taint an element of an array, you taint the whole array. (And the same for slices, which are just like arrays.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that you could take a different approach, where only the array element is tainted, and then if the array is used without an array access then hopefully there is an implicit read step that will mean that the whole array is tainted. I could try removing this line and see what effect it has. But I think that shouldn't be part of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could try removing this line and see what effect it has

I think that should be attempted, yes.

But I think that shouldn't be part of this PR.

Agree.

@owen-mc
Copy link
Contributor Author

owen-mc commented Dec 12, 2024

DCA showed no effect.

@owen-mc owen-mc marked this pull request as ready for review December 12, 2024 13:41
@owen-mc owen-mc requested a review from a team as a code owner December 12, 2024 13:41
aschackmull
aschackmull previously approved these changes Dec 12, 2024
Copy link
Contributor

@aschackmull aschackmull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not inspected the .expected file changes, but the QL LGTM.

// Simple flow through library code is included in the exposed local
// step relation, even though flow is technically inter-procedural
FlowSummaryImpl::Private::Steps::summaryThroughStepTaint(src, sink, _)
or
// Treat container flow as taint for the local taint flow relation
exists(DataFlow::Content c | DataFlowPrivate::containerContent(c) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this defined here and not in localAdditionalTaintStep?

Copy link
Contributor Author

@owen-mc owen-mc Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In increasing order of how good an answer it is:
(a) It's what they do in java.
(b) I copied it before I realised that it wouldn't affect global taint flow at all.
(c) I think it is helpful. Six months ago I tried to convert old-style models for variadic functions to MaD. One of the sticking points was "Some barrier guard definitions use localTaintStep to detect taint flow, but it doesn't work flow through an implicit varargs slice." With this added (for container reads and container writes) we get that localTaint includes flow into and out of arrays, including through implicit varargs slices.

Copy link
Contributor

@smowton smowton Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I see: you want both reads and stores to appear according to the handy utility localTaintStep, but only reads to be contributed to the additional-taint-step for global dataflow's purposes?

If so, suggest three things:

  1. I note @aschackmull suggested the summary-getter step should be passed to the dataflow lib (i.e. in isAdditionalTaintStep`)
  2. You might as well omit adding the read steps again here since localAdditionalTaintStep is already included -- instead just comment that the read steps are already taken care of
  3. Add a comment explaining why we want read steps given to the dataflow lib but not stores

Copy link
Contributor Author

@owen-mc owen-mc Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I don't think we shoudl do this because the QLDoc for summaryGetterStep says
   * NOTE: This step should not be used in global data-flow/taint-tracking, but may
   * be useful to include in the exposed local data-flow/taint-tracking relations.

Note also that something related is included in readStep (and storeStep).
2. Done.
3. I'm not exactly sure what to say. I've put "Treat container read steps as taint for global taint flow." @aschackmull can you suggest a more detailed explanation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the qldoc that Owen referenced says, the summary-getter step should not be passed to global data flow in any form (neither as a read step nor as a taint step). To elaborate: Container read steps should be passed to data flow as taint steps. But read steps and summary-getter steps are separate things: A read step is atomic, whereas a summary-getter step represents a flow path through a callable with a nested read step, such that the entire thing can be summarized and treated as a read step in those contexts that aren't capable of doing such summarization. And data flow notably is perfectly capable of such summarization, so it only needs to know about the atomic read steps.

sliceStep(pred, succ)
or
// Treat container flow as taint for the local taint flow relation
exists(DataFlow::Content c | DataFlowPrivate::containerContent(c) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What container read steps are not element steps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that I should put this in elementStep? I am happy to do that. If you mean "what does add beyond elementStep" then I'd have to think about it a bit more. It certainly adds flow from an implicit varargs slice to the post-update nodes of the individual args that went into it.

Copy link
Contributor

@smowton smowton Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will certainly duplicate a lot of what elementStep is intending to do -- but clearly "those readSteps that read array values" and "elementStep" are not equivalent, which they probably should be. I'd recommend taking a look at how/why they differ and seeing if elementStep (or perhaps sliceStep) just needs an additional clause?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked and it should be elementStep (access an element of a container) rather than sliceStep (make a slice from an array/slice).
Looking carefully, readStep restricted to container content covers all of elementStep, plus the following:

  1. implicit varargs slice post-update nodes, added in my recent PR (which is what we're trying to get a taint step for)
  2. steps through summary models which read container content. (I think it might be okay to have taint steps for this)
  3. for a = b[i:j], we have a special step reading the array elements of b, which will match up with a corresponding special step storing those elements in a. (I don't think we want taint steps for this - we already have a taint step from b to a in sliceStep.)
  4. channel receive expressions <- chan. (Not sure about this.)
  5. in a ranged for loop for a, b := range c, when c is a map, flow from the map to the key (with MapKey content) and to the value (with MapValue content). (We currently only have a taint step for c to b, which is what makes sense for non-maps. Not sure if adding this would be a good thing.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That suggests to me take care of (1) in elementStep, then (4) can be considered separately

Copy link
Contributor Author

@owen-mc owen-mc Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. In general, I would like the data flow for different languages to be as similar as possible, but we don't need to do that in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm not averse to using readStep to define localAdditionalTaintStep, but I am averse to doing so where there is a somewhat murky venn diagram of steps supplied via readStep and steps supplied directly by the other disjuncts in localAdditionalTaintStep.

@owen-mc
Copy link
Contributor Author

owen-mc commented Dec 13, 2024

I ran QA on this when it added taint steps for all container read steps. Performance looks unchanged. Looking through the results, I see quite a few new results from MapKey in ranged for loops. E.g. the key from the map is used in an SQL query and the value is a prepared argument to the query. I would say many of them are FPs. In any case, I will do another QA run when this PR is approved, as the results will have changed quite a lot.

Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks sensible, contingent on QA

@owen-mc
Copy link
Contributor Author

owen-mc commented Dec 15, 2024

QA looks good. No performance difference. Only one alert difference, which I can't reproduce locally (2 go/incorrect-integer-conversion alerts lost on evilsocket/opensnitch).

@owen-mc owen-mc merged commit 7ab06fc into github:main Dec 15, 2024
14 checks passed
@owen-mc owen-mc deleted the go/mad/variadic-params-sources branch December 15, 2024 13:22
dbartol added a commit that referenced this pull request Jan 7, 2025
…-sources"

This reverts commit 7ab06fc, reversing
changes made to 0c5e260.
@dbartol dbartol mentioned this pull request Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants