-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Go: Make models-as-data source models for variadic parameters work #18275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
DataFlowPrivate::readStep(src, c, sink) or | ||
DataFlowPrivate::storeStep(src, c, sink) or | ||
FlowSummaryImpl::Private::Steps::summaryGetterStep(src, c, sink, _) or | ||
FlowSummaryImpl::Private::Steps::summarySetterStep(src, c, sink, _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't include store steps, just the reads.
DataFlowPrivate::readStep(src, c, sink) or | |
DataFlowPrivate::storeStep(src, c, sink) or | |
FlowSummaryImpl::Private::Steps::summaryGetterStep(src, c, sink, _) or | |
FlowSummaryImpl::Private::Steps::summarySetterStep(src, c, sink, _) | |
DataFlowPrivate::readStep(src, c, sink) or | |
FlowSummaryImpl::Private::Steps::summaryGetterStep(src, c, sink, _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, I guess this isn't the relation that gets passed to global data flow? Then it's fine.
stringConcatStep(pred, succ) or | ||
referenceStep(pred, succ) | ||
or | ||
elementWriteStep(pred, succ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this step doing here? I realise it isn't added in this PR, but it looks wrong: store steps shouldn't be taint steps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The QLDoc says "Holds if there is an assignment of the form succ[idx] = pred
, meaning that pred
may taint succ
." So when you taint an element of an array, you taint the whole array. (And the same for slices, which are just like arrays.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that you could take a different approach, where only the array element is tainted, and then if the array is used without an array access then hopefully there is an implicit read step that will mean that the whole array is tainted. I could try removing this line and see what effect it has. But I think that shouldn't be part of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could try removing this line and see what effect it has
I think that should be attempted, yes.
But I think that shouldn't be part of this PR.
Agree.
DCA showed no effect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not inspected the .expected
file changes, but the QL LGTM.
// Simple flow through library code is included in the exposed local | ||
// step relation, even though flow is technically inter-procedural | ||
FlowSummaryImpl::Private::Steps::summaryThroughStepTaint(src, sink, _) | ||
or | ||
// Treat container flow as taint for the local taint flow relation | ||
exists(DataFlow::Content c | DataFlowPrivate::containerContent(c) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this defined here and not in localAdditionalTaintStep?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In increasing order of how good an answer it is:
(a) It's what they do in java.
(b) I copied it before I realised that it wouldn't affect global taint flow at all.
(c) I think it is helpful. Six months ago I tried to convert old-style models for variadic functions to MaD. One of the sticking points was "Some barrier guard definitions use localTaintStep to detect taint flow, but it doesn't work flow through an implicit varargs slice." With this added (for container reads and container writes) we get that localTaint
includes flow into and out of arrays, including through implicit varargs slices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I see: you want both reads and stores to appear according to the handy utility localTaintStep
, but only reads to be contributed to the additional-taint-step for global dataflow's purposes?
If so, suggest three things:
- I note @aschackmull suggested the summary-getter step should be passed to the dataflow lib (i.e. in isAdditionalTaintStep`)
- You might as well omit adding the read steps again here since
localAdditionalTaintStep
is already included -- instead just comment that the read steps are already taken care of - Add a comment explaining why we want read steps given to the dataflow lib but not stores
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I don't think we shoudl do this because the QLDoc for
summaryGetterStep
says
* NOTE: This step should not be used in global data-flow/taint-tracking, but may * be useful to include in the exposed local data-flow/taint-tracking relations.
Note also that something related is included in readStep
(and storeStep
).
2. Done.
3. I'm not exactly sure what to say. I've put "Treat container read steps as taint for global taint flow." @aschackmull can you suggest a more detailed explanation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the qldoc that Owen referenced says, the summary-getter step should not be passed to global data flow in any form (neither as a read step nor as a taint step). To elaborate: Container read steps should be passed to data flow as taint steps. But read steps and summary-getter steps are separate things: A read step is atomic, whereas a summary-getter step represents a flow path through a callable with a nested read step, such that the entire thing can be summarized and treated as a read step in those contexts that aren't capable of doing such summarization. And data flow notably is perfectly capable of such summarization, so it only needs to know about the atomic read steps.
sliceStep(pred, succ) | ||
or | ||
// Treat container flow as taint for the local taint flow relation | ||
exists(DataFlow::Content c | DataFlowPrivate::containerContent(c) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What container read steps are not element steps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that I should put this in elementStep
? I am happy to do that. If you mean "what does add beyond elementStep
" then I'd have to think about it a bit more. It certainly adds flow from an implicit varargs slice to the post-update nodes of the individual args that went into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will certainly duplicate a lot of what elementStep
is intending to do -- but clearly "those readSteps that read array values" and "elementStep" are not equivalent, which they probably should be. I'd recommend taking a look at how/why they differ and seeing if elementStep (or perhaps sliceStep) just needs an additional clause?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked and it should be elementStep
(access an element of a container) rather than sliceStep
(make a slice from an array/slice).
Looking carefully, readStep
restricted to container content covers all of elementStep
, plus the following:
- implicit varargs slice post-update nodes, added in my recent PR (which is what we're trying to get a taint step for)
- steps through summary models which read container content. (I think it might be okay to have taint steps for this)
- for
a = b[i:j]
, we have a special step reading the array elements ofb
, which will match up with a corresponding special step storing those elements ina
. (I don't think we want taint steps for this - we already have a taint step fromb
toa
insliceStep
.) - channel receive expressions
<- chan
. (Not sure about this.) - in a ranged for loop
for a, b := range c
, whenc
is a map, flow from the map to the key (with MapKey content) and to the value (with MapValue content). (We currently only have a taint step forc
tob
, which is what makes sense for non-maps. Not sure if adding this would be a good thing.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That suggests to me take care of (1) in elementStep, then (4) can be considered separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. In general, I would like the data flow for different languages to be as similar as possible, but we don't need to do that in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm not averse to using readStep
to define localAdditionalTaintStep
, but I am averse to doing so where there is a somewhat murky venn diagram of steps supplied via readStep
and steps supplied directly by the other disjuncts in localAdditionalTaintStep
.
I ran QA on this when it added taint steps for all container read steps. Performance looks unchanged. Looking through the results, I see quite a few new results from MapKey in ranged for loops. E.g. the key from the map is used in an SQL query and the value is a prepared argument to the query. I would say many of them are FPs. In any case, I will do another QA run when this PR is approved, as the results will have changed quite a lot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks sensible, contingent on QA
QA looks good. No performance difference. Only one alert difference, which I can't reproduce locally (2 |
Suppose we have a function
f(x ...T)
. When there is a call to f which doesn't have a single argument which is an array/slice ofT
s, likef(x0, x1)
, then we treat it asf([]T{x0, x1})
. In other words there is an implicit slice literal. This is needed to make data flow work.Suppose we want to model arguments which map to its first parameter as a source. It is not currently possible to use
Argument[0].ArrayElement
because it is a technical limitation of the data flow library that source and sink models can't have access paths. The best we can do is to modelArgument[0]
as the source. This PR makes this work. The way it does this is to add taint flow whenever there is a container read step, i.e. a step that would removeArrayElement
,Element
,MapKey
orMapValue
from the access path. One example of this is the step from the post-update node of the implicit slice literal to the post-update nodes of its elements.