Remove redundant rows from subscription updates #2654
If this is happening a lot for joins, I assume that is because there are a lot of RHS rows that are modified without falling out of the query range. IIUC, in that case, we will still end up doing the work of joining the new row with the LHS table and joining the old row with the LHS table, only to dedupe the resulting rows from the LHS.
How difficult would it be to push this deduplication into the RHS of the query plan (before the join)? For example, given the query in the test (`select u.* from u join v on u.i = v.i where v.x = 5`), if we started by computing the delta of `select v.i from v where v.x = 5`, then joined that with `u`, we wouldn't need to do any index lookups from `u`. We also would be deduplicating with the field used for the join, which should be cheaper to hash than the whole result set rows.
This would just require:
It's not complicated, but certainly not as trivial as this patch.
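To make the proposal above concrete, here is a minimal Python sketch of cancelling the delta on the join key before touching `u`. It is an illustration only, not the project's query engine: the names `net_delta`, `join_key_delta`, `u_delta`, and `u_index` are hypothetical, rows of `v` are modelled as dicts, and the index on `u.i` is modelled as a plain dict from key to rows. It also ignores changes to `u` itself.

```python
from collections import Counter


def net_delta(inserts, deletes):
    """Cancel values present on both sides, respecting multiplicity."""
    ins, dels = Counter(inserts), Counter(deletes)
    # Counter subtraction drops zero and negative counts.
    return ins - dels, dels - ins


def join_key_delta(v_inserts, v_deletes):
    """Delta of `select v.i from v where v.x = 5`, after cancellation."""
    ins = [row["i"] for row in v_inserts if row["x"] == 5]
    dels = [row["i"] for row in v_deletes if row["x"] == 5]
    return net_delta(ins, dels)


def u_delta(u_index, v_inserts, v_deletes):
    """Join the cancelled key delta with u, counting index lookups.

    u_index is a hypothetical stand-in for an index on u.i:
    a dict mapping each key to the list of matching u rows.
    """
    key_ins, key_dels = join_key_delta(v_inserts, v_deletes)
    lookups = 0
    out_ins, out_dels = [], []
    for key, n in key_ins.items():
        lookups += 1
        out_ins.extend(u_index.get(key, []) * n)
    for key, n in key_dels.items():
        lookups += 1
        out_dels.extend(u_index.get(key, []) * n)
    return out_ins, out_dels, lookups
```

In this model, a row of `v` that is modified but stays within `v.x = 5` with the same `i` contributes one insert and one delete of the same key. The two cancel before the join, so `u_delta` performs no lookups into `u` for that row.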
For integers, yes, hashing the field would be cheaper. However, keep in mind that these are row ids, so it's not the same as hashing a product value, for instance.
LGTM
Description of Changes
Avoids sending trivially empty subscription updates to clients. That is, if a row is inserted `n` times and deleted `n` times, we remove it from the result set to avoid network and client-side deserialization costs.
API and ABI breaking changes
None
Expected complexity level and risk
1
Testing
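As an illustration of the cancellation described under Description of Changes (not the PR's actual test or implementation), the following Python sketch treats the inserts and deletes of an update as multisets and removes rows that appear on both sides. The function name `remove_redundant_rows` is hypothetical, and rows are assumed to be hashable values such as tuples.

```python
from collections import Counter


def remove_redundant_rows(inserts, deletes):
    """Drop rows that are both inserted and deleted, respecting multiplicity.

    A row inserted n times and deleted n times disappears from both lists;
    a row inserted 3 times and deleted once remains as 2 inserts.
    """
    ins, dels = Counter(inserts), Counter(deletes)
    # Counter subtraction keeps only strictly positive counts.
    net_inserts = ins - dels
    net_deletes = dels - ins
    return list(net_inserts.elements()), list(net_deletes.elements())
```

If both returned lists are empty, the update is trivially empty in the sense of the description and would not need to be sent to the client.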