Skip to content

Remove redundant rows from subscription updates #2654

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

joshua-spacetime
Copy link
Collaborator

@joshua-spacetime joshua-spacetime commented Apr 22, 2025

Description of Changes

Avoids sending trivially empty subscription updates to clients. That is, if a row is inserted n times and deleted n times, we remove it from the result set to avoid network and client side deserialization costs.

API and ABI breaking changes

None

Expected complexity level and risk

1

Testing

  • Current regression tests pass
  • Bot test

@joshua-spacetime joshua-spacetime force-pushed the joshua/perf/subscription-path-pruning branch from aaea2d7 to 915b5b5 Compare April 22, 2025 19:33
@joshua-spacetime joshua-spacetime force-pushed the joshua/perf/subscriptions/trivially-empty-updates branch from 11c0da3 to 98ca863 Compare April 22, 2025 19:34
@joshua-spacetime joshua-spacetime requested a review from mamcx April 22, 2025 19:35
Copy link
Contributor

@jsdt jsdt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is happening a lot for joins, I assume that is because there are a lot of RHS rows that are modified without falling out of the query range. IIUC, in that case, we will still end up doing the work of joining the new row with the LHS table and joining the old row with the LHS table, only to dedupe the resulting rows from the LHS.

How difficult would it be to push this deduplication into the RHS of the query plan (before the join)? For example, given the query in the test (select u.* from u join v on u.i = v.i where v.x = 5), if we started by computing the delta of select v.i from v where v.x = 5, then joined that with u, we wouldn't need to do any index lookups from u. We also would be deduplicating with the field used for the join, which should be cheaper to hash than the whole result set rows.

@joshua-spacetime
Copy link
Collaborator Author

@jsdt

How difficult would it be to push this deduplication into the RHS of the query plan

This would just require:

  1. evaluating inserts and deletes in a single plan (currently they're evaluated separately), and
  2. introducing some new operators and optimizations to the query planner

It's not complicated, but certainly not as trivial as this patch.

We also would be deduplicating with the field used for the join, which should be cheaper to hash than the whole result set rows.

For integers yes, hashing the field would be cheaper, however keep in mind that these are row ids, so it's not the same as hashing a product value for instance.

@joshua-spacetime joshua-spacetime force-pushed the joshua/perf/subscriptions/trivially-empty-updates branch from 33f3de2 to 83c951a Compare April 23, 2025 16:53
@joshua-spacetime joshua-spacetime force-pushed the joshua/perf/subscription-path-pruning branch from fc31eeb to bacb374 Compare April 23, 2025 16:54
@joshua-spacetime joshua-spacetime force-pushed the joshua/perf/subscriptions/trivially-empty-updates branch from 83c951a to b44291f Compare April 23, 2025 16:55
@joshua-spacetime joshua-spacetime requested a review from mamcx April 23, 2025 20:06
Copy link
Contributor

@mamcx mamcx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@joshua-spacetime joshua-spacetime force-pushed the joshua/perf/subscription-path-pruning branch from bacb374 to 467ac13 Compare April 23, 2025 23:25
@joshua-spacetime joshua-spacetime self-assigned this Apr 24, 2025
Base automatically changed from joshua/perf/subscription-path-pruning to master April 24, 2025 00:15
@joshua-spacetime joshua-spacetime force-pushed the joshua/perf/subscriptions/trivially-empty-updates branch from af65894 to 6bf16a5 Compare April 24, 2025 00:21
@joshua-spacetime joshua-spacetime added this pull request to the merge queue Apr 24, 2025
Merged via the queue into master with commit 59faab8 Apr 24, 2025
19 checks passed
@joshua-spacetime joshua-spacetime deleted the joshua/perf/subscriptions/trivially-empty-updates branch April 24, 2025 01:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants