
fix(optimizer): histogram calculations for string (Bytes) data types #17873

Draft: wants to merge 3 commits into main

BohuTANG (Member) commented Apr 30, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  • Implements histogram calculations for string (Bytes) data types
  • Adds a type_name() method to Datum for clearer error messages:
    old: Unsupported datum type: Bytes([56, 46, 48]), Bytes([57, 57, 57, 55])
    new: 1001=>Unsupported datum type for histogram calculation: 8.0 (type: String), 9997 (type: String). Only numeric types are supported.
  • Enables accurate cardinality estimation for string predicates
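The type_name() change above can be illustrated with a minimal Rust sketch. It assumes a simplified, hypothetical Datum enum (Bool, Int, UInt, Float, Bytes) and a hypothetical helper, unsupported_datum_message; the real definitions in the Databend optimizer may differ. The sketch shows how Bytes values can be rendered as UTF-8 text alongside a readable type name instead of a raw byte array. The 1001=> error-code prefix is omitted.

```rust
use std::fmt;

/// Hypothetical, simplified stand-in for the optimizer's `Datum`.
/// The real enum may have different variants and payload types.
#[derive(Debug, Clone, PartialEq)]
pub enum Datum {
    Bool(bool),
    Int(i64),
    UInt(u64),
    Float(f64),
    Bytes(Vec<u8>),
}

impl Datum {
    /// Human-readable type name used in error messages.
    pub fn type_name(&self) -> &'static str {
        match self {
            Datum::Bool(_) => "Boolean",
            Datum::Int(_) => "Int",
            Datum::UInt(_) => "UInt",
            Datum::Float(_) => "Float",
            Datum::Bytes(_) => "String",
        }
    }

    /// True for the variants a numeric histogram can interpolate over.
    pub fn is_numeric(&self) -> bool {
        matches!(self, Datum::Int(_) | Datum::UInt(_) | Datum::Float(_))
    }
}

impl fmt::Display for Datum {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Datum::Bool(v) => write!(f, "{v}"),
            Datum::Int(v) => write!(f, "{v}"),
            Datum::UInt(v) => write!(f, "{v}"),
            Datum::Float(v) => write!(f, "{v}"),
            // Show bytes as (lossy) UTF-8 text rather than as a byte array.
            Datum::Bytes(b) => write!(f, "{}", String::from_utf8_lossy(b)),
        }
    }
}

/// Hypothetical helper that builds the error text for a pair of bounds.
pub fn unsupported_datum_message(a: &Datum, b: &Datum) -> String {
    format!(
        "Unsupported datum type for histogram calculation: {} (type: {}), {} (type: {}). Only numeric types are supported.",
        a,
        a.type_name(),
        b,
        b.type_name()
    )
}

fn main() {
    let lo = Datum::Bytes(vec![56, 46, 48]); // UTF-8 bytes of "8.0"
    let hi = Datum::Bytes(vec![57, 57, 57, 55]); // UTF-8 bytes of "9997"
    // Prints: Unsupported datum type for histogram calculation: 8.0 (type: String),
    // 9997 (type: String). Only numeric types are supported.
    println!("{}", unsupported_datum_message(&lo, &hi));
}
```

Keeping type_name() on the enum lets every error path report the offending variant without repeating its own match.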

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • [ ] No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • [ ] Other (please describe): only add log


github-actions bot added the pr-chore label (this PR only has small changes that do not need to be recorded, like coding styles) Apr 30, 2025
BohuTANG added the ci-cloud label (Build docker image for cloud test) Apr 30, 2025
Contributor

Docker Image for PR

  • tag: pr-17873-e4691fc-1746008250

note: this image tag is only available for internal use.

BohuTANG added and removed the ci-cloud label (Build docker image for cloud test) Apr 30, 2025
Contributor

Docker Image for PR

  • tag: pr-17873-8775f1a-1746019528

note: this image tag is only available for internal use.

BohuTANG changed the title from "chore(optimizer): add more error log for get_upper_bound" to "fix(optimizer): add more error log for get_upper_bound" Apr 30, 2025
BohuTANG added the pr-bugfix label (this PR patches a bug in codebase) Apr 30, 2025
BohuTANG changed the title from "fix(optimizer): add more error log for get_upper_bound" to "fix(optimizer): histogram calculations for string (Bytes) data types" Apr 30, 2025
Contributor

Docker Image for PR

  • tag: pr-17873-248bfdf-1746022435

note: this image tag is only available for internal use.

Labels: ci-cloud, pr-bugfix, pr-chore