Skip to content

Add whitespace handling methods to MemoryExtensions #111439

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

AlexRadch
Copy link
Contributor

Part of #77959 except ContainsAnyExceptWhiteSpace

Updated the MemoryExtensions class with new methods for efficient whitespace handling, including ContainsAnyWhiteSpace, IndexOfAnyWhiteSpace, and LastIndexOfAnyWhiteSpace.

Optimized the IsWhiteSpace method to utilize SearchValues.WhiteSpaces.

Added corresponding test cases in StringSearchValues.cs to validate the new functionality.

Updated the MemoryExtensions class with new methods for
efficient whitespace handling, including ContainsAnyWhiteSpace,
IndexOfAnyWhiteSpace, and LastIndexOfAnyWhiteSpace.

Optimized the IsWhiteSpace method to utilize SearchValues.WhiteSpaces.

Added corresponding test cases in StringSearchValues.cs to validate
the new functionality.
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

1 similar comment
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jan 14, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Copy link
Member

@MihaZupan MihaZupan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A concern we had with the implementation here was whether it's worth paying the overhead of taking a vectorized path if we expect the match to commonly be close to the start.

For example it's not obvious why span.IsWhiteSpace should use a different strategy compared to string.IsNullOrWhiteSpace, where the 99% case is that you're validating an argument and expect it have non-whitespace chars (often already with the first character).

Have you done any perf measurements for such cases?

@AlexRadch
Copy link
Contributor Author

A concern we had with the implementation here was whether it's worth paying the overhead of taking a vectorized path if we expect the match to commonly be close to the start.

For example it's not obvious why span.IsWhiteSpace should use a different strategy compared to string.IsNullOrWhiteSpace, where the 99% case is that you're validating an argument and expect it have non-whitespace chars (often already with the first character).

Have you done any perf measurements for such cases?

I haven't conducted performance measurements for such cases yet, but I’d be happy to write and run performance tests if we can determine the most relevant data set for the evaluation. Could you clarify what types of inputs or scenarios would best reflect the use cases you're concerned about? For example, should we focus on inputs where the match is typically close to the start, or would you like a broader range of cases? Your guidance will help ensure the tests provide meaningful insights.

@AlexRadch
Copy link
Contributor Author

AlexRadch commented Jan 15, 2025

Have you done any perf measurements for such cases?

I write benchmarks for the ROSpan IsWhiteSpace method.

On my computer, I got the next results

| Method       | input                                | Mean      | Error     | StdDev    | Median    | Min       | Max       | Ratio | RatioSD | Allocated | Alloc Ratio |
|------------- |------------------------------------- |----------:|----------:|----------:|----------:|----------:|----------:|------:|--------:|----------:|------------:|
| IsWhiteSpace |                                      |  4.761 ns | 0.1679 ns | 0.1933 ns |  4.788 ns |  4.410 ns |  5.123 ns |  1.00 |    0.06 |         - |          NA |
| IsWhiteSpace |                                      |  5.877 ns | 0.2250 ns | 0.2591 ns |  5.919 ns |  5.399 ns |  6.246 ns |  1.24 |    0.07 |         - |          NA |
|              |                                      |           |           |           |           |           |           |       |         |           |             |
| IsWhiteSpace |              (...)    32abcdefg [41] | 52.855 ns | 1.0784 ns | 1.0088 ns | 52.629 ns | 51.269 ns | 54.761 ns |  1.00 |    0.03 |         - |          NA |
| IsWhiteSpace |              (...)    32abcdefg [41] | 14.651 ns | 0.5130 ns | 0.5908 ns | 14.515 ns | 13.822 ns | 16.039 ns |  0.28 |    0.01 |         - |          NA |
|              |                                      |           |           |           |           |           |           |       |         |           |             |
| IsWhiteSpace |                 16abcdefg            | 24.343 ns | 0.7555 ns | 0.8700 ns | 24.483 ns | 22.740 ns | 26.364 ns |  1.00 |    0.05 |         - |          NA |
| IsWhiteSpace |                 16abcdefg            | 13.210 ns | 0.3936 ns | 0.4532 ns | 13.163 ns | 12.455 ns | 14.142 ns |  0.54 |    0.03 |         - |          NA |
|              |                                      |           |           |           |           |           |           |       |         |           |             |
| IsWhiteSpace |             12abcdefg                | 19.221 ns | 1.3650 ns | 1.5719 ns | 18.667 ns | 17.769 ns | 23.535 ns |  1.01 |    0.11 |         - |          NA |
| IsWhiteSpace |             12abcdefg                | 10.972 ns | 0.2484 ns | 0.2324 ns | 10.952 ns | 10.500 ns | 11.372 ns |  0.57 |    0.04 |         - |          NA |
|              |                                      |           |           |           |           |           |           |       |         |           |             |
| IsWhiteSpace |            11abcdefg                 | 18.358 ns | 0.8649 ns | 0.9961 ns | 18.435 ns | 16.280 ns | 20.075 ns |  1.00 |    0.08 |         - |          NA |
| IsWhiteSpace |            11abcdefg                 | 11.296 ns | 0.3239 ns | 0.3730 ns | 11.176 ns | 10.779 ns | 11.922 ns |  0.62 |    0.04 |         - |          NA |
|              |                                      |           |           |           |           |           |           |       |         |           |             |
| IsWhiteSpace |           10abcdefg                  | 17.215 ns | 0.9891 ns | 1.1391 ns | 16.960 ns | 15.590 ns | 20.060 ns |  1.00 |    0.09 |         - |          NA |
| IsWhiteSpace |           10abcdefg                  | 10.891 ns | 0.6447 ns | 0.7425 ns | 10.939 ns |  9.877 ns | 12.178 ns |  0.64 |    0.06 |         - |          NA |
|              |                                      |           |           |           |           |           |           |       |         |           |             |
| IsWhiteSpace |          9abcdefg                    | 14.797 ns | 0.8616 ns | 0.9922 ns | 15.002 ns | 13.241 ns | 16.140 ns |  1.00 |    0.10 |         - |          NA |
| IsWhiteSpace |          9abcdefg                    | 10.241 ns | 0.2714 ns | 0.3125 ns | 10.181 ns |  9.784 ns | 10.813 ns |  0.70 |    0.05 |         - |          NA |
|              |                                      |           |           |           |           |           |           |       |         |           |             |
| IsWhiteSpace |         8abcdefg                     | 12.351 ns | 0.2815 ns | 0.2765 ns | 12.262 ns | 12.084 ns | 12.975 ns |  1.00 |    0.03 |         - |          NA |
| IsWhiteSpace |         8abcdefg                     | 12.407 ns | 1.1369 ns | 1.3093 ns | 11.794 ns | 11.019 ns | 14.931 ns |  1.00 |    0.11 |         - |          NA |
|              |                                      |           |           |           |           |           |           |       |         |           |             |
| IsWhiteSpace |        7abcdefg                      | 12.543 ns | 0.6078 ns | 0.7000 ns | 12.713 ns | 11.215 ns | 13.491 ns |  1.00 |    0.08 |         - |          NA |
| IsWhiteSpace |        7abcdefg                      | 12.475 ns | 1.3318 ns | 1.5337 ns | 11.514 ns | 10.826 ns | 14.942 ns |  1.00 |    0.13 |         - |          NA |
|              |                                      |           |           |           |           |           |           |       |         |           |             |
| IsWhiteSpace |       6abcdefg                       | 10.165 ns | 0.2281 ns | 0.2240 ns | 10.106 ns |  9.803 ns | 10.672 ns |  1.00 |    0.03 |         - |          NA |
| IsWhiteSpace |       6abcdefg                       | 14.388 ns | 1.4552 ns | 1.6758 ns | 14.998 ns | 11.494 ns | 16.083 ns |  1.42 |    0.16 |         - |          NA |
|              |                                      |           |           |           |           |           |           |       |         |           |             |
| IsWhiteSpace |      5abcdefg                        | 10.456 ns | 0.5675 ns | 0.6535 ns | 10.476 ns |  9.277 ns | 11.882 ns |  1.00 |    0.09 |         - |          NA |
| IsWhiteSpace |      5abcdefg                        | 11.988 ns | 0.3116 ns | 0.3589 ns | 12.053 ns | 11.318 ns | 12.722 ns |  1.15 |    0.08 |         - |          NA |
|              |                                      |           |           |           |           |           |           |       |         |           |             |
| IsWhiteSpace |     4abcdefg                         |  9.293 ns | 0.0678 ns | 0.0634 ns |  9.293 ns |  9.196 ns |  9.408 ns |  1.00 |    0.01 |         - |          NA |
| IsWhiteSpace |     4abcdefg                         | 11.360 ns | 0.2698 ns | 0.3107 ns | 11.351 ns | 10.873 ns | 11.997 ns |  1.22 |    0.03 |         - |          NA |
|              |                                      |           |           |           |           |           |           |       |         |           |             |
| IsWhiteSpace |   2abcdefg                           |  6.471 ns | 0.5033 ns | 0.5796 ns |  6.335 ns |  5.359 ns |  7.367 ns |  1.01 |    0.13 |         - |          NA |
| IsWhiteSpace |   2abcdefg                           | 11.145 ns | 0.2521 ns | 0.2475 ns | 11.035 ns | 10.809 ns | 11.626 ns |  1.74 |    0.16 |         - |          NA |
|              |                                      |           |           |           |           |           |           |       |         |           |             |
| IsWhiteSpace |  1abcdefg                            |  5.333 ns | 0.2550 ns | 0.2936 ns |  5.223 ns |  4.996 ns |  6.234 ns |  1.00 |    0.07 |         - |          NA |
| IsWhiteSpace |  1abcdefg                            | 10.858 ns | 0.2394 ns | 0.2239 ns | 10.833 ns | 10.642 ns | 11.535 ns |  2.04 |    0.11 |         - |          NA |
|              |                                      |           |           |           |           |           |           |       |         |           |             |
| IsWhiteSpace | 0abcdefg                             |  4.752 ns | 0.1917 ns | 0.2207 ns |  4.761 ns |  4.471 ns |  5.462 ns |  1.00 |    0.06 |         - |          NA |
| IsWhiteSpace | 0abcdefg                             | 10.900 ns | 0.2861 ns | 0.3295 ns | 10.778 ns | 10.541 ns | 11.714 ns |  2.30 |    0.12 |         - |          NA |

`string.SearchValuesStorage.WhiteSpaceChars` in the
`MemoryExtensions` class methods.
This commit removes the `using System.Buffers;` directive from the `MemoryExtensions.Globalization.cs` file, indicating that the functionality provided by the `System.Buffers` namespace is no longer needed or has been replaced by alternative implementations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Memory community-contribution Indicates that the PR has been added by a community member new-api-needs-documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants