Skip to content

Add a more efficient implementation for String(::ReinterpretArray) #58438

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

lntricate1
Copy link

I added a new method String(::ReinterpretArray{UInt8, 1, S, Vector{S}, IsReshaped}), which is a copy of String(::Vector{UInt8}) but using v.parent instead of v. I'm not sure if the same can be done for more generic ReinterpretArrays or for other AbstractVectors, I'd like some input on that if possible.

Previous behavior:

function test()
  a = reinterpret(UInt8, rand(UInt32, 2^27))
  b = copy(a)
  @time A = String(a) # 1.133837 seconds (2 allocations: 512.000 MiB, 0.06% gc time)
  @time B = String(b) # 0.586990 seconds (1 allocation: 512.000 MiB)
  A == B # true
end

New behavior:

function test()
  a = reinterpret(UInt8, rand(UInt32, 2^27))
  b = copy(a)
  @time A = String(a) # 0.458837 seconds (1 allocation: 512.000 MiB)
  @time B = String(b) # 0.475022 seconds (1 allocation: 512.000 MiB)
  A == B # true
end

@giordano giordano added performance Must go faster strings "Strings!" labels May 16, 2025
Comment on lines 80 to 93
function String(v::ReinterpretArray{UInt8, 1, S, Vector{S}, IsReshaped}) where {S, IsReshaped}
len = length(v)
len == 0 && return ""
ref = v.parent.ref
if ref.ptr_or_offset == ref.mem.ptr
str = ccall(:jl_genericmemory_to_string, Ref{String}, (Any, Int), ref.mem, len)
else
str = ccall(:jl_pchar_to_string, Ref{String}, (Ptr{S}, Int), ref, len)
end
# optimized empty!(v.parent); sizehint!(v.parent, 0) calls
setfield!(v.parent, :size, (0,))
setfield!(v.parent, :ref, memoryref(Memory{S}()))
return str
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is invalid, because String(::AbstractVector) is documented to copy, unless the vector is a Vector. So you'd need to make a stringvector, then copy to it.

Suggested change
function String(v::ReinterpretArray{UInt8, 1, S, Vector{S}, IsReshaped}) where {S, IsReshaped}
len = length(v)
len == 0 && return ""
ref = v.parent.ref
if ref.ptr_or_offset == ref.mem.ptr
str = ccall(:jl_genericmemory_to_string, Ref{String}, (Any, Int), ref.mem, len)
else
str = ccall(:jl_pchar_to_string, Ref{String}, (Ptr{S}, Int), ref, len)
end
# optimized empty!(v.parent); sizehint!(v.parent, 0) calls
setfield!(v.parent, :size, (0,))
setfield!(v.parent, :ref, memoryref(Memory{S}()))
return str
end
function String(v::ReinterpretArray{UInt8, 1, S, Vector{S}, IsReshaped}) where {S, IsReshaped}
len = length(v)
iszero(len) && return ""
par = parent(v)
mem = StringMemory(len % UInt)
GC.@preserve mem par begin
unsafe_copyto!(pointer(mem), Ptr{UInt8}(pointer(par)), len % UInt)
end
return unsafe_takestring(mem)
end

Another issue is whether you can make the signature broader - what about reinterprets of Memory?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented this fix, but using jl_pchar_to_string (which always copies) to remove the extra allocation from StringMemory. I also implemented it for reinterprets of Memory.

This has made me realize that the docstring for String could be clearer. Perhaps splitting up the docstring for Vector{UInt8} and AbstractVector{UInt8} would be good. There's also a few missing entries that might be good to have (like the one for AbstractVector{Char}. Do you have any thoughts on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Better to do that in a separate PR, in my opinion.

@jakobnissen jakobnissen added the merge me PR is reviewed. Merge when all tests are passing label May 20, 2025
@giordano
Copy link
Contributor

Are the new methods covered by tests already?

@jakobnissen jakobnissen removed the merge me PR is reviewed. Merge when all tests are passing label May 20, 2025
@jakobnissen
Copy link
Member

Doesn't seems so - looks like I was too quick with the merge me label.

@lntricate1
Copy link
Author

Not sure if these tests are sufficient or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants