fix: Image Feature in Datasets Library Fails to Handle bytearray Objects from Spark DataFrames (#7517) #7521
+47
−10
Task
Support bytes-like objects (bytes and bytearray) in Features classes
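As a rough illustration of what accepting bytes-like input could look like, the sketch below normalizes both `bytes` and `bytearray` values to `bytes`. The helper name `to_bytes` and the `BytesLike` alias are hypothetical and are not taken from the `datasets` codebase or from this PR's diff.

```python
from typing import Union

# Hypothetical alias for the two bytes-like types discussed in this PR.
BytesLike = Union[bytes, bytearray]


def to_bytes(value: BytesLike) -> bytes:
    """Hypothetical helper: normalize a bytes-like value to immutable bytes."""
    if isinstance(value, bytes):
        return value  # already immutable, returned as-is
    if isinstance(value, bytearray):
        return bytes(value)  # copy the mutable buffer into immutable bytes
    raise TypeError(f"expected bytes or bytearray, got {type(value).__name__}")


# Stand-in for a binary value arriving as a bytearray (as the PR reports for Spark rows).
png_magic = bytearray(b"\x89PNG\r\n\x1a\n")
normalized = to_bytes(png_magic)
```

A single `isinstance(value, (bytes, bytearray))` check would express the same acceptance rule if no normalization is needed.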
Description
The `Features` classes only accept `bytes` objects for binary data, not `bytearray`. This leads to errors when using `IterableDataset.from_spark()` with Spark DataFrames, which contain `bytearray` objects, even though both `bytes` and `bytearray` are valid bytes-like objects in Python.
Changes
Updated the `Features` classes to accept both `bytes` and `bytearray` types for binary data fields.
Reasoning
`bytes` and `bytearray` serve the same purpose for binary data; the only difference is mutability.
Converting `bytearray` to `bytes` would be a workaround, not a true fix. I think the correct solution is to accept all bytes-like objects as input.
Testing
Added tests for `bytearray` inputs for image features.
Related Issues