Skip to content

DataFrame doesn't decode boolean arrays correctly from Arrow #7115

Open
@johnnyg

Description

@johnnyg

System Information (please complete the following information):

  • OS & Version: Windows 10
  • ML.NET Version: 0.21.1
  • .NET Version: .NET 5.0

Describe the bug
Creating a dataframe from an arrow record batch where a column is a boolean array produces incorrect results (and occasionally even throws exceptions).

To Reproduce
Run:

public void DataFrameDecodeBooleans()
{
    BooleanArray boolArray = new BooleanArray.Builder().AppendNull().Append(false).Append(true).Build();
    Field boolField = new Field("col", BooleanType.Default, true);
    Schema schema = new Schema(new[] { boolField }, null);
    RecordBatch batch = new RecordBatch(schema, new IArrowArray[] { boolArray }, boolArray.Length);
    DataFrame df = DataFrame.FromArrowRecordBatch(batch);
    DataFrameColumn boolCol = df["col"];
    Assert.Equal(boolArray.Length, boolCol.Length);
    for (int row = 0; row < boolArray.Length; row++)
    {
        Assert.Equal(boolArray.GetValue(row), boolCol[row]);
    }
}

Expected behavior
Above test passes

Metadata

Metadata

Assignees

No one assigned

    Labels

    untriagedNew issue has not been triaged

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions