builtins.toXML can return strings with invalid UTF-8 encoding #12061

Open
@NaN-git

Describe the bug

Applying builtins.toXML to a string that is not valid UTF-8 returns

"<?xml version='1.0' encoding='utf-8'?>\n<expr>\n  <string value=\"[...]\" />\n</expr>\n"

where [...] is the input string, passed through unchanged. The result declares encoding='utf-8' but still contains the invalid byte sequences.

Steps To Reproduce

  1. Download test file:
     wget -O - https://github.com/flenniken/utf8tests/raw/refs/heads/main/utf8tests.bin | head -n 208 > utf8tests.bin
  2. Evaluate the following Nix expression, e.g. in nix repl:
     builtins.toXML (builtins.readFile ./utf8tests.bin)
  3. The output contains the invalid UTF-8 input string.
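
To confirm step 3 independently of Nix, a small check along these lines may help. It is only a sketch: the helper name first_invalid_utf8_offset is made up for illustration, and it relies on Python's strict UTF-8 decoder rather than on anything Nix does internally.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical helper for this report; it is not part of Nix.
    """
    try:
        # The default error handler is "strict", so invalid input raises.
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None
```

For example, calling it on the bytes of utf8tests.bin (read with open("utf8tests.bin", "rb").read()) should return an offset rather than None if the file contains invalid sequences. The same check can be applied to the toXML result once it has been written to a file.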

Expected behavior

Either builtins.readFile or builtins.toXML should fail with a proper error message.
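
The requested behavior can be modelled outside Nix roughly as follows. This is a Python sketch under stated assumptions, not the Nix implementation: the function name string_to_xml and the error wording are hypothetical, and only the string case shown in this report is covered.

```python
from xml.sax.saxutils import quoteattr


def string_to_xml(raw: bytes) -> str:
    """Serialize a byte string the way the report's output looks, but strictly.

    Hypothetical model of the expected behavior; names and messages are
    assumptions, not Nix's API.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # Fail loudly instead of emitting bytes that contradict the
        # encoding='utf-8' declaration in the XML header.
        raise ValueError(
            f"string is not valid UTF-8 (first bad byte at offset {err.start})"
        ) from err
    # quoteattr escapes the value and adds the surrounding quotes.
    return (
        "<?xml version='1.0' encoding='utf-8'?>\n"
        "<expr>\n"
        f"  <string value={quoteattr(text)} />\n"
        "</expr>\n"
    )
```

With this model, valid input serializes as before, while input such as b"\xff" raises an error that names the offending offset instead of silently producing malformed output.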

Additional context

Related to issue #12060.

Labels

  - bug
  - idea approved: The given proposal has been discussed and approved by the Nix team. An implementation is welcome.
  - language: The Nix expression language; parser, interpreter, primops, evaluation, etc.
