Python decodes EUC-JP 8FA2B7 as TILDE instead of FULLWIDTH TILDE #113274

@qsantos

Bug report

Bug description:

Python decodes the bytes 0x8FA2B7 as ~ (TILDE, U+007E) in EUC-JP.

assert b'\x8f\xa2\xb7'.decode('euc_jp') == '~'

This reference document is ambiguous in that it shows a simple ~ (TILDE), but most other software (iconv, Vim, Firefox, Rust's encoding_rs) interprets this sequence as ～ (FULLWIDTH TILDE, U+FF5E). Note that EUC-JP already includes US-ASCII, so the plain tilde already has a single-byte encoding:

assert '~'.encode('euc-jp') == b'~'
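
To make the asymmetry easier to see, here is a minimal sketch that prints the code point and Unicode name of whatever the codec returns, and checks whether the bytes survive a decode/encode round trip. The helper names `describe` and `round_trips` are hypothetical, introduced only for illustration. The output for the three-byte sequence depends on the Python version, so no expected result is shown for it.

```python
import unicodedata

# Three-byte EUC-JP sequence from this report (0x8F prefix selects JIS X 0212).
SEQUENCE = b"\x8f\xa2\xb7"


def describe(data: bytes, encoding: str = "euc_jp") -> str:
    """Decode `data` and return 'U+XXXX NAME' for each resulting character."""
    text = data.decode(encoding)
    return ", ".join(f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text)


def round_trips(data: bytes, encoding: str = "euc_jp") -> bool:
    """Return True if decoding and re-encoding `data` yields the same bytes."""
    try:
        return data.decode(encoding).encode(encoding) == data
    except UnicodeError:
        # Either direction failed, so the bytes certainly do not round-trip.
        return False


if __name__ == "__main__":
    print(describe(SEQUENCE))     # result depends on the codec's mapping table
    print(describe(b"~"))         # single-byte ASCII tilde
    print(round_trips(SEQUENCE))  # False when the sequence decodes to U+007E
```

If the three-byte sequence decodes to U+007E, re-encoding gives the single byte `b'~'`, so the original bytes are not preserved.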

CPython versions tested on:

3.11, CPython main branch

Operating systems tested on:

Linux
