Regression in tokenizer handling of \r #128233

Open
@tusharsadhwani

Bug report

Bug description:

From Python 3.12 onwards, tokenizing a file that contains just '{\r}' produces a weird '\r}' OP token, where 3.11 emitted the '\r' and the '}' as separate tokens:

$ printf '{\r}' | python3.11 -m tokenize
1,0-1,1:            OP             '{'            
1,1-1,2:            ERRORTOKEN     '\r'           
1,2-1,3:            OP             '}'            
1,3-1,4:            NEWLINE        ''             
2,0-2,0:            ENDMARKER      ''             

$ printf '{\r}' | python3.12 -m tokenize
1,0-1,1:            OP             '{'            
1,1-1,3:            OP             '\r}'          
1,3-1,4:            NEWLINE        ''             
2,0-2,0:            ENDMARKER      ''   
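
The same comparison can be run from Python instead of the shell. The following is a minimal sketch, not part of the original report: `token_summary` is a hypothetical helper name, and it goes through the public `tokenize.tokenize` API on a bytes buffer, so unlike the CLI output above it also yields a leading ENCODING token.

```python
import io
import tokenize


def token_summary(source: bytes) -> list[tuple[str, str]]:
    """Return (token type name, token string) pairs for a bytes source.

    Hypothetical helper for illustration; it only wraps tokenize.tokenize.
    """
    readline = io.BytesIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.tokenize(readline)
    ]


if __name__ == "__main__":
    # The input from the report: an opening brace, a bare carriage return,
    # and a closing brace, with no trailing newline.
    try:
        for name, text in token_summary(b"{\r}"):
            print(f"{name:<12}{text!r}")
    except (tokenize.TokenError, SyntaxError) as exc:
        # Some interpreter versions may reject the input outright.
        print(f"tokenizer raised: {exc!r}")
```

Running this under 3.11 and 3.12 should make the difference visible in the token strings: a token whose string is '\r}' is the merged token described above.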

Weirdly, AST generation passes just fine on both versions and produces identical output:

$ printf '{\r}' | python3.11 -m ast
Module(
   body=[
      Expr(
         value=Dict(keys=[], values=[]))],
   type_ignores=[])

$ printf '{\r}' | python3.12 -m ast
Module(
   body=[
      Expr(
         value=Dict(keys=[], values=[]))],
   type_ignores=[])
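
The AST check can also be scripted. This is a rough sketch under the assumption that `ast.parse` on the raw bytes behaves like the `python -m ast` runs above; `parses_as_empty_dict` is a hypothetical helper name, not something from the standard library.

```python
import ast
from typing import Union


def parses_as_empty_dict(source: Union[str, bytes]) -> bool:
    """Return True if source parses to a single empty dict display expression."""
    tree = ast.parse(source)
    if len(tree.body) != 1:
        return False
    stmt = tree.body[0]
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Dict)
        and not stmt.value.keys
    )


if __name__ == "__main__":
    try:
        # Same bytes as in the report; the result is printed, not assumed.
        print(parses_as_empty_dict(b"{\r}"))
    except SyntaxError as exc:
        print(f"parser raised: {exc!r}")
```

If this prints True on both versions, the regression is limited to the token stream exposed by the tokenize module and does not affect parsing.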

Expected behaviour

I'd expect the \r to yield an NL token instead, followed by the } as its own OP token.

CPython versions tested on:

3.11, 3.12, 3.13, 3.14

Operating systems tested on:

macOS
