"extract_text" doesn't output the same transformation matrix in version 3.17 as in 3.16.

I'm trying to extract text from a PDF together with the position of each piece of text.
With pypdf 3.16 I get the expected result, but with 3.17 I don't.

## Environment

Windows-10-10.0.19045-SP0
pypdf==3.16.0, crypt_provider=('cryptography', '41.0.3'), PIL=9.5.0
AND
pypdf==3.17.3, crypt_provider=('cryptography', '41.0.7'), PIL=9.5.0

## Code + PDF

This is a minimal, complete example that shows the issue:

```python
import pypdf
file_path = "list.pdf"
reader = pypdf.PdfReader(file_path)

text_parts = []

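# cm: current transformation matrix, tm: text matrix,
# fd: font dictionary, fs: font size
def visitor(text, cm, tm, fd, fs):
    # Only record the one label whose position I need
    if text.strip() == "Flyttesagsnr.:":
        text_parts.append((cm, tm, text))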

reader.pages[0].extract_text(visitor_text=visitor)

print(text_parts)
```

Unfortunately, I can't share the PDF since it's confidential. I haven't been able to remove the confidential content from the document while keeping the bug.
I know this might make the bug hard to reproduce.

## Results

In version 3.17 I get:
```python
[([0.75, 0.0, 0.0, -0.75, 0.0, 841.68], [1.0, 0.0, 0.0, 1.0, 0.0, 0.0], ' Flyttesagsnr.:')]
```

In version 3.16 I get:
```python
[([0.75, 0.0, 0.0, -0.75, 0.0, 841.68], [1.0, 0.0, 0.0, -1.0, 448.313, 352.05], ' Flyttesagsnr.:')]
```

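As you can see, tm[4] and tm[5] are both 0 in version 3.17, which is definitely wrong. tm[3] also differs: it is 1.0 in 3.17 but -1.0 in 3.16. The cm matrix is identical in both versions.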
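
To show why this matters for positioning, here is a minimal sketch of how I'd turn the two matrices into a text origin. It assumes the usual PDF convention that a six-element matrix `[a, b, c, d, e, f]` acts on a row vector, so the origin of the text is `(tm[4], tm[5])` mapped through `cm`. The helper `text_origin` is my own name for illustration, not part of pypdf.

```python
def text_origin(cm, tm):
    """Map the text-space origin (tm[4], tm[5]) through cm.

    Assumes [a, b, c, d, e, f] means x' = a*x + c*y + e, y' = b*x + d*y + f.
    """
    x, y = tm[4], tm[5]
    return (
        cm[0] * x + cm[2] * y + cm[4],
        cm[1] * x + cm[3] * y + cm[5],
    )


# Values copied from the results above
cm = [0.75, 0.0, 0.0, -0.75, 0.0, 841.68]
tm_316 = [1.0, 0.0, 0.0, -1.0, 448.313, 352.05]
tm_317 = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]

pos_316 = text_origin(cm, tm_316)
pos_317 = text_origin(cm, tm_317)
print("3.16:", pos_316)
print("3.17:", pos_317)
```

Under that assumption, the 3.16 values place the text at roughly (336.23, 577.64), while the 3.17 values collapse to the translation part of cm alone, so every fragment with a zeroed tm would end up at the same point.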



