fitz - pdf to image conversion - some text characters are getting converted to junk #1626

@Raxidi

Description

When a PDF file is converted to images, some text in the rendered image is altered or replaced by junk characters.

To Reproduce (mandatory)

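import fitz

dpi = 200
# PDF coordinates are in points (1/72 inch), so dpi / 72 is the zoom factor.
dpi_matrix = fitz.Matrix(dpi / 72, dpi / 72)
file_path = "test.pdf"
with fitz.open(file_path) as pdf_file:
    for page in pdf_file:
        # Render the page to a pixmap at the requested resolution.
        page_pixel = page.get_pixmap(matrix=dpi_matrix)
        # Record the resolution in the image metadata.
        page_pixel.set_dpi(dpi, dpi)
        page_pixel.save(f"{page.number}.png")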
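
For readers unfamiliar with the `dpi / 72` scaling in the script above, here is a minimal sketch of the arithmetic in plain Python, without PyMuPDF. The helper names `zoom_for_dpi` and `expected_pixel_size` are hypothetical and only for illustration. The 612 x 792 point page size (US Letter) is an assumption, not the size of the PDF in this report. The pixel sizes the renderer actually produces may differ by a pixel due to rounding.

```python
POINTS_PER_INCH = 72  # PDF user space: 1 point = 1/72 inch


def zoom_for_dpi(dpi):
    """Return the scale factor that maps PDF points to pixels at `dpi`."""
    return dpi / POINTS_PER_INCH


def expected_pixel_size(width_pt, height_pt, dpi):
    """Approximate pixel dimensions of a page rendered at `dpi`.

    Hypothetical helper: multiplies before dividing to limit
    floating-point error, then rounds to whole pixels.
    """
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


# Assumed example page: US Letter, 612 x 792 points, rendered at 200 dpi.
print(expected_pixel_size(612, 792, 200))
```

Comparing these expected dimensions with the saved PNG is a quick way to check that the matrix was applied as intended. That helps separate a scaling problem from the glyph rendering problem described here.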

Expected behavior (optional)

Each image should be a faithful rendering of the corresponding PDF page.

Screenshots (optional)

I cannot upload the files here due to some constraints. I will email them to you.

Screenshot from the PDF file:
[image: pdf_img]

Converted image of the same page:
[image: converted]

Your configuration (mandatory)

OS Win10/Win11 64bit
3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)]
win32

PyMuPDF 1.19.6: Python bindings for the MuPDF 1.19.0 library.
Version date: 2022-03-03 00:00:01.
Built for Python 3.8 on win32 (64-bit).

Thank you.

Labels: upstream bug (bug outside this package)
