Is there a way to delete headers/footers in PDF documents? #2257

@sergenti

🤔 Is your feature request related to a problem? Please describe.
Most AI models are not trained on PDF data because parsing it is difficult. I'm working on a PDF parsing project that removes tables, charts, headers, etc., so that the output of extraction libraries like PyMuPDF improves significantly.
I solved table removal; I would love to solve header removal now.

💡 Describe the solution you'd like
Can we remove headers/footers on PDFs so the output of page.get_text() is cleaner?
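
To make the request concrete, here is a minimal sketch of one possible workaround, not a built-in PyMuPDF feature: read the text blocks from `page.get_text("blocks")` and drop any block that reaches into a fixed band at the top or bottom of the page. The helper names (`drop_margin_blocks`, `extract_body_text`) and the 50-point margins are assumptions for illustration; real documents would need margins tuned per layout, and this heuristic will miss headers that sit inside the body area.

```python
from typing import Iterable, List, Tuple

# Layout of each tuple returned by page.get_text("blocks"):
# (x0, y0, x1, y1, text, block_no, block_type)
Block = Tuple[float, float, float, float, str, int, int]


def drop_margin_blocks(
    blocks: Iterable[Block],
    page_height: float,
    top_margin: float = 50.0,     # assumed header band height, in points
    bottom_margin: float = 50.0,  # assumed footer band height, in points
) -> List[str]:
    """Return the text of blocks lying fully between the header and footer bands."""
    kept = []
    for _x0, y0, _x1, y1, text, _block_no, block_type in blocks:
        if block_type != 0:  # 0 = text block, 1 = image block
            continue
        if y0 >= top_margin and y1 <= page_height - bottom_margin:
            kept.append(text)
    return kept


def extract_body_text(
    pdf_path: str, top_margin: float = 50.0, bottom_margin: float = 50.0
) -> str:
    """Extract text from every page, skipping the assumed header/footer bands."""
    import fitz  # PyMuPDF; imported lazily so the helper above runs without it

    parts: List[str] = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            blocks = page.get_text("blocks")
            parts.extend(
                drop_margin_blocks(blocks, page.rect.height, top_margin, bottom_margin)
            )
    return "\n".join(parts)
```

Something like `extract_body_text("report.pdf")` (a hypothetical file) would then stand in for a plain `page.get_text()` loop. A similar effect may be achievable by passing a `clip` rectangle to `page.get_text()`, but neither approach detects headers automatically, which is what this request is about.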
