ranja-sarkar/parse_PDF


About

  • Read editable as well as scanned PDF documents for text and tables using the Python library pdfplumber.
  • Extract the images in them using the Python library fitz (PyMuPDF).
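
As a rough illustration of the first step, the sketch below reads text and tables page by page with pdfplumber. It is not the repository's own code: the function names `read_pdf` and `table_to_records` are hypothetical, and treating the first row of each table as its header is an assumption that will not hold for every document. A scanned page with no text layer may yield no text at all.

```python
from typing import Dict, List, Optional

# Shape of one table as returned by pdfplumber: rows of cells, cells may be None.
Table = List[List[Optional[str]]]


def table_to_records(table: Table) -> List[Dict[str, str]]:
    """Convert a table into dicts keyed by its first row (assumed to be a header)."""
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    return [
        {key: (cell or "").strip() for key, cell in zip(header, row)}
        for row in table[1:]
    ]


def read_pdf(path: str) -> List[dict]:
    """Collect the text and tables of every page (hypothetical helper)."""
    # Imported here so table_to_records stays usable without pdfplumber installed.
    import pdfplumber

    pages = []
    with pdfplumber.open(path) as pdf:
        for number, page in enumerate(pdf.pages, start=1):
            pages.append({
                "page": number,
                # extract_text() can return None when a page has no text layer.
                "text": page.extract_text() or "",
                "tables": [table_to_records(t) for t in page.extract_tables()],
            })
    return pages
```

Calling `read_pdf("report.pdf")` (a placeholder file name) would return one dictionary per page, which can then be written to JSON or CSV as needed.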

PIL (Pillow), a Python imaging library, is used to view the images extracted from PDF files: https://pillow.readthedocs.io/en/stable/
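
The image step could look something like the following sketch, which pulls embedded images out with fitz (PyMuPDF) and opens one with PIL. The helper names (`image_filename`, `extract_images`, `show_image`) and the file naming scheme are assumptions made for illustration, not taken from the repository.

```python
from pathlib import Path
from typing import List


def image_filename(page_number: int, image_number: int, ext: str) -> str:
    """Build a sortable file name such as page001_img02.png (assumed scheme)."""
    return f"page{page_number:03d}_img{image_number:02d}.{ext}"


def extract_images(pdf_path: str, out_dir: str) -> List[Path]:
    """Save every embedded image of a PDF into out_dir and return the paths."""
    import fitz  # PyMuPDF; imported here so image_filename works without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with fitz.open(pdf_path) as doc:
        for page_number, page in enumerate(doc, start=1):
            images = page.get_images(full=True)
            for image_number, img in enumerate(images, start=1):
                xref = img[0]  # first item of each entry is the image's xref
                info = doc.extract_image(xref)  # dict with "image" bytes and "ext"
                target = out / image_filename(page_number, image_number, info["ext"])
                target.write_bytes(info["image"])
                saved.append(target)
    return saved


def show_image(path: Path) -> None:
    """Open a saved image in the default viewer using PIL."""
    from PIL import Image

    with Image.open(path) as im:
        im.show()
```

A typical call would be `for p in extract_images("report.pdf", "images"): show_image(p)`, where both arguments are placeholder names.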

Tabula is a tool to extract tables from PDFs: https://github.com/tabulapdf/tabula

Parsing PDF files for text and images, and saving the extracted information.
