About

Read text and tables from editable as well as scanned PDF documents using the Python library pdfplumber.
Extract the images they contain using the Python library fitz (PyMuPDF).
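
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the function names, the output file naming scheme, and the cell clean-up are assumptions made for the example. Note that pdfplumber reads the text layer of a PDF, so a scanned page without one will typically yield empty text here.

```python
from pathlib import Path


def clean_table(table):
    """Replace empty (None) cells with "" and strip whitespace in every cell."""
    return [[(cell or "").strip() for cell in row] for row in table]


def read_text_and_tables(pdf_path):
    """Return one (text, tables) pair per page, read with pdfplumber."""
    import pdfplumber  # third-party: pip install pdfplumber

    results = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            text = page.extract_text() or ""  # empty when no text layer is found
            tables = [clean_table(t) for t in page.extract_tables()]
            results.append((text, tables))
    return results


def image_filename(page_number, xref, ext):
    """Hypothetical naming scheme for saved images (an assumption of this sketch)."""
    return f"page{page_number}_img{xref}.{ext}"


def extract_images(pdf_path, out_dir):
    """Save every embedded image to out_dir with PyMuPDF and return the paths."""
    import fitz  # third-party: pip install PyMuPDF

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    doc = fitz.open(pdf_path)
    try:
        for page_number, page in enumerate(doc, start=1):
            for info in page.get_images(full=True):
                xref = info[0]  # first tuple entry is the image's xref
                image = doc.extract_image(xref)  # dict with "image" bytes and "ext"
                target = out / image_filename(page_number, xref, image["ext"])
                target.write_bytes(image["image"])
                saved.append(target)
    finally:
        doc.close()
    return saved
```

For example, `read_text_and_tables("sample.pdf")` and `extract_images("sample.pdf", "images")` would process a hypothetical file named sample.pdf.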

The Python imaging library PIL (Pillow) is used to view the images extracted from the PDF files: https://pillow.readthedocs.io/en/stable/
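
As a rough sketch of the viewing step, the snippet below opens extracted image bytes with Pillow and shows a downscaled preview. The helper names and the `max_side` default are assumptions for illustration; the notebooks may display images differently.

```python
import io


def scaled_size(width, height, max_side):
    """Shrink (width, height) to fit within max_side, keeping the aspect ratio."""
    scale = min(1.0, max_side / max(width, height))  # never upscale
    return max(1, round(width * scale)), max(1, round(height * scale))


def show_image(image_bytes, max_side=800):
    """Open raw image bytes with Pillow and display a preview."""
    from PIL import Image  # third-party: pip install Pillow

    img = Image.open(io.BytesIO(image_bytes))
    img = img.resize(scaled_size(img.width, img.height, max_side))
    img.show()  # opens the system's default image viewer
    return img
```

The bytes passed to `show_image` could come from a file on disk or directly from an image extracted from a PDF.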

Tabula is a tool to extract tables from PDFs: https://github.com/tabulapdf/tabula
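
The link above points to the Tabula application itself. From Python, one common route is the separate tabula-py wrapper, which requires a Java runtime. The sketch below assumes that wrapper is what is used; the helper names are hypothetical.

```python
def format_pages(pages=None):
    """Build the page selection string: "all" or a comma-separated list."""
    if pages is None:
        return "all"
    return ",".join(str(p) for p in pages)


def read_tables_with_tabula(pdf_path, pages=None):
    """Return the tables found in the PDF as a list of pandas DataFrames."""
    import tabula  # third-party: pip install tabula-py (needs Java installed)

    return tabula.read_pdf(
        pdf_path,
        pages=format_pages(pages),
        multiple_tables=True,
    )
```

For example, `read_tables_with_tabula("sample.pdf", pages=[1, 2])` would look for tables on the first two pages of a hypothetical sample.pdf.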

Repository contents (37 commits):

notebooks
README.md
generate_wc.ipynb
miscll.ipynb
requirement.txt
