- Reading text and tables from editable as well as scanned PDF documents using the Python library pdfplumber.
- Extracting the images embedded in those documents using the Python library PyMuPDF (imported as fitz).
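
As a rough illustration of the first item, the sketch below reads the text and tables of every page with pdfplumber. The function names `read_pdf` and `table_to_records` are hypothetical helpers introduced here, not part of pdfplumber, and the code assumes the first row of each table is its header. It is a minimal sketch, not necessarily how this project does it.

```python
def table_to_records(table):
    """Turn a table (list of rows, first row assumed to be the header)
    into a list of dicts, one per data row."""
    if not table:
        return []
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


def read_pdf(pdf_path):
    """Return (page_texts, tables) collected from every page of the PDF."""
    # Imported here so table_to_records can be used without the dependency.
    import pdfplumber  # third-party: pip install pdfplumber

    page_texts, tables = [], []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # Pages without a text layer (typical for scans) yield no text;
            # pdfplumber itself does not perform OCR.
            page_texts.append(page.extract_text() or "")
            # extract_tables() returns a list of tables, each a list of rows.
            tables.extend(page.extract_tables())
    return page_texts, tables
```

A call such as `texts, tables = read_pdf("sample.pdf")` (the file name is a placeholder) gives one text string per page, and `table_to_records(tables[0])` turns the first detected table into row dictionaries.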
The Python imaging library Pillow (PIL) is used to view the images extracted from the PDF files: https://pillow.readthedocs.io/en/stable/
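
The image extraction and viewing steps might look like the following sketch, which pulls embedded images out with PyMuPDF and opens them as Pillow images. The names `extract_images` and `image_filename`, and the file naming scheme, are assumptions made for illustration. It assumes a recent PyMuPDF release where `Page.get_images` and `Document.extract_image` are available.

```python
import io


def image_filename(page_number, image_index, ext):
    """Build a name such as 'page3_img1.png' (illustrative naming scheme)."""
    return f"page{page_number}_img{image_index}.{ext}"


def extract_images(pdf_path):
    """Return a list of (file_name, PIL image) for images embedded in the PDF."""
    # Imported here so image_filename can be used without the dependencies.
    import fitz  # PyMuPDF: pip install pymupdf
    from PIL import Image  # Pillow: pip install pillow

    results = []
    doc = fitz.open(pdf_path)
    try:
        for page_number, page in enumerate(doc, start=1):
            images = page.get_images(full=True)
            for image_index, img in enumerate(images, start=1):
                xref = img[0]  # first entry is the image's cross-reference number
                info = doc.extract_image(xref)  # dict with raw bytes and extension
                # Pillow may not be able to open every format a PDF can embed.
                image = Image.open(io.BytesIO(info["image"]))
                name = image_filename(page_number, image_index, info["ext"])
                results.append((name, image))
    finally:
        doc.close()
    return results
```

Looping over `extract_images("sample.pdf")` (a placeholder path) and calling `image.show()` on each result opens the image in the system viewer, while `image.save(name)` writes it to disk.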
Tabula is a tool to extract tables from PDFs: https://github.com/tabulapdf/tabula