GitHub - Snoimbus/PDF-Processing: This program was made to help me streamline PDF processing at my internship.

To use this program, place four folders in the same folder as the scripts: "ocr_output", "metadata", "images", and "input". Put the PDF to be processed into the input folder and run the .bat file. The program scans the PDF and runs OCR only if it is needed; if the PDF already has digital fonts, the OCR step is skipped. After the OCR step, the program moves on to extracting images. Only raster images are supported; vector images will NOT work. After the images are extracted, JSON metadata for figures is generated, with the number of figures depending on the number of images extracted. Each output is placed in its corresponding output folder.
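The workflow above can be sketched in a few lines of Python. This is only an illustration and is not taken from `process.py` or `check_input.py`: the four folder names come from the description above, but the helper names (`ensure_folders`, `needs_ocr`, `write_figure_metadata`), the character threshold, the one-JSON-file-per-image layout, and the metadata fields are all assumptions. The OCR check works on text that some PDF library has already extracted, so no particular library is implied.

```python
import json
import tempfile
from pathlib import Path

# Folder names taken from the description above; everything else is hypothetical.
REQUIRED_FOLDERS = ("input", "ocr_output", "images", "metadata")


def ensure_folders(base: Path) -> None:
    """Create the four working folders under base if they are missing."""
    for name in REQUIRED_FOLDERS:
        (base / name).mkdir(parents=True, exist_ok=True)


def needs_ocr(page_texts: list[str], min_chars: int = 20) -> bool:
    """Assumed heuristic: OCR is needed when almost no text can be extracted.

    page_texts holds the text already pulled from each page by whatever
    PDF library is in use; min_chars is an arbitrary illustrative threshold.
    """
    total = sum(len(text.strip()) for text in page_texts)
    return total < min_chars


def write_figure_metadata(base: Path, pdf_name: str) -> list[Path]:
    """Write one JSON file per file in the images folder (assumed layout)."""
    written = []
    images = sorted(p for p in (base / "images").iterdir() if p.is_file())
    for index, image in enumerate(images, start=1):
        # Hypothetical fields; the real metadata schema is not documented.
        record = {"source_pdf": pdf_name, "figure": index, "image_file": image.name}
        target = base / "metadata" / f"{image.stem}.json"
        target.write_text(json.dumps(record, indent=2), encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        ensure_folders(base)
        # Empty stand-ins for two extracted raster images.
        (base / "images" / "report_img1.png").write_bytes(b"")
        (base / "images" / "report_img2.png").write_bytes(b"")
        print("OCR needed:", needs_ocr(["", "  "]))
        print("Metadata files:", len(write_figure_metadata(base, "report.pdf")))
```

Running the sketch in a temporary directory keeps it from touching real input. In actual use, `base` would be the folder that holds the scripts and `run_process.bat`.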

Folders and files (3 commits):
README.md
check_input.py
process.py
run_process.bat
