To use this program, place four folders in the same folder as the programs: "ocr_output", "metadata", "images" and "input". Put the PDF to be processed into the "input" folder and run the .bat file. The program scans the PDF and runs OCR only if it is needed; if the PDF already has digital fonts, the OCR step is skipped. After the OCR step, the program extracts images. Only raster images are supported; vector images will NOT work. Once the images are extracted, metadata JSON files for the figures are generated, and the number of files depends on the number of images extracted. Each output is placed in its matching output folder.
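The folder setup and the last step (one metadata file per extracted image) can be sketched as follows. This is a minimal illustration, not the program's actual code: the function names `ensure_folders` and `write_metadata`, the list of raster extensions, and the metadata fields are all assumptions. Only the four folder names come from the description above.

```python
import json
from pathlib import Path

# Folder names taken from the README; everything else here is illustrative.
REQUIRED_FOLDERS = ("input", "ocr_output", "images", "metadata")
# Assumed raster extensions; the real program may produce others.
RASTER_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def ensure_folders(base: Path) -> list[Path]:
    """Create any of the four required folders that are missing under base."""
    paths = [base / name for name in REQUIRED_FOLDERS]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths


def write_metadata(base: Path) -> list[Path]:
    """Write one JSON file into metadata/ for each raster image in images/."""
    images_dir = base / "images"
    metadata_dir = base / "metadata"
    written = []
    for image in sorted(images_dir.iterdir()):
        # Skip anything that is not a raster image (for example .svg files).
        if image.suffix.lower() not in RASTER_EXTENSIONS:
            continue
        # Hypothetical metadata fields, chosen only for the example.
        record = {"image_file": image.name, "size_bytes": image.stat().st_size}
        target = metadata_dir / f"{image.stem}.json"
        target.write_text(json.dumps(record, indent=2), encoding="utf-8")
        written.append(target)
    return written
```

Calling `ensure_folders(Path("."))` before the first run creates the expected layout, and `write_metadata(Path("."))` returns the paths of the JSON files it wrote, so the count of metadata files always matches the count of raster images found.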
Snoimbus/PDF-Processing
About
I made this program to streamline the processing of PDFs at my internship.