Modify the algorithm to switch to full OCR if a page contains only an image

The goal is to support the edge case where a page's main text cannot be read by the fast OCR pipeline, even though the page contains some elements (graphs, tables, etc.) that the fast OCR pipeline can still read.

The indexing approach should be configurable through an `ocr_mode` setting with four options:
- `fast` => only the fast OCR pipeline
- `hybrid` => the current hybrid pipeline
- `aggressive` => the new pipeline, where full OCR is used if the page contains only non-textual elements
- `full` => the pipeline where full OCR is always used for every page
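
As a rough illustration of how the four modes could map to a per-page decision, here is a minimal sketch. Everything except the `ocr_mode` values is an assumption: `PageInfo`, its fields, and `needs_full_ocr` are hypothetical names. The existing hybrid decision is represented by a placeholder flag because its logic is not described here. The sketch also assumes that `aggressive` extends `hybrid` with the new rule rather than replacing it.

```python
from dataclasses import dataclass
from enum import Enum


class OcrMode(Enum):
    """The four values proposed for the `ocr_mode` setting."""
    FAST = "fast"
    HYBRID = "hybrid"
    AGGRESSIVE = "aggressive"
    FULL = "full"


@dataclass
class PageInfo:
    """Hypothetical summary of what the fast pipeline found on one page."""
    text_block_count: int            # text blocks extracted by the fast pipeline
    non_text_element_count: int      # images, graphs, tables, ...
    hybrid_wants_full_ocr: bool = False  # placeholder for the existing hybrid decision


def needs_full_ocr(mode: OcrMode, page: PageInfo) -> bool:
    """Return True if the page should be sent through full OCR."""
    if mode is OcrMode.FAST:
        return False
    if mode is OcrMode.FULL:
        return True
    if mode is OcrMode.HYBRID:
        return page.hybrid_wants_full_ocr
    # AGGRESSIVE (assumption): keep the hybrid decision and additionally
    # switch to full OCR when the page holds only non-textual elements.
    only_non_text = page.text_block_count == 0 and page.non_text_element_count > 0
    return page.hybrid_wants_full_ocr or only_non_text


if __name__ == "__main__":
    image_only_page = PageInfo(text_block_count=0, non_text_element_count=1)
    for mode in OcrMode:
        print(mode.value, needs_full_ocr(mode, image_only_page))
```

Parsing the setting with `OcrMode("aggressive")` rejects unknown values with a `ValueError`, which may be a convenient way to validate the configuration early.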



Modify the algorithm to switch to full OCR if a page contains only an image #3
