The goal is to support the edge case where a page's main text cannot be read by the fast OCR pipeline, but the page does contain some elements (graphs, tables, etc.) that the fast OCR pipeline can still read.
The indexing approach should be configurable through an ocr_mode setting with four options:
fast => only the fast OCR pipeline
hybrid => the current hybrid pipeline
aggressive => the new pipeline, where full OCR is used if the page contains only non-textual elements
full => the pipeline where full OCR is always used for all pages
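
The four modes could be represented roughly as follows. This is only a minimal sketch: the names OcrMode, parse_ocr_mode and use_full_ocr, as well as the only_non_textual and hybrid_decision parameters, are hypothetical and not taken from the existing code. The rules of the hybrid pipeline are not described here, so its decision is passed in as an opaque flag, and the sketch assumes that aggressive depends only on whether the page contains only non-textual elements.

```python
from enum import Enum


class OcrMode(str, Enum):
    """Hypothetical enum mirroring the four ocr_mode values listed above."""
    FAST = "fast"
    HYBRID = "hybrid"
    AGGRESSIVE = "aggressive"
    FULL = "full"


def parse_ocr_mode(value: str) -> OcrMode:
    """Turn a raw setting string into an OcrMode, rejecting unknown values."""
    try:
        return OcrMode(value.strip().lower())
    except ValueError:
        allowed = ", ".join(m.value for m in OcrMode)
        raise ValueError(
            f"invalid ocr_mode {value!r}; expected one of: {allowed}"
        ) from None


def use_full_ocr(mode: OcrMode, only_non_textual: bool,
                 hybrid_decision: bool = False) -> bool:
    """Decide whether a page should go through full OCR.

    only_non_textual: assumed flag, True when the page contains only
        non-textual elements (graphs, tables, etc.).
    hybrid_decision: assumed flag carrying whatever the existing hybrid
        pipeline would decide for this page; its rules are not modelled here.
    """
    if mode is OcrMode.FULL:
        return True                # full OCR for every page
    if mode is OcrMode.AGGRESSIVE:
        return only_non_textual    # full OCR only for non-textual-only pages
    if mode is OcrMode.HYBRID:
        return hybrid_decision     # defer to the existing hybrid logic
    return False                   # fast: never use full OCR
```

Rejecting unknown values at parse time keeps a typo in the setting from silently falling back to one of the pipelines.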