🚧 Colour based tagging & non-ocr page_to_text #111
Open
seanmcguire12 wants to merge 48 commits into main
…r to tagging. Added functionality for combining the OCR annotations of the original/raw page and the tagged page.
…to be compatible with supertype ITarsier.
…ase tagging from page_to_image() to page_to_text(), added method combine_annotations() to decouple the sorting logic from page_to_text().
✨ Added functionality for taking screenshot of original/raw page prior to tagging. Added functionality for combining the OCR annotations of the original/raw page and the tagged page.
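One way to picture combining the two sets of OCR annotations is to merge them and sort the result into reading order. The sketch below is only an illustration under stated assumptions: the `Annotation` type, the `merge_in_reading_order` helper and the `line_height` bucketing are hypothetical and are not taken from this PR's `combine_annotations()`.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Annotation:
    """Hypothetical OCR annotation: recognised text plus its top-left corner."""
    text: str
    x: float
    y: float


def merge_in_reading_order(
    raw: Iterable[Annotation],
    tagged: Iterable[Annotation],
    line_height: float = 10.0,
) -> List[Annotation]:
    """Merge annotations from both screenshots and sort top-to-bottom, left-to-right.

    Annotations whose y values fall in the same `line_height` bucket are
    treated as one visual line (an assumption made for this sketch).
    """
    merged = list(raw) + list(tagged)
    return sorted(merged, key=lambda a: (int(a.y // line_height), a.x))


raw_page = [Annotation("Sign in", 200, 12), Annotation("Footer", 0, 95)]
tagged_page = [Annotation("[#1]", 180, 14), Annotation("Home", 10, 11)]
ordered = merge_in_reading_order(raw_page, tagged_page)
print([a.text for a in ordered])
```

Keeping the sort in its own function mirrors the stated goal of decoupling the ordering logic from `page_to_text()`, although the real method may order annotations differently.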
Updated APE-76 with double tagging fix.
…n so that it doesn't return elements with bounding boxes that occupy no space (i.e., elements that do not actually appear on the page).
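A minimal sketch of the zero-area check this commit describes, written in Python for illustration. The `BoundingBox` type and both helper names are hypothetical; the PR's actual filtering code is not shown on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical bounding box in page pixels."""
    x: float
    y: float
    width: float
    height: float


def occupies_space(box: BoundingBox) -> bool:
    """True when the box covers a non-zero area."""
    return box.width > 0 and box.height > 0


def drop_empty_boxes(boxes: List[BoundingBox]) -> List[BoundingBox]:
    """Keep only boxes that would actually appear on the page."""
    return [box for box in boxes if occupies_space(box)]


boxes = [
    BoundingBox(0, 0, 10, 5),   # visible
    BoundingBox(3, 3, 0, 8),    # zero width
    BoundingBox(1, 1, 4, 0),    # zero height
]
print(drop_empty_boxes(boxes))
```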
…formatting issues. Fixed issue where elements with tagged children were getting rendered twice & creating collisions.
…the next screenshot.
- Added getNextColors function to generate a diverse list of RGB colors based on the number of elements.
- Added colorDistance function to calculate the Euclidean distance between two colors.
- Added assignColors function to assign colors to elements, ensuring maximum color distance.
- Improvements to colourBasedTagify:
  - Tag and collect elements with bounding boxes > 0 and within the viewport.
  - Apply colors to element borders & set opacity to 1
  - Handle special cases for links
  - Set visibility of non-tagged child elements to hidden.
- Added function for disabling/enabling transitions and animation to be used before taking screenshots
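The three colour helpers named in this commit can be sketched as follows. This is a Python illustration, not the code from `tarsier/tag_utils.ts`: the snake_case names are hypothetical ports, the evenly spaced hue palette is an assumption about what "diverse" means, and the greedy farthest-colour assignment is one possible reading of "ensuring maximum color distance".

```python
import colorsys
import math
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def color_distance(a: RGB, b: RGB) -> float:
    """Euclidean distance between two colours in RGB space."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def get_next_colors(count: int) -> List[RGB]:
    """Return `count` colours with evenly spaced hues (assumed strategy)."""
    colors: List[RGB] = []
    for i in range(count):
        r, g, b = colorsys.hsv_to_rgb(i / count, 1.0, 1.0)
        colors.append((round(r * 255), round(g * 255), round(b * 255)))
    return colors


def assign_colors(element_ids: List[str], palette: List[RGB]) -> Dict[str, RGB]:
    """Greedily give each element the unused colour farthest from those already used."""
    if len(palette) < len(element_ids):
        raise ValueError("palette has fewer colours than elements")
    remaining = list(palette)
    assigned: Dict[str, RGB] = {}
    for element_id in element_ids:
        if assigned:
            # Pick the colour whose nearest already-used colour is as far away as possible.
            choice = max(
                remaining,
                key=lambda c: min(color_distance(c, used) for used in assigned.values()),
            )
        else:
            choice = remaining[0]
        remaining.remove(choice)
        assigned[element_id] = choice
    return assigned


ids = ["link-0", "button-1", "input-2"]
print(assign_colors(ids, get_next_colors(len(ids))))
```

The greedy choice is cheap and keeps neighbouring assignments far apart, but it does not guarantee a globally optimal spread.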
- Added boundingBoxX and boundingBoxY attributes in ColouredElem to track element positions
- Updated `page_to_text` to handle recoloring of undetected elements:
  - Added functions to disable and enable transitions/animations during screenshots.
  - Added recoloring logic to improve element detection.
  - Ensured missing elements are recolored and re-checked for visibility.
- Added new _check_colours_ method to compare colors within a threshold to be used on the first pass.
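A rough sketch of a threshold-based colour check and the resulting list of elements that would need recolouring. The helper names, the per-channel comparison and the default threshold of 10 are all assumptions for illustration; the commit only states that colours are compared "within a threshold".

```python
from typing import Dict, Iterable, List, Tuple

RGB = Tuple[int, int, int]


def colours_match(expected: RGB, observed: RGB, threshold: int = 10) -> bool:
    """True when every channel differs by at most `threshold` (assumed metric)."""
    return all(abs(e - o) <= threshold for e, o in zip(expected, observed))


def find_missing_elements(
    expected: Dict[int, RGB],
    observed: Iterable[RGB],
    threshold: int = 10,
) -> List[int]:
    """Return ids whose assigned colour was not found among the observed colours."""
    seen = list(observed)
    return [
        element_id
        for element_id, colour in expected.items()
        if not any(colours_match(colour, candidate, threshold) for candidate in seen)
    ]


expected = {1: (255, 0, 0), 2: (0, 255, 0)}
observed = [(250, 3, 1)]  # e.g. colours sampled from a screenshot
print(find_missing_elements(expected, observed))
```

Elements returned here are the ones a second pass would recolour and re-check; a tolerance is needed because rendering can shift pixel values slightly away from the assigned colour.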
# Conflicts: # poetry.lock # tarsier/core.py # tarsier/tag_utils.ts
# Conflicts: # tarsier-snapshots/snapshots/05W3ZEmj8pbuYSHArYUkz/ocr.txt # tarsier-snapshots/snapshots/07wOwFaw3aGekjCBpZkg0/ocr.txt # tarsier-snapshots/snapshots/0fdyKSMbc3kVUgL9RGiEk/ocr.txt # tarsier-snapshots/snapshots/0orFfEesEVpe1BN7B114a/ocr.txt # tarsier-snapshots/snapshots/11u8vZX9JHQsOrXVSWfJd/ocr.txt # tarsier-snapshots/snapshots/1JWoJWs3uZMt8Wa5ql6pr/ocr.txt # tarsier-snapshots/snapshots/1N0FTiHE53vO1j0nHDNG5/ocr.txt # tarsier-snapshots/snapshots/1qkOHewUy0Kqq9RVVSOoQ/ocr.txt # tarsier-snapshots/snapshots/1z50b4syZzf7J1kQt2k7W/ocr.txt # tarsier-snapshots/snapshots/24SLE3KnDhtOYYgIM4ote/ocr.txt # tarsier-snapshots/snapshots/2ErcEyBkupKnoHkQAhJCk/ocr.txt # tarsier-snapshots/snapshots/2HauD8zfTdDq75G7WRJzB/ocr.txt # tarsier-snapshots/snapshots/2tmwuvYVJ9KqVgHIOgctI/ocr.txt # tarsier-snapshots/snapshots/3MAIydQKH2qHnl1cuLmNc/ocr.txt # tarsier-snapshots/snapshots/3vDNIHFXtjcnarvJQWdd7/ocr.txt # tarsier-snapshots/snapshots/47D2wwbE0WZOV6obQbYA7/ocr.txt # tarsier-snapshots/snapshots/484mHWaGAH0l8tgW95Hvv/ocr.txt # tarsier-snapshots/snapshots/4Hmgj9cuidpeiVpWdXVBf/ocr.txt # tarsier-snapshots/snapshots/4Je6qSd4YFoyLxVZLQRb7/ocr.txt # tarsier-snapshots/snapshots/4KGjHFZbEpB345rOxuIzv/ocr.txt # tarsier-snapshots/snapshots/Ey2q7uEroarG84e6YZnym/ocr.txt # tarsier-snapshots/snapshots/F5AaImEw3SHGkneXd36eH/ocr.txt # tarsier-snapshots/snapshots/FIECXMTasC96yBFr7BcN1/ocr.txt # tarsier-snapshots/snapshots/FSfE85pVbn96ntVl1qEGp/ocr.txt # tarsier-snapshots/snapshots/FeRDLVQyg3Y1l62axB6az/ocr.txt # tarsier-snapshots/snapshots/FmpHbDna6mnBNe0hLyTYZ/ocr.txt # tarsier-snapshots/snapshots/Fw6hoBmn7nm2KAy4YDzv9/ocr.txt # tarsier-snapshots/snapshots/G9Xy74ZxrdukPaChWTWAo/ocr.txt # tarsier-snapshots/snapshots/GAeRa1QK7BcoGKelpEOA9/ocr.txt # tarsier-snapshots/snapshots/GNekmizdgssA6t94zWOId/ocr.txt # tarsier-snapshots/snapshots/GQfYTjppPhTgYtsuFUbXF/ocr.txt # tarsier-snapshots/snapshots/GcW0Q862yCbKr28CQTg2c/ocr.txt # tarsier-snapshots/snapshots/GuFznteaPUy3yrETrOh4Y/ocr.txt # 
tarsier-snapshots/snapshots/HCrPjvyx0XaLvNHxVBPZt/ocr.txt # tarsier-snapshots/snapshots/HixdQTqLbSa6zIaKmxxE1/ocr.txt # tarsier-snapshots/snapshots/HleEA9DcP1jBVN5cBEmFT/ocr.txt # tarsier-snapshots/snapshots/Hramb0PgtU7wHEj0D5OKj/ocr.txt # tarsier-snapshots/snapshots/I8Bj6okah8nrfPEinahWH/ocr.txt # tarsier-snapshots/snapshots/IUnyfHVheJUrv8frQYQib/ocr.txt # tarsier-snapshots/snapshots/JAiFFb1qWlEVk48Ny32ND/ocr.txt # tarsier-snapshots/snapshots/JNOSAEEZO4j2unWHPFBdO/ocr.txt # tarsier-snapshots/snapshots/JaVENaBu8Iu7yYoUNrORW/ocr.txt # tarsier-snapshots/snapshots/JwUW9qdzk0NgtnK2Y2BSS/ocr.txt # tarsier-snapshots/snapshots/Jxv57Kbqw1AP4qv1zvlqg/ocr.txt # tarsier-snapshots/snapshots/K88O7OW0FJoCfdVUD4xXH/ocr.txt # tarsier-snapshots/snapshots/KGEFdtgwltNKXKHGOkkaF/ocr.txt # tarsier-snapshots/snapshots/KNyomEvINtDSbA7cKRr1F/ocr.txt # tarsier-snapshots/snapshots/KTcoPSidqLGESp29nQtLv/ocr.txt # tarsier-snapshots/snapshots/KuDD2GuMDlbuKO4ozdbDA/ocr.txt # tarsier-snapshots/snapshots/KypCMQmDQ2XZ2GIbMKacI/ocr.txt # tarsier-snapshots/snapshots/L3uXGoAVL6YpRHGBCnlB8/ocr.txt # tarsier-snapshots/snapshots/L6BOPpJEJhhN5JHfWj4g1/ocr.txt # tarsier-snapshots/snapshots/LN4K9AZwaPC50Z4e513su/ocr.txt # tarsier-snapshots/snapshots/LNMVWWtQRcjkj54ONLebI/ocr.txt # tarsier-snapshots/snapshots/LOnORRBp7zDQifntNAcFO/ocr.txt # tarsier-snapshots/snapshots/LuM2bHYg5mnBvjhttTDlh/ocr.txt # tarsier-snapshots/snapshots/Ly1DY8GL7cV5mWnxr1DH5/ocr.txt # tarsier-snapshots/snapshots/MIZDQx8G6Gn562lO5hFQb/ocr.txt # tarsier-snapshots/snapshots/MP4p6ibb3PLD3i8AmBrZ3/ocr.txt # tarsier-snapshots/snapshots/MQOPSYI3SU7EEQRbHUMHr/ocr.txt # tarsier-snapshots/snapshots/MQrMR8W7oJtlUc056qZ6L/ocr.txt # tarsier-snapshots/snapshots/MRD347sMiS2vlw091LAqK/ocr.txt # tarsier-snapshots/snapshots/NHWkSmdwXKQb9oe9vVGZf/ocr.txt # tarsier-snapshots/snapshots/NJouZuI4JTRsMz3KYK1cV/ocr.txt # tarsier-snapshots/snapshots/NLtUSUexaGqmRUBomWj9R/ocr.txt # tarsier-snapshots/snapshots/NSVMR9p35Pku7LUyPCMHY/ocr.txt # 
tarsier-snapshots/snapshots/NUkrUYwOJuYfv5SC3GHTE/ocr.txt # tarsier-snapshots/snapshots/NV6JL1wEHaTPuK65dKt6t/ocr.txt # tarsier-snapshots/snapshots/NZoqFzLNm1OJsS96Pyxbi/ocr.txt # tarsier-snapshots/snapshots/O3kSfBi6P0CQBJTCmjV7B/ocr.txt # tarsier-snapshots/snapshots/O3t7Of3CTP2WUj71YddFO/ocr.txt # tarsier-snapshots/snapshots/OWLWiq0ePIJmx5VmtquOD/ocr.txt # tarsier-snapshots/snapshots/Ofe0weKbJ9yl5vEwkalCS/ocr.txt # tarsier-snapshots/snapshots/OlYrsJi04Czdu7Uvl1mIF/ocr.txt # tarsier-snapshots/snapshots/OmJeRJARVmguS9uMWU1Xb/ocr.txt # tarsier-snapshots/snapshots/P7dY0WRzR4PCfWZNuSeBf/ocr.txt # tarsier-snapshots/snapshots/PiQlpch5uQzWNXiEEvjX3/ocr.txt # tarsier-snapshots/snapshots/PthtpZsDczvCOCFYIogKI/ocr.txt # tarsier-snapshots/snapshots/PzN7n57ArAxzcJHzx63NY/ocr.txt # tarsier-snapshots/snapshots/QIOkg628A7yzKluLVB8of/ocr.txt # tarsier-snapshots/snapshots/QJ1O4XyX7e3CpAPQ3Bonw/ocr.txt # tarsier-snapshots/snapshots/QOZFTfvesXGZxgsmHqnrL/ocr.txt # tarsier-snapshots/snapshots/QWwSzGV7QMgJprOY5cOpP/ocr.txt # tarsier-snapshots/snapshots/Ql2B37FdNugeJ09WjopGa/ocr.txt # tarsier-snapshots/snapshots/QlAWMyjvSxPHh4E5Fkjfs/ocr.txt # tarsier-snapshots/snapshots/QuUpyX6Z5U2HUUQWJV3S4/ocr.txt # tarsier-snapshots/snapshots/QwiRD9fjb4YuRaY3Ypz3f/ocr.txt # tarsier-snapshots/snapshots/QxSSau0T34NCk6O1bq4Cd/ocr.txt # tarsier-snapshots/snapshots/R99SMT2jvCjJRqRGra2g6/ocr.txt # tarsier-snapshots/snapshots/RIqXLn8bSaFN0AG4DdoHO/ocr.txt # tarsier-snapshots/snapshots/RVotqLcMUyKXULUTqYCvm/ocr.txt # tarsier-snapshots/snapshots/RpjyEXqtmEQDFWgojBJMU/ocr.txt # tarsier-snapshots/snapshots/S8AKJlRl5F8Vci1UiLU1a/ocr.txt # tarsier-snapshots/snapshots/SEyENcYHqerkt0nmJZjl7/ocr.txt # tarsier-snapshots/snapshots/STPTr6OhlruneOtA24xi9/ocr.txt # tarsier-snapshots/snapshots/SjzTipa4JUYx4Ocn5VkCV/ocr.txt # tarsier-snapshots/snapshots/SlMfqkoK2KeAp31dHr88F/ocr.txt # tarsier-snapshots/snapshots/Sqb7SeHvAcouDW5rFl9yu/ocr.txt # tarsier-snapshots/snapshots/Std6TTbgilRTiLDGJOezx/ocr.txt # 
tarsier-snapshots/snapshots/T1pTeE6hYcFsaZ84no4GM/ocr.txt # tarsier-snapshots/snapshots/TG8dn0Xi3SJC0VHjWRH1P/ocr.txt # tarsier-snapshots/snapshots/TKUFwwdmB0ioMyUXvozpu/ocr.txt # tarsier-snapshots/snapshots/TLxVvFZ6MRB0nbSBWl8ym/ocr.txt # tarsier-snapshots/snapshots/TQyvtLuRcbSStSHq1seCq/ocr.txt # tarsier-snapshots/snapshots/U5wOXA13nV6xyogmib6uL/ocr.txt # tarsier-snapshots/snapshots/UEQ5bJeIeTst0YVL8ga9Z/ocr.txt # tarsier-snapshots/snapshots/UPCNbyQNGulQpM6v6sxUo/ocr.txt # tarsier-snapshots/snapshots/UjsF3B4ihFcZjXEcZCnm1/ocr.txt # tarsier-snapshots/snapshots/VPIrl5m9IfNLKS03UyzNH/ocr.txt # tarsier-snapshots/snapshots/Vba6zNQZmxgxA8byjpmaA/ocr.txt # tarsier-snapshots/snapshots/Vo8MreF9aVq5bE45XqaMz/ocr.txt # tarsier-snapshots/snapshots/VogIUZw1FJlCEiBzTUwYR/ocr.txt # tarsier-snapshots/snapshots/VqSaCh7ffPXKh1IymN8Oo/ocr.txt # tarsier-snapshots/snapshots/W8QTUDItaXJSOaBOZGAE8/ocr.txt # tarsier-snapshots/snapshots/WDGGGgqdb1RGaoGlseBJk/ocr.txt # tarsier-snapshots/snapshots/WEVQJfQEWky3KR7Hc2kuK/ocr.txt # tarsier-snapshots/snapshots/WyQg7esKNNds3EYMZCx2J/ocr.txt # tarsier-snapshots/snapshots/XSzc3ewTsGRYwwdHvb6LK/ocr.txt # tarsier-snapshots/snapshots/Xixe0WiedsLB1KFcKpv2r/ocr.txt # tarsier-snapshots/snapshots/Xnuxii49OIfjWntcihbjX/ocr.txt # tarsier-snapshots/snapshots/XsNkGYeq1DTAnyKuuvHPZ/ocr.txt # tarsier-snapshots/snapshots/Xu7Q49cgzMsp4cgMR0qqS/ocr.txt # tarsier-snapshots/snapshots/XxXTjDH2qRuu4n5BSLM5d/ocr.txt # tarsier-snapshots/snapshots/Yb4ug21SFYfiN4ENjJCcz/ocr.txt # tarsier-snapshots/snapshots/YuBInhOP8OdQAfy4Htvre/ocr.txt # tarsier-snapshots/snapshots/ZW0ihimOJEReeseRBrI5i/ocr.txt # tarsier-snapshots/snapshots/ZYBqV9WrmYmyFExthpKLD/ocr.txt # tarsier-snapshots/snapshots/a0pJxHhxIHFKcoFjkORnG/ocr.txt # tarsier-snapshots/snapshots/aLnmhAeCwsHCd3dM53rwG/ocr.txt # tarsier-snapshots/snapshots/aQZGYIDkaa6JY6aXv6wXQ/ocr.txt # tarsier-snapshots/snapshots/aa3t8r3kAlp9FYx2uSOFz/ocr.txt # tarsier-snapshots/snapshots/abgIXICPIttq3MhkmSVdV/ocr.txt # 
tarsier-snapshots/snapshots/ahEBAfuWtiZ8HM77W2d2D/ocr.txt # tarsier-snapshots/snapshots/aivDVkwH92hQdu5cDr4nv/ocr.txt # tarsier-snapshots/snapshots/apscD5vWHBV1dvAX6K7Vt/ocr.txt # tarsier-snapshots/snapshots/awL4PUmAj9TIIqR6L95fq/ocr.txt # tarsier-snapshots/snapshots/bOVNaNsrc6UrCdlhHLxGy/ocr.txt # tarsier-snapshots/snapshots/bOlARasPXtWAjEDfxtk2L/ocr.txt # tarsier-snapshots/snapshots/bZPREHVg723XRC2I6z9MQ/ocr.txt # tarsier-snapshots/snapshots/bwwko5J7aFk5K8qz61jBI/ocr.txt # tarsier-snapshots/snapshots/c3s1dYwKWMEJHKGyP3qnr/ocr.txt # tarsier-snapshots/snapshots/cAeniCN923UcmnXuOOIBJ/ocr.txt # tarsier-snapshots/snapshots/cFcnDQSGQgDeyHBnZtrU8/ocr.txt # tarsier-snapshots/snapshots/cMPCNSczVAPhdXJxBIBEd/ocr.txt # tarsier-snapshots/snapshots/cdFPVICHIa5evhnj1OiMx/ocr.txt # tarsier-snapshots/snapshots/cohMcyz81B0NHA04Qeik2/ocr.txt # tarsier-snapshots/snapshots/ct6PuXzujbOlM9zaARUpa/ocr.txt # tarsier-snapshots/snapshots/cv3sq0A9o3VHmD1UvEWse/ocr.txt # tarsier-snapshots/snapshots/e7iDpCvvfiq3oU1UAvxTC/ocr.txt # tarsier-snapshots/snapshots/eE46U0AMRoczeDL2eOcgf/ocr.txt # tarsier-snapshots/snapshots/eKKvQ3OZG6H0jjTIRINPs/ocr.txt # tarsier-snapshots/snapshots/eSG6HgfI2R9JpZQRozsSV/ocr.txt # tarsier-snapshots/snapshots/ecqQm32DLMtTUWt2AQxhm/ocr.txt # tarsier-snapshots/snapshots/f41Dz5iiwe5QjVbXqWpJJ/ocr.txt # tarsier-snapshots/snapshots/fJPQwUD42zT2WKhdBJLnN/ocr.txt # tarsier-snapshots/snapshots/fJWonTvHgvl7Ex9DdB1Px/ocr.txt # tarsier-snapshots/snapshots/gHXZyrqL7qpmKMFYM6oGE/ocr.txt # tarsier-snapshots/snapshots/gKfAQGripVAFa87dehr5m/ocr.txt # tarsier-snapshots/snapshots/gd2iNA5INcT66penKY175/ocr.txt # tarsier-snapshots/snapshots/gdtUqXUos3CdM6zVlMbbC/ocr.txt # tarsier-snapshots/snapshots/gg5AAaFekWGXPdKtYBoer/ocr.txt # tarsier-snapshots/snapshots/ggdDF9CwmrmiBHsQvZcDk/ocr.txt # tarsier-snapshots/snapshots/h4q2uwr0z0sVFM0q5AV7n/ocr.txt # tarsier-snapshots/snapshots/ijJbuKPqEOkA4OK0BzLPk/ocr.txt # tarsier-snapshots/snapshots/jCYLQBT1114BBW83zKQdt/ocr.txt # 
tarsier-snapshots/snapshots/jH56yUizuVbTYWAIwSJkM/ocr.txt # tarsier-snapshots/snapshots/k1I07SwT7Clry1xxPODfa/ocr.txt # tarsier-snapshots/snapshots/kZVEvHT3kuBfZtNUY8rC2/ocr.txt # tarsier-snapshots/snapshots/kbd8qO9tx1Efbf08MqZWQ/ocr.txt # tarsier-snapshots/snapshots/ke6newcCWvPhsxeZ5TCZ4/ocr.txt # tarsier-snapshots/snapshots/kfueRbnkKCdJwC0BRiggp/ocr.txt # tarsier-snapshots/snapshots/kvcH8Q2BG1SPgWSAN3f2h/ocr.txt # tarsier-snapshots/snapshots/kx3CBXYC9YUyRIFIMYTcD/ocr.txt # tarsier-snapshots/snapshots/l3mMTs6gZa1GvpGjknIFT/ocr.txt # tarsier-snapshots/snapshots/l8QvEOlveFkWUVYu1HNgD/ocr.txt # tarsier-snapshots/snapshots/lBTRjkiZqEdNvCSjTmoWG/ocr.txt # tarsier-snapshots/snapshots/lHjLewJTfQKFSAmGE5Wr1/ocr.txt # tarsier-snapshots/snapshots/lSwsaU5jAVRddpYTCsWEd/ocr.txt # tarsier-snapshots/snapshots/n1VHZA0AkvnKB3Qy2hqvB/ocr.txt # tarsier-snapshots/snapshots/n1zh09obI7c51LUTBNNBE/ocr.txt # tarsier-snapshots/snapshots/n28tTMFEZfIyMXsCxO6Ra/ocr.txt # tarsier-snapshots/snapshots/n7LTn5tVJ2B3IvDopFTFO/ocr.txt # tarsier-snapshots/snapshots/nAXVoJDSuul938vtPvfFB/ocr.txt # tarsier-snapshots/snapshots/nXWHr3UoycfzFqubWTUpn/ocr.txt # tarsier-snapshots/snapshots/njhgFq4h4BcMTdaRxtElY/ocr.txt # tarsier-snapshots/snapshots/nxkcxrThdmaRX01YRXtho/ocr.txt # tarsier-snapshots/snapshots/o28cv918RSdVcg2P55tGq/ocr.txt # tarsier-snapshots/snapshots/oBJMkbpRqNM02wNlOTP3N/ocr.txt # tarsier-snapshots/snapshots/oEAjw9fv6UXmS63CIzZlU/ocr.txt # tarsier-snapshots/snapshots/oaDAf9SeUsVwpDeKajNrs/ocr.txt # tarsier-snapshots/snapshots/ogRf0dLwJKiDJUQnzz4pn/ocr.txt # tarsier-snapshots/snapshots/pAObMNn95uFVSll7pCXpg/ocr.txt # tarsier-snapshots/snapshots/pNsTF6muOdSesbhNTFI9g/ocr.txt # tarsier-snapshots/snapshots/pXL6ojrOhW79o92e8IXw0/ocr.txt # tarsier-snapshots/snapshots/pk7eEZ2sweN4YzzFVK217/ocr.txt # tarsier-snapshots/snapshots/prf1dSczRpaoWLrEMseB1/ocr.txt # tarsier-snapshots/snapshots/q3jMY8P01UJCw3ggDs1OJ/ocr.txt # tarsier-snapshots/snapshots/q72iVxzE9cGatHU1cLKJX/ocr.txt # 
tarsier-snapshots/snapshots/qgEjcl77WINh8ltNc9NoC/ocr.txt # tarsier-snapshots/snapshots/qrWALKWSykHxTLuVy0Rl7/ocr.txt # tarsier-snapshots/snapshots/qtRibcsG6iq09TyGQoYhv/ocr.txt # tarsier-snapshots/snapshots/qyZjOcbaiHuVq4FpOB26b/ocr.txt # tarsier-snapshots/snapshots/rFp4CQs5ZxAebcIM0d62U/ocr.txt # tarsier-snapshots/snapshots/rGFdlkuftF7L1VlFL7LbS/ocr.txt # tarsier-snapshots/snapshots/rKCkTGVbx4Mpi0BAnKCRd/ocr.txt # tarsier-snapshots/snapshots/rZQpVHDs30D7WbTFIiXCr/ocr.txt # tarsier-snapshots/snapshots/ranUaEMdxbjMltYPt2AX7/ocr.txt # tarsier-snapshots/snapshots/rgCTp6HulNEsEqEupEUZN/ocr.txt # tarsier-snapshots/snapshots/rmMxc6dEoyE1WpLLWqTHV/ocr.txt # tarsier-snapshots/snapshots/t8biLN0RgFBPYO2hv2JYJ/ocr.txt # tarsier-snapshots/snapshots/tIowzAEvZcWH9ukP4Aofa/ocr.txt # tarsier-snapshots/snapshots/tV4VsHCiYAA3o6oKYyXVk/ocr.txt # tarsier-snapshots/snapshots/tVBOUnrTSDIHQbsMw2WgS/ocr.txt # tarsier-snapshots/snapshots/tbRxihP0jtq5O12zVhvEF/ocr.txt # tarsier-snapshots/snapshots/token_statistics.txt # tarsier-snapshots/snapshots/u2IEvb9Ke4lKLaD4LtJYE/ocr.txt # tarsier-snapshots/snapshots/u3fjwZRjKUEcvr8kkmy5v/ocr.txt # tarsier-snapshots/snapshots/u7I1P6OC5xX8f3u8Fwjvf/ocr.txt # tarsier-snapshots/snapshots/uOmbtFqUSqItS8CKmyi51/ocr.txt # tarsier-snapshots/snapshots/uPrnCohCwLCrVvwN8eXWZ/ocr.txt # tarsier-snapshots/snapshots/uibGV6FB4gcYvY93AIWJe/ocr.txt # tarsier-snapshots/snapshots/v7hgryy94evdLb0aHzDtY/ocr.txt # tarsier-snapshots/snapshots/vELUj6wGf96coJAqt0x5D/ocr.txt # tarsier-snapshots/snapshots/vVJc0PFcYOzKHHL1v1hev/ocr.txt # tarsier-snapshots/snapshots/vgTQTZN0Efl4vXQ0I9Iy8/ocr.txt # tarsier-snapshots/snapshots/wUHnayH90bjRjjjdCT0r2/ocr.txt # tarsier-snapshots/snapshots/wXhQ0YobLZ4z1BAZesBUF/ocr.txt # tarsier-snapshots/snapshots/wjmMahVNX7T1jH9GmVW9r/ocr.txt # tarsier-snapshots/snapshots/wqGtmRYz4PWe4LCxAW4UI/ocr.txt # tarsier-snapshots/snapshots/x9tCDlr2WOazDKVrF3njD/ocr.txt # tarsier-snapshots/snapshots/xCHAOXtOYz47HfNY9LeZq/ocr.txt # 
tarsier-snapshots/snapshots/xZCsA0eNaR7OMmhcBlsOv/ocr.txt # tarsier-snapshots/snapshots/xgnNjPdOMUY0LZ1GJdEsE/ocr.txt # tarsier-snapshots/snapshots/xh7zxFmYI3du3PWBnEjQ4/ocr.txt # tarsier-snapshots/snapshots/xkEtVvkl3HDnC827Flk3g/ocr.txt # tarsier-snapshots/snapshots/xkINPY1INO91Jv5ZokNGu/ocr.txt # tarsier-snapshots/snapshots/yXLMF4nocYqJnql2dt71R/ocr.txt # tarsier-snapshots/snapshots/yoqTH08pW464eBIPYwd5r/ocr.txt # tarsier-snapshots/snapshots/yzwuXotaBr52CyG4mUDhy/ocr.txt # tarsier-snapshots/snapshots/zKVOGYYHXR3uskE0WcG1A/ocr.txt # tarsier-snapshots/snapshots/zPfbTSTbZ3sOGYDiqwyj0/ocr.txt # tarsier-snapshots/snapshots/zRdqy27hn5RdNqJqnjzaA/ocr.txt # tarsier-snapshots/tarsier_snapshots/snapshots.py
pyproject.toml (outdated)
  [tool.poetry]
  name = "tarsier"
- version = "0.6.3"
+ version = "0.6.39"
Comment on lines +9 to +12
cd ./tarsier-snapshots || exit 1
poetry install
poetry run bananalyze --download
Not really relevant for setting up tarsier. Would delete.
# Conflicts: # poetry.lock # tarsier-snapshots/tarsier_snapshots/snapshots.py
…eturn element bounding box
…not just unique ones
…eturn value to match that of original page_to_text
# Conflicts: # poetry.lock # tarsier/core.py # tarsier/tag_utils.ts
…, colour images, capture image alt text
WIP: implemented colour-based tagging and page_to_text_new, which doesn't use OCR.