Don't fail markAsComplete if there is already a failure relationship #282
What does this change?
Giant gets into an infinite loop when it tries to markAsComplete a blob in this situation. We have hundreds of errors like this in the logs:
Failed to mark 'URI' processed by 'OcrMyPdfImageExtractor' as complete: Unexpected number of creates/deletes in markAsComplete. Created: 0. Deleted: 1
In this scenario, when markAsComplete returns a Left, the whole transaction gets rolled back, the TODO remains, and giant tries to run the extractor again. Sad times. This PR gets giant to just WARN instead of rolling back the whole transaction.
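As a sketch of the behaviour change (names and types here are assumed for illustration, not Giant's actual signatures), the counter check stops failing the transaction and instead logs a warning:

```scala
// Hypothetical sketch only: MarkResult and this markAsComplete shape are
// assumptions, not the real Giant code.
sealed trait MarkResult
case object Completed extends MarkResult
final case class Failure(msg: String) extends MarkResult

def markAsComplete(created: Int, deleted: Int): MarkResult =
  if (created != 1 || deleted != 1) {
    // Before this PR: return a failure (a Left in the real code), which
    // rolls back the transaction and leaves the TODO dangling.
    // After: warn and treat the completion as successful, so the duplicate
    // TODO still gets deleted and the retry loop is broken.
    println(s"WARN: unexpected creates/deletes in markAsComplete. Created: $created. Deleted: $deleted")
    Completed
  } else Completed
```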
Leaving this in draft, as I think some team discussion and further digging are required to understand how we got into this state.
How I think this happened
When we ingest a file, the InsertBlob operation in the neo4j manifest gets called. This will create or update the blob for the file. If the blob already exists, it checks for existing PROCESSED relations: if there is a PROCESSED relation with the same workspace properties and ingest, then no new TODO is created; otherwise it creates a new TODO.
giant/backend/app/services/manifest/Neo4jManifest.scala, lines 353 to 370 in be75f73
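A toy model of that dedup check (the shape and field names are assumed; the real logic is a Cypher query in Neo4jManifest.scala): a TODO is only created when no existing PROCESSED relation matches both the ingest and the workspace properties.

```scala
// Toy model of the InsertBlob dedup check; names are assumptions.
final case class Processed(ingestUri: String, workspace: Option[String])

// Create a new TODO only if no PROCESSED relation matches this ingest
// and workspace combination.
def shouldCreateTodo(existing: Seq[Processed], ingestUri: String, workspace: Option[String]): Boolean =
  !existing.exists(p => p.ingestUri == ingestUri && p.workspace == workspace)
```

Under this model, re-ingesting the same blob with a different ingest URI always passes the check, so a second TODO is created.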
So, if you upload a hundred files via the workspace UI, you'll end up with a hundred new blobs with TODO relations to the relevant extractors. If you then kick off a CLI ingest (which will invariably have a different ingestion uri, and obviously no workspace properties), those blobs will all get a second TODO relation to the relevant extractors.
One of the TODOs will get picked up first - let's call it todo1. When it completes, this function gets called. As I understand it, it will delete todo1, and add a PROCESSED relation tagged with various things.
Then the next, todo2, will get picked up and completed. When it hits the above function, it will find todo2 and delete it, but rather than creating a PROCESSED relation it will just change the properties of the existing PROCESSED relation. Then the check

if(counters.relationshipsCreated() != 1 || counters.relationshipsDeleted() != 1)

gets run, fails, and the whole transaction gets rolled back. This leaves a hanging TODO, which will be picked up again in future by fetchWork, so long as the attempts property of the relation is not above max attempts. fetchWork is responsible for updating attempts, but in the example I looked at, attempts was still at 0 on the TODO. TODO: more research to understand this.
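The counter mismatch can be illustrated with a toy simulation (assumed semantics: completing a TODO deletes it and MERGEs the PROCESSED relation, so a second completion only updates the existing relation rather than creating one):

```scala
import scala.collection.mutable

// Toy in-memory model, not Giant's real storage.
val todos = mutable.Set("todo1", "todo2") // two TODOs for the same blob/extractor
var processedExists = false               // the shared PROCESSED relation

// Returns (relationshipsCreated, relationshipsDeleted), mimicking the
// neo4j counters the real check inspects.
def complete(todo: String): (Int, Int) = {
  val deleted = if (todos.remove(todo)) 1 else 0
  val created =
    if (!processedExists) { processedExists = true; 1 }
    else 0 // MERGE finds the existing PROCESSED and only updates its properties
  (created, deleted)
}

val first = complete("todo1")  // (1, 1): check passes
val second = complete("todo2") // (0, 1): created != 1, so the real code
                               // rolled the whole transaction back
```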
How to test