frequent unknown instance ID error when running multiple backend instances #69

@famarting

We have found a scenario in which the following transient error happens frequently:

failed to complete orchestration task: rpc error: code = Unknown desc = unknown instance ID: 5f7b2345-897d-4471-af96-6c8e590a29bf

The unknown instance ID error could be considered transient, because the server stops returning it once the client retries, but IMO it points to a more fundamental problem with the server-side implementation.

In our scenario we run multiple instances of the server, so there are multiple gRPC servers behind a load balancer. A request to CompleteOrchestratorTask can therefore land on a server that has no "pending orchestrator" to serve that request.

Here is the series of steps I went through to come to that conclusion:

  • First, you schedule a new orchestration.
  • On the server side, the orchestration worker is eventually triggered, and its ProcessWorkItem calls ExecuteOrchestrator.
  • Still on the server side, ExecuteOrchestrator (here: https://github.com/microsoft/durabletask-go/blob/main/backend/executor.go#L100) adds the instance ID to be executed to a pendingOrchestrators map and puts a work item into the work queue. The function then waits for the execution to complete by expecting a signal on a channel attached to the entry stored in the pendingOrchestrators map.
  • On the client side, because of the work item added to the work queue, the client eventually receives the work item, executes the orchestrator, and then calls CompleteOrchestratorTask: https://github.com/microsoft/durabletask-go/blob/main/backend/executor.go#L230
  • Back on the server side, if the call to CompleteOrchestratorTask is received by a different server instance than the one that originally put the instance ID into the pendingOrchestrators map, the unknown instance ID error happens.
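
The steps above can be reproduced in miniature with the sketch below. It is not the durabletask-go code: the `server` type and its `executeOrchestrator` and `completeOrchestratorTask` methods are hypothetical stand-ins that only mimic a per-process pending map, and the "load balancer" is just the choice of which server receives the call.

```go
package main

import (
	"fmt"
	"sync"
)

// server is a hypothetical stand-in for one backend instance. It only
// mimics the idea of an in-memory map of pending orchestrators.
type server struct {
	mu      sync.Mutex
	pending map[string]chan struct{} // instance ID -> completion signal
}

func newServer() *server {
	return &server{pending: make(map[string]chan struct{})}
}

// executeOrchestrator registers the instance ID in this server's local map
// and returns the channel the caller would wait on for completion.
func (s *server) executeOrchestrator(id string) <-chan struct{} {
	s.mu.Lock()
	defer s.mu.Unlock()
	done := make(chan struct{})
	s.pending[id] = done
	return done
}

// completeOrchestratorTask succeeds only on the server whose local map
// contains the instance ID; any other server has no record of it.
func (s *server) completeOrchestratorTask(id string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	done, ok := s.pending[id]
	if !ok {
		return fmt.Errorf("unknown instance ID: %s", id)
	}
	delete(s.pending, id)
	close(done) // signal the waiting executeOrchestrator caller
	return nil
}

func main() {
	a, b := newServer(), newServer()
	id := "5f7b2345-897d-4471-af96-6c8e590a29bf"

	// Server A starts the orchestration and registers the instance ID.
	done := a.executeOrchestrator(id)

	// The completion call is routed to server B, which never saw the ID.
	fmt.Println("server B:", b.completeOrchestratorTask(id))

	// A retry that happens to reach server A succeeds.
	fmt.Println("server A:", a.completeOrchestratorTask(id))
	<-done
	fmt.Println("orchestration completed")
}
```

In this sketch the error disappears as soon as a retry reaches server A, which matches the transient behaviour described above. The pending state is still only visible to one process.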
