feat: make organic job execution flow sync #835
Conversation
validator/app/src/compute_horde_validator/validator/job_excuses.py
Dismissed
Force-pushed from 0169a3b to ee9a948
Force-pushed from a128a9a to cc7522a
Impacts: validator
Force-pushed from cc7522a to ec74ed0
validator/app/src/compute_horde_validator/validator/job_excuses.py
Outdated
```python
try:
    self._run()
except Exception as exc:
    sentry_sdk.capture_exception(exc)
```
why is the sentry_sdk called explicitly here, instead of just logging the exception, possibly with job info?
That I have copied verbatim from the async impl :v
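For comparison, a minimal sketch of the logging-based alternative the question hints at, assuming the job object exposes a `job_uuid` field; the function and parameter names here are illustrative, not from the PR:

```python
import logging

logger = logging.getLogger(__name__)


def run_job_safely(run, job) -> None:
    # Sketch only: `run` stands in for the driver's _run callable and `job` for
    # the OrganicJob instance. The exception is logged with job context instead
    # of calling sentry_sdk.capture_exception explicitly; if Sentry's logging
    # integration is enabled, an ERROR-level log record is still picked up.
    try:
        run()
    except Exception:
        logger.exception("Organic job %s failed in sync driver", job.job_uuid)
```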
```python
)


@pytest.mark.django_db(databases=["default", "default_alias"], transaction=True)
```
I really like how concise these tests are. Readable and really doing the job.
```python
        )
        return DriverState.FAILED

    def _wait_for_message(
```
seems that this implementation is very robust and resistant against weird messages, out of order messages etc. but I think it is not currently tested against that, is it? We've previously had malicious miners gaining some edge by sending out of order messages.
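A self-contained sketch of the kind of test that could cover this, with a fake client that emits an unrelated message before the expected one; all names here are hypothetical stand-ins, not the PR's actual classes:

```python
import queue
from dataclasses import dataclass


@dataclass
class ExecutorReady:
    # Hypothetical "expected" message type.
    job_uuid: str


@dataclass
class UnrelatedMessage:
    # Hypothetical out-of-order / unexpected message from a misbehaving miner.
    payload: str


class FakeClient:
    """Feeds a fixed sequence of messages, simulating an out-of-order miner."""

    def __init__(self, messages):
        self._queue = queue.Queue()
        for msg in messages:
            self._queue.put(msg)

    def recv(self, timeout=None):
        return self._queue.get(timeout=timeout)


def wait_for(client, expected_type, max_messages=10):
    # Discard unexpected messages instead of failing on them - the property
    # the reviewer wants exercised by a test.
    for _ in range(max_messages):
        msg = client.recv(timeout=1)
        if isinstance(msg, expected_type):
            return msg
    raise TimeoutError("expected message never arrived")


def test_out_of_order_messages_are_ignored():
    client = FakeClient([UnrelatedMessage("noise"), ExecutorReady(job_uuid="abc")])
    msg = wait_for(client, ExecutorReady)
    assert msg.job_uuid == "abc"
```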
mpnowacki-reef left a comment:
The implementation looks functionally correct, but `SyncOrganicJobDriver` at ~800 lines is doing a lot of heavy lifting. What do you think about splitting it? A few ideas:

- **Message handling:** the logic in `_wait_for_message` that handles different error types (lines 477-537) could live in `MinerClient` or a separate helper. The client already knows how to send/receive; it could also know how to classify responses.
- **Receipt operations:** receipt creation and sending is scattered across `_reserve_executor`, `_send_job_accepted_receipt`, `_collect_results`, and `_send_job_finished_receipt_if_failed`. A small `ReceiptService` or even just a few standalone functions could clean this up.
- **Failure handling:** `_handle_horde_failure`, `_handle_decline_job`, `_handle_job_failed`, and `_handle_horde_failed` (~170 lines) are cohesive and could be extracted.

This would leave the driver as a ~300-400 line state machine focused on orchestration.

Also, since this is becoming a critical path, could we add some docstrings to the main flow? Specifically:

- a class-level docstring explaining the state machine and the job lifecycle stages (a rough sketch follows this comment),
- brief docs on `_run()` and `_wait_for_message()` explaining how the state transitions work.

The state enum is already self-documenting, which is great; I just want to make sure the next person can follow the flow quickly.
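To make the ask concrete, a hedged sketch of what such a class-level docstring could look like. Apart from the `CONNECT`/`FAILED` state names and the `_run()`/`_wait_for_message()` methods, which appear elsewhere in this PR, the described flow is an assumption, not a description of the actual implementation:

```python
class SyncOrganicJobDriver:
    """Synchronous state machine driving a single organic job.

    The job moves through a linear happy path, with every stage able to
    short-circuit to FAILED:

        CONNECT -> reserve executor -> send job -> await results -> done

    `_run()` loops over the current `DriverState`, calls the handler for that
    state, and stores the state the handler returns as the next state.
    `_wait_for_message()` blocks on the miner client until the expected
    message type arrives or the stage deadline passes, translating error
    responses into the appropriate failure state.
    """
```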
Commit message lint report
Status: ❌ Issues found. gitlint output: Fix by amending/rewording your commits (e.g., using …
I don't think there's much that can be done here to lower the LoC. We are passing around a lot of context into messages, events, error reports etc., and extracting anything means you still have to pass that stuff into the helper; or, if you instead give it the whole job driver instance, it's just method calling with extra steps. Some state methods are literally just "send a message", so there's not much to gain there. Maybe some methods could be split into less of "how to build a receipt object" and more of "the receipt should be sent now", but IMO, looking at the code of the state handlers, on average it's not worth it. I'd rather have code related to a state be in one place; no single method is hard to comprehend as-is.

Still, I'm not a fan of the state handler deciding the next state. I would maybe structure the "next state decision" and error handling a bit differently, e.g. states are for the happy path, otherwise throw exceptions. I looked into it and it seems I already did something like this for the async driver: there are generic exceptions like HordeError and more meaningful exceptions like "MinerRejectedJob", "MinerReportedJobFailed" etc. (sketch below), which makes it so that:

But this is not critical.
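For illustration, a minimal sketch of that exception-based shape: happy-path handlers only, with failures raised and classified in one place. Only `HordeError`, `MinerRejectedJob`, and `MinerReportedJobFailed` are named in this thread; the driver methods here are hypothetical:

```python
class HordeError(Exception):
    """Generic failure anywhere in the horde flow."""


class MinerRejectedJob(HordeError):
    """Miner explicitly declined the job."""


class MinerReportedJobFailed(HordeError):
    """Miner accepted the job but reported it as failed."""


def run(driver) -> None:
    # Happy path only: each step either succeeds or raises. The except blocks
    # own all "which failure state / receipt / event" decisions, instead of
    # every state handler returning the next state itself.
    try:
        driver.connect()
        driver.reserve_executor()
        driver.send_job()
        driver.collect_results()
    except MinerRejectedJob as exc:
        driver.record_rejection(exc)
    except MinerReportedJobFailed as exc:
        driver.record_job_failure(exc)
    except HordeError as exc:
        driver.record_horde_failure(exc)
```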
```python
        log_exc_info: bool = False,
    ) -> DriverState:
        logger.warning(comment, exc_info=log_exc_info)
        self.job.status = OrganicJob.Status.FAILED
```
Suggested change:

```diff
-        self.job.status = OrganicJob.Status.FAILED
+        self.job.status = OrganicJob.Status.HORDE_FAILED
```
It worries me that no test caught this
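For what it's worth, a hedged sketch of a test that would have caught the wrong status. The fixture name and the model import path are hypothetical; only the `HORDE_FAILED` status and the `django_db` marker come from this PR:

```python
import pytest

from compute_horde_validator.validator.models import OrganicJob  # assumed import path


@pytest.mark.django_db(databases=["default", "default_alias"], transaction=True)
def test_horde_failure_sets_horde_failed_status(run_driver_with_horde_failure):
    # `run_driver_with_horde_failure` is a hypothetical fixture that drives the
    # sync driver into the horde-failure branch and returns the saved OrganicJob.
    job = run_driver_with_horde_failure()
    assert job.status == OrganicJob.Status.HORDE_FAILED
```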
```python
        )

        try:
            msg = self.miner_client.recv()
```
Can we pass the remaining timeout into recv? There is that timer utility you can set up before the loop, then do something like `recv(timeout=deadline.time_left())`. Otherwise we're only checking for the timeout each time recv returns with no response, which seems to be every 5 seconds. The job timing is strict, and I'm not even sure how this will influence it, but I can imagine we may end up with one stage consuming the time of a subsequent stage or something like that; not fun to debug.
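A minimal, self-contained sketch of the deadline-driven loop being suggested, assuming `recv()` accepts a `timeout` argument; the `Deadline` class is a stand-in for the timer utility mentioned above:

```python
import time


class Deadline:
    """Tiny stand-in for the timer utility mentioned above."""

    def __init__(self, seconds: float):
        self._expires_at = time.monotonic() + seconds

    def time_left(self) -> float:
        return max(0.0, self._expires_at - time.monotonic())


def wait_for_message(client, deadline: Deadline):
    # Pass the *remaining* budget into every recv call, so a stage can never
    # overrun its deadline by up to a full poll interval.
    while (remaining := deadline.time_left()) > 0:
        msg = client.recv(timeout=remaining)  # assumes recv() takes a timeout
        if msg is not None:
            return msg
    raise TimeoutError("stage deadline exceeded")
```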
```python
        # Handle legacy messages by mapping them to V0HordeFailedRequest
        if isinstance(msg, V0StreamingJobNotReadyRequest):
            msg = V0HordeFailedRequest(
                job_uuid=msg.job_uuid,
                reported_by=JobParticipantType.MINER,
                reason=HordeFailureReason.STREAMING_FAILED,
                message="Executor reported legacy V0StreamingJobNotReadyRequest message",
            )
        elif isinstance(msg, V0ExecutorFailedRequest):
            msg = V0HordeFailedRequest(
                job_uuid=msg.job_uuid,
                reported_by=JobParticipantType.MINER,
                reason=HordeFailureReason.GENERIC_ERROR,
                message="Executor reported legacy V0ExecutorFailedRequest message",
            )
```
These are not being sent any more. This was only needed for the release of the error reporting. You can safely remove this.
```python
            assert_never(self._state)

    def _connect(self) -> DriverState:
        assert self._state == DriverState.CONNECT
```
Any reason for this defensive pattern throughout the driver?
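One way to avoid the per-handler asserts, sketched under the assumption that handlers are selected by the current state anyway: let a dispatch table own the state-to-handler mapping, so a handler can only ever run in its own state. The class and method bodies here are placeholders, not the PR's code:

```python
from enum import Enum, auto


class DriverState(Enum):
    CONNECT = auto()
    FAILED = auto()
    # ... remaining states elided


class Driver:
    def __init__(self) -> None:
        self._state = DriverState.CONNECT

    def _run_step(self) -> DriverState:
        # The dispatch table guarantees a handler only runs for its own state,
        # which removes the need for `assert self._state == ...` in each handler.
        handlers = {
            DriverState.CONNECT: self._connect,
            DriverState.FAILED: self._failed,
        }
        self._state = handlers[self._state]()
        return self._state

    def _connect(self) -> DriverState:
        return DriverState.FAILED  # placeholder transition

    def _failed(self) -> DriverState:
        return DriverState.FAILED  # terminal placeholder
```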
No description provided.