Skip to content

Submitting many tasks from Python fails due to serialization size limit #981

@lahwaacz

Description

@lahwaacz

I have an issue where submitting many Python functions in a single job fails. I found this simple reproducer:

import time

from hyperqueue import Job, LocalCluster

def work(*args, **kwargs):
    time.sleep(1)
    print(args)
    print(kwargs)

# Spawn a HQ server
with LocalCluster() as cluster:
    # Add a single HyperQueue worker to the server
    cluster.start_worker()

    # Create a client and a job
    client = cluster.client()
    job = Job()

    # Add a task graph
    tasks = {}
    for i in range(10000):
        name = f"work_{i}"
        stdout = f"{name}.stdout"
        stderr = f"{name}.stderr"
        tasks[name] = job.function(work, args=(["hello"] * 10000, i), kwargs=None, stdout=stdout, stderr=stderr, name=name)
        print(tasks[name].label)

    # Submit the job
    submitted = client.submit(job)

    # Wait until the job completes
    client.wait_for_jobs([submitted])
Traceback (most recent call last):
  File "/home/user/hq-test.py", line 29, in <module>
    submitted = client.submit(job)
  File "/usr/lib/python3.13/site-packages/hyperqueue/client.py", line 85, in submit
    job_id = self.connection.submit_job(job_desc)
  File "/usr/lib/python3.13/site-packages/hyperqueue/ffi/client.py", line 31, in submit_job
    return ffi.submit_job(self.ctx, job_description)
           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: TakoError(GenericError("Serialization failed: SizeLimit"))

It depends on the number of tasks in the job: 10000 fails, but if you lower it to e.g. 1000, it works as expected. The size of arguments for each task should be the same here, so the problem is not that the tasks are too big, but that they are serialized all together. This is very inconvenient for the user because it is hard to anticipate the serialization size (which can also be very dynamic). Can you improve the submit logic so that the necessary info is serialized and submitted by parts?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions