I have an issue where submitting many Python functions in a single job fails. I found this simple reproducer:
```python
import time

from hyperqueue import Job, LocalCluster


def work(*args, **kwargs):
    time.sleep(1)
    print(args)
    print(kwargs)


# Spawn a HQ server
with LocalCluster() as cluster:
    # Add a single HyperQueue worker to the server
    cluster.start_worker()

    # Create a client and a job
    client = cluster.client()
    job = Job()

    # Add a task graph
    tasks = {}
    for i in range(10000):
        name = f"work_{i}"
        stdout = f"{name}.stdout"
        stderr = f"{name}.stderr"
        tasks[name] = job.function(
            work,
            args=(["hello"] * 10000, i),
            kwargs=None,
            stdout=stdout,
            stderr=stderr,
            name=name,
        )
        print(tasks[name].label)

    # Submit the job
    submitted = client.submit(job)
    # Wait until the job completes
    client.wait_for_jobs([submitted])
```

The submit fails with this traceback:

```
Traceback (most recent call last):
  File "/home/user/hq-test.py", line 29, in <module>
    submitted = client.submit(job)
  File "/usr/lib/python3.13/site-packages/hyperqueue/client.py", line 85, in submit
    job_id = self.connection.submit_job(job_desc)
  File "/usr/lib/python3.13/site-packages/hyperqueue/ffi/client.py", line 31, in submit_job
    return ffi.submit_job(self.ctx, job_description)
           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: TakoError(GenericError("Serialization failed: SizeLimit"))
```
Whether it fails depends on the number of tasks in the job: 10000 fails, but lowering it to e.g. 1000 works as expected. The size of the arguments is the same for every task here, so the problem is not that individual tasks are too big, but that all tasks are serialized together into one payload. This is very inconvenient for the user, because the total serialized size is hard to anticipate (and can also be very dynamic). Could you improve the submit logic so that the necessary info is serialized and submitted in parts?
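For now I can work around it on my side by splitting the task graph across several smaller jobs, so that no single submit payload hits the limit. A minimal sketch, reusing the `client` and `work` from the reproducer above (the chunk size of 1000 is just an assumption based on the value that worked for me):

```python
# Assumed workaround: submit the 10000 tasks as 10 jobs of 1000 tasks each,
# so each job is serialized and submitted separately.
CHUNK_SIZE = 1000

submitted_jobs = []
for start in range(0, 10000, CHUNK_SIZE):
    job = Job()
    for i in range(start, start + CHUNK_SIZE):
        name = f"work_{i}"
        job.function(
            work,
            args=(["hello"] * 10000, i),
            kwargs=None,
            stdout=f"{name}.stdout",
            stderr=f"{name}.stderr",
            name=name,
        )
    submitted_jobs.append(client.submit(job))

client.wait_for_jobs(submitted_jobs)
```

This only works because the tasks here are independent, and it still forces the user to guess a safe chunk size, so it does not address the underlying problem.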