error in multi-gpu distributed training #12

@yppr

Description

Hi, I can run the code with a single GPU. However, I get an error when I use multi-GPU distributed training. The traceback is as follows:
Traceback (most recent call last):
  File "train_spatial_query.py", line 538, in
    train(args, loader, generator, discriminator, g_optim, d_optim, g_ema, device, tensorboard_writer, args.exp_name)
  File "train_spatial_query.py", line 235, in train
    fake_img, latents, mean_path_length
  File "train_spatial_query.py", line 97, in g_path_regularize
    grad, = autograd.grad(outputs=tmp, inputs=latents, create_graph=True)
  File "/home/nrr/.conda/envs/stylegan/lib/python3.7/site-packages/torch/autograd/__init__.py", line 236, in grad
    inputs, allow_unused, accumulate_grad=False)
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

I train the model on four RTX 3080 GPUs. The PyTorch version is 1.10.2. Could you kindly help me solve this problem? Thanks.
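For context, the RuntimeError means that one of the tensors passed as `inputs` to `autograd.grad` did not participate in computing `outputs`, which can happen under DistributedDataParallel when part of the forward pass is rewired by the wrapper. A minimal standalone reproduction of the behavior the error message describes (illustrative only; `x`, `unused`, and `y` are made-up names, not variables from this repository):

```python
import torch
from torch import autograd

x = torch.randn(3, requires_grad=True)
unused = torch.randn(3, requires_grad=True)

# y depends only on x, so `unused` never enters the autograd graph.
y = (x * 2).sum()

# Requesting a gradient for `unused` without allow_unused=True raises the
# same RuntimeError as in the traceback above. With allow_unused=True,
# the gradient for the unused tensor comes back as None instead.
grads = autograd.grad(outputs=y, inputs=[x, unused], allow_unused=True)

print(grads[0])  # gradient of y w.r.t. x: a tensor of 2s
print(grads[1])  # None, because `unused` was never used
```

Note that simply setting `allow_unused=True` in `g_path_regularize` would silence the error but return `None` gradients, so the underlying question is why `latents` becomes detached from the graph only in the multi-GPU case.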
