error in multi-gpu distributed training #12

@yppr

Description

Hi, I can run the code with a single GPU. However, I get an error when I use multi-GPU distributed training. The traceback is as follows:
Traceback (most recent call last):
  File "train_spatial_query.py", line 538, in
    train(args, loader, generator, discriminator, g_optim, d_optim, g_ema, device, tensorboard_writer, args.exp_name)
  File "train_spatial_query.py", line 235, in train
    fake_img, latents, mean_path_length
  File "train_spatial_query.py", line 97, in g_path_regularize
    grad, = autograd.grad(outputs=tmp, inputs=latents, create_graph=True)
  File "/home/nrr/.conda/envs/stylegan/lib/python3.7/site-packages/torch/autograd/__init__.py", line 236, in grad
    inputs, allow_unused, accumulate_grad=False)
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

I train the model on four RTX 3080 GPUs. The PyTorch version is 1.10.2. Could you kindly help me solve this problem? Thanks.
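For context, the RuntimeError means that one of the tensors passed as `inputs` to `autograd.grad` did not participate in computing `outputs`, which can happen under DistributedDataParallel when part of the forward pass is rewired by the wrapper. A minimal standalone reproduction of the behavior the error message describes (illustrative only; `x`, `unused`, and `y` are made-up names, not variables from this repository):

```python
import torch
from torch import autograd

x = torch.randn(3, requires_grad=True)
unused = torch.randn(3, requires_grad=True)

# y depends only on x, so `unused` never enters the autograd graph.
y = (x * 2).sum()

# Requesting a gradient for `unused` without allow_unused=True raises the
# same RuntimeError as in the traceback above. With allow_unused=True,
# the gradient for the unused tensor comes back as None instead.
grads = autograd.grad(outputs=y, inputs=[x, unused], allow_unused=True)

print(grads[0])  # gradient of y w.r.t. x: a tensor of 2s
print(grads[1])  # None, because `unused` was never used
```

Note that simply setting `allow_unused=True` in `g_path_regularize` would silence the error but return `None` gradients, so the underlying question is why `latents` becomes detached from the graph only in the multi-GPU case.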
