Description
When I train my own FWI diffusion prior via "accelerate launch --multi_gpu train.py -cn /configs/pretrain/fwi.yaml", I encounter the following error:
Traceback (most recent call last):
  File "/mnt/src/InverseBench-main/train.py", line 100, in main
    loss = loss_fn(net, imgs)
           ^^^^^^^^^^^^^^^^^^
  File "/mnt/src/InverseBench-main/training/loss.py", line 64, in __call__
    rnd_normal = torch.randn([images.shape[0], 1, 1, 1], device=images.device)
                              ^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'shape'
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/ibench/bin/accelerate", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/miniconda3/envs/ibench/lib/python3.11/site-packages/accelerate/commands/accelerate_cli.py", line 50, in main
    args.func(args)
  File "/home/user/miniconda3/envs/ibench/lib/python3.11/site-packages/accelerate/commands/launch.py", line 1281, in launch_command
    simple_launcher(args)
  File "/home/user/miniconda3/envs/ibench/lib/python3.11/site-packages/accelerate/commands/launch.py", line 869, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/user/miniconda3/envs/ibench/bin/python3.11', 'train.py', 'log.wandb=false']' returned non-zero exit status 1.
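The AttributeError can be reproduced in isolation. The sketch below assumes a recent PyTorch and uses a hypothetical DictDataset as a stand-in for LMDBData; it only mimics the {'target': img} return format, not the LMDB loading. It shows that the default collate function of torch.utils.data.DataLoader keeps the dict structure and stacks the values, so the batch is a dict of tensors rather than a tensor:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class DictDataset(Dataset):
    """Hypothetical stand-in for LMDBData: each sample is {'target': img}."""

    def __init__(self, n=8, shape=(1, 16, 16)):
        self.data = torch.randn(n, *shape)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return {'target': self.data[idx]}


loader = DataLoader(DictDataset(), batch_size=4)
batch = next(iter(loader))

# Default collation turns a list of dicts into one dict of stacked tensors.
print(type(batch))            # <class 'dict'>
print(batch['target'].shape)  # torch.Size([4, 1, 16, 16])

try:
    batch.shape  # what loss.py effectively does with images.shape
except AttributeError as err:
    print(err)  # 'dict' object has no attribute 'shape'
```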
I think imgs is a dict: the class LMDBData(Dataset) returns {'target': img}, so the dataloader yields a dict of tensors instead of a tensor. I therefore made the following change in train.py:
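# training loop
for e in range(num_epochs):
    # original: for imgs in dataloader:
    # changed: each batch is a dict, e.g. {'target': imgs}
    for batch in dataloader:
        if training_steps >= config.train.num_steps:
            break
        # added: extract the image tensor from the batch dict
        imgs = batch['target']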
This solved the problem. I don't know whether this is a bug.
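If train.py should keep working with datasets that return bare tensors as well as ones that return dicts, one option is a small helper that accepts both. This is only a sketch under that assumption, not necessarily how the maintainers would fix it; unpack_images is a hypothetical name, and the default key "target" is taken from what LMDBData returns:

```python
from collections.abc import Mapping

import torch


def unpack_images(batch, key="target"):
    """Return the image tensor from a DataLoader batch.

    Hypothetical helper: accepts either a bare tensor (the format the original
    loop "for imgs in dataloader" expects) or a mapping such as {'target': imgs}
    (the format produced when LMDBData returns dicts).
    """
    if isinstance(batch, torch.Tensor):
        return batch
    if isinstance(batch, Mapping):
        if key not in batch:
            raise KeyError(f"batch has no key {key!r}; available keys: {list(batch)}")
        return batch[key]
    raise TypeError(f"unsupported batch type: {type(batch).__name__}")


# Example: both batch formats yield the same tensor object.
imgs = torch.randn(4, 1, 16, 16)
assert unpack_images(imgs) is imgs
assert unpack_images({"target": imgs}) is imgs
```

In the training loop this would replace the direct indexing with imgs = unpack_images(batch). Checking for a tensor first keeps the original behavior for datasets that already return tensors, and the explicit KeyError makes a mismatched dataset key easier to spot than a later AttributeError.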