Hi, thanks for the great work on this project!
I'm currently training the denoising diffusion network. During training, the model reports two losses: a base loss and a mask loss. I've noticed that while the base loss steadily decreases, the mask loss seems to increase over time. Is this behavior expected?