Hello!
Thank you for your great work; the idea is really excellent. However, there are still a few points I don't understand.
In your paper, the BCE loss and its gradient are formulated as follows, respectively:

However, the formulation of BCE that I learned is like this:

So, if i != k, the loss contributed by y_i would be zero.
Thanks a lot!
Best wishes