I don't understand the motivation behind the optimization in lines 55 to 65 of FM.py. Specifically: (a) why is 'loss' calculated that way on line 59? (b) why are v, w and w0 updated that way, which looks completely different from plain SGD?
Thanks a lot for your help.
Yue