NLHF

A simple implementation of Nash Learning from Human Feedback (NLHF).

Note: the training code works with a small preference dataset from Stanford, so feel free to try it out and run the training yourself if you are interested.

Credit: @BojanFaletic, @Hong, Claude3 and GPT4
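For readers new to the idea, here is a toy sketch of what "Nash learning" from preferences means. It is not this repository's training code. It assumes a hypothetical 3x3 preference table `P`, where `P[i][j]` is the probability that response `i` is preferred over response `j`, and uses a plain multiplicative-weights self-play update in place of any neural policy. The function names, the step count and the learning rate are all illustrative assumptions.

```python
import math


def preference_payoffs(P, policy):
    """For each response i, the probability it beats a response sampled from `policy`."""
    n = len(policy)
    return [sum(P[i][j] * policy[j] for j in range(n)) for i in range(n)]


def self_play_step(P, policy, lr=1.0):
    """One multiplicative-weights update of the policy against itself."""
    payoffs = preference_payoffs(P, policy)
    weights = [p * math.exp(lr * u) for p, u in zip(policy, payoffs)]
    total = sum(weights)
    return [w / total for w in weights]


def approximate_nash(P, steps=500, lr=1.0):
    """Run self-play from the uniform policy and return the final policy."""
    n = len(P)
    policy = [1.0 / n] * n
    for _ in range(steps):
        policy = self_play_step(P, policy, lr)
    return policy


# Hypothetical preference table (made up for illustration):
# P[i][j] + P[j][i] == 1, and response 0 is preferred over every other response.
P = [
    [0.5, 0.7, 0.8],
    [0.3, 0.5, 0.6],
    [0.2, 0.4, 0.5],
]

policy = approximate_nash(P)
print([round(p, 3) for p in policy])
```

In this toy table response 0 beats every alternative, so the self-play policy concentrates on it. With cyclic preferences (rock-paper-scissors style), a plain update like this one can oscillate instead of settling, so treat the sketch only as an illustration of the preference game itself.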