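I think there is a bug in the `self._u()` method of your `PlanarFlow` class. You have `tf.math.sqrt(tf.reduce_sum(self.w ** 2.0))`, but it should be just `tf.reduce_sum(self.w ** 2.0)`, because the paper uses the squared norm: û = u + [m(wᵀu) - wᵀu] w / ‖w‖², with m(x) = -1 + log(1 + eˣ). The squared norm makes wᵀû = m(wᵀu) > -1, which is the condition that keeps the planar flow invertible. Dividing by ‖w‖ instead gives wᵀû = wᵀu + [m(wᵀu) - wᵀu] ‖w‖, so the constraint only holds when ‖w‖ = 1.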
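To make the difference concrete, here is a minimal sketch in plain Python (no TensorFlow, so it is not a drop-in replacement for your `_u()`). The helpers `softplus`, `m`, `dot` and `u_hat` are hypothetical names I made up for illustration. `u_hat` computes the constrained û either with the squared norm from the paper (`squared=True`) or with the plain norm that the current code appears to use (`squared=False`):

```python
import math


def softplus(x: float) -> float:
    # Numerically stable log(1 + e^x).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def m(x: float) -> float:
    # m(x) = -1 + log(1 + e^x); always strictly greater than -1.
    return -1.0 + softplus(x)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def u_hat(u, w, squared=True):
    """Constrained u for a planar flow (hypothetical helper).

    squared=True:  u + [m(w.u) - w.u] * w / ||w||^2  (as in the paper)
    squared=False: u + [m(w.u) - w.u] * w / ||w||    (the variant in question)
    """
    wu = dot(w, u)
    norm_sq = dot(w, w)
    denom = norm_sq if squared else math.sqrt(norm_sq)
    scale = (m(wu) - wu) / denom
    return [ui + scale * wi for ui, wi in zip(u, w)]


if __name__ == "__main__":
    # ||w|| = 0.5 < 1, so the two variants disagree.
    w = [0.5, 0.0]
    u = [-10.0, 0.0]
    print("squared norm:", dot(w, u_hat(u, w)))                 # equals m(w.u) > -1
    print("plain norm:  ", dot(w, u_hat(u, w, squared=False)))  # falls below -1
```

With ‖w‖ = 0.5 and wᵀu = -5, the squared-norm version gives wᵀû = m(-5) ≈ -0.993, while the plain-norm version gives roughly -3.0, which violates wᵀû ≥ -1. So the flow is no longer guaranteed to be invertible whenever ‖w‖ ≠ 1.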