As described in this paper, as the noise (sigma) increases, the weight on the corresponding L(W) decreases. But if we interpret sigma as the uncertainty of y,
wouldn't it be better for L to increase with uncertainty? Higher uncertainty means y is harder to learn, so shouldn't it get more attention during training?
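
To make the question concrete, here is a minimal sketch of the behaviour I mean. It assumes the per-task objective has the form L(W) / (2 * sigma^2) + log(sigma); that form and the helper names `loss_weight` and `weighted_loss` are my own assumptions for illustration, not taken from the paper's code.

```python
import math


def loss_weight(sigma: float) -> float:
    """Multiplier applied to the task loss, under the assumed 1 / (2 * sigma^2) form."""
    return 1.0 / (2.0 * sigma ** 2)


def weighted_loss(task_loss: float, sigma: float) -> float:
    """Assumed per-task objective: scaled task loss plus a log(sigma) term."""
    return loss_weight(sigma) * task_loss + math.log(sigma)


# Hypothetical sigma values, chosen only to show the trend.
sigmas = [0.5, 1.0, 2.0, 4.0]
weights = [loss_weight(s) for s in sigmas]

for s, w in zip(sigmas, weights):
    print(f"sigma={s:.1f}  weight on L(W)={w:.5f}  objective for L(W)=1: {weighted_loss(1.0, s):.5f}")
```

Under this assumed form, the weight on L(W) shrinks as sigma grows, which is the behaviour I am asking about.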