Currently, the continuous PPO ratio is calculated as ratios = new_actor_log_probs/(actor_log_probs+EPSILON). This was carried over from the previous PPO algorithm, where ratio = actor_probs/(old_actor_probs+EPSILON) made sense because those values were probabilities, not log probabilities.
The line should actually be something like ratios = exp(new_actor_log_probs - actor_log_probs), since a ratio of probabilities equals the exponential of the difference of their logs.
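
To illustrate the difference, here is a minimal sketch using only the standard library. The function name ppo_ratio and the example probabilities are hypothetical, chosen for illustration; the real code presumably operates on tensors rather than floats.

```python
import math

def ppo_ratio(new_log_prob: float, old_log_prob: float) -> float:
    """Return pi_new(a|s) / pi_old(a|s) given the two log-probabilities."""
    # exp(log p_new - log p_old) == p_new / p_old
    return math.exp(new_log_prob - old_log_prob)

# Hypothetical values: the action has probability 0.5 under the new
# policy and 0.25 under the old one, so the true ratio is 2.
new_lp = math.log(0.5)
old_lp = math.log(0.25)

correct = ppo_ratio(new_lp, old_lp)   # approximately 2.0
buggy = new_lp / (old_lp + 1e-8)      # approximately 0.5, not a probability ratio

print(f"correct ratio: {correct:.4f}, buggy ratio: {buggy:.4f}")
```

Dividing the log probabilities gives a value that moves in the wrong direction here (below 1 although the new policy makes the action more likely), which would distort the clipped objective. The subtraction form also needs no EPSILON, because there is no division.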