PPO ratio is incorrectly calculated

Currently, the continuous PPO ratio is calculated as `ratios = new_actor_log_probs/(actor_log_probs+EPSILON)`. This is carried over from the previous PPO algorithm, `ratio = actor_probs/(old_actor_probs+EPSILON)`, where dividing made sense because those values were probabilities, not **log probabilities**.

**The line should actually be something like `ratios = exp(new_actor_log_probs - actor_log_probs)`**
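
The reason is that `log(p_new / p_old) = log(p_new) - log(p_old)`, so the probability ratio is recovered by exponentiating the difference of the log probabilities. Below is a minimal sketch in plain Python to illustrate the difference. The function names and the `EPSILON` value are assumptions for illustration, not code from this repository:

```python
import math

EPSILON = 1e-8  # assumed value; only the name comes from the current code


def ratio_from_log_probs(new_log_probs, old_log_probs):
    """Probability ratio pi_new / pi_old, computed from log probabilities."""
    return [math.exp(n - o) for n, o in zip(new_log_probs, old_log_probs)]


def ratio_by_division(new_log_probs, old_log_probs):
    """The current formula: divides one log probability by the other."""
    return [n / (o + EPSILON) for n, o in zip(new_log_probs, old_log_probs)]


# The new policy assigns probability 0.5 to an action the old policy
# assigned 0.25, so the true ratio is 2.
new_log_probs = [math.log(0.5)]
old_log_probs = [math.log(0.25)]

correct = ratio_from_log_probs(new_log_probs, old_log_probs)
divided = ratio_by_division(new_log_probs, old_log_probs)

print(correct)  # approximately [2.0]
print(divided)  # approximately [0.5]
```

In this example the division formula reports roughly 0.5 for an action that actually became twice as likely, so the clipped objective would push the policy in the wrong direction.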