PPO-UE: Proximal Policy Optimization via Uncertainty-Aware Exploration

Q Zhang, Z Guo, A Jøsang, LM Kaplan, F Chen… - arXiv preprint arXiv:2212.06343, 2022 - arxiv.org
Proximal Policy Optimization (PPO) is a highly popular policy-based deep reinforcement learning (DRL) approach. However, we observe that the homogeneous exploration process in PPO can cause unexpected stability issues during training. To address this issue, we propose PPO-UE, a PPO variant equipped with self-adaptive uncertainty-aware explorations (UEs) based on a ratio uncertainty level. PPO-UE is designed to improve convergence speed and performance by operating at an optimized ratio uncertainty level. Extensive sensitivity analysis varying the ratio uncertainty level shows that PPO-UE considerably outperforms the baseline PPO on Roboschool continuous control tasks.
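
As a rough illustration of the idea, the sketch below combines the standard PPO clipped surrogate objective with a hypothetical uncertainty-gated exploration term. The abstract does not spell out the PPO-UE update, so the ratio-deviation proxy for uncertainty, the uncertainty_level threshold, and the explore_scale bonus are illustrative assumptions, not the authors' formulation.

    # Minimal sketch, assuming a ratio-deviation proxy for uncertainty.
    # The PPO clipped surrogate is standard; the exploration gating is a guess.
    import torch

    def ppo_ue_policy_loss(new_log_probs, old_log_probs, advantages,
                           clip_eps=0.2, uncertainty_level=0.1, explore_scale=0.05):
        # Standard PPO probability ratio r_t = pi_theta(a|s) / pi_theta_old(a|s).
        ratio = torch.exp(new_log_probs - old_log_probs)

        # Clipped surrogate objective from the original PPO paper.
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        policy_loss = -torch.min(unclipped, clipped).mean()

        # Hypothetical uncertainty-aware exploration: treat |r_t - 1| as a proxy
        # for ratio uncertainty and reward extra exploration only where the
        # deviation exceeds the chosen uncertainty level. This gating rule is an
        # assumption for illustration, not the paper's formula.
        ratio_uncertainty = (ratio - 1.0).abs()
        explore_mask = (ratio_uncertainty > uncertainty_level).float()
        exploration_bonus = explore_scale * (explore_mask * ratio_uncertainty).mean()

        return policy_loss - exploration_bonus

In this sketch the uncertainty level plays the role of a tunable threshold, which is consistent with the abstract's sensitivity analysis over the ratio uncertainty level, but the exact objective should be taken from the paper itself.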