Model-free (reinforcement learning)

In reinforcement learning (RL), a model-free algorithm (as opposed to a model-based one) is an algorithm that does not use or estimate the transition probability distribution (or the reward function) associated with the Markov decision process (MDP),[1] which, in RL, represents the problem to be solved. The transition probability distribution (or transition model) and the reward function are often collectively called the "model" of the environment (or MDP), hence the name "model-free". A model-free RL algorithm can be thought of as an "explicit" trial-and-error algorithm.[1] An example of a model-free algorithm is Q-learning.
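The distinction is easiest to see in code. Below is a minimal, illustrative sketch of tabular Q-learning in Python on a made-up five-state chain environment; the environment, its rewards, and all hyperparameters are assumptions introduced for the example, not part of the source. The point it illustrates is that the agent improves its action-value estimates purely from sampled transitions, never consulting the transition probabilities or the reward function.

```python
# A minimal, hypothetical sketch of tabular Q-learning (model-free):
# the environment is used only as a black box that returns sampled
# transitions; the agent never reads P(s'|s,a) or the reward function.
import random

N_STATES, N_ACTIONS = 5, 2      # toy 5-state chain; actions: 0 = left, 1 = right
GOAL = N_STATES - 1

def step(state, action):
    """Black-box environment: returns (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == 0 else min(state + 1, GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

alpha, gamma, epsilon = 0.1, 0.9, 0.1   # step size, discount, exploration rate
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

for episode in range(500):
    state = 0
    for _ in range(200):                # cap episode length
        # epsilon-greedy action selection: the "explicit trial and error"
        if random.random() < epsilon:
            action = random.randrange(N_ACTIONS)
        else:
            best = max(Q[state])
            action = random.choice([a for a in range(N_ACTIONS) if Q[state][a] == best])
        next_state, reward, done = step(state, action)
        # Q-learning update, built only from the sampled transition
        target = reward + gamma * max(Q[next_state]) * (not done)
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state
        if done:
            break

print(Q)   # learned action values for the toy chain
```

A model-based algorithm, by contrast, would either be handed the step function's transition probabilities and rewards or try to estimate them, and then plan against that learned model.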

Key model-free reinforcement learning algorithms

Algorithm | Description                                      | Model      | Policy     | Action space | State space | Operator
DQN       | Deep Q-Network                                   | Model-free | Off-policy | Discrete     | Continuous  | Q-value
DDPG      | Deep Deterministic Policy Gradient               | Model-free | Off-policy | Continuous   | Continuous  | Q-value
A3C       | Asynchronous Advantage Actor-Critic              | Model-free | On-policy  | Continuous   | Continuous  | Advantage
TRPO      | Trust Region Policy Optimization                 | Model-free | On-policy  | Continuous   | Continuous  | Advantage
PPO       | Proximal Policy Optimization                     | Model-free | On-policy  | Continuous   | Continuous  | Advantage
TD3       | Twin Delayed Deep Deterministic Policy Gradient  | Model-free | Off-policy | Continuous   | Continuous  | Q-value
SAC       | Soft Actor-Critic                                | Model-free | Off-policy | Continuous   | Continuous  | Advantage
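For reference, the "Operator" column names the quantity each method's update is built around. Under the standard definitions (not stated in the source table itself), the action value (Q-value) of a policy π and the corresponding advantage are:

```latex
Q^{\pi}(s,a) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0}=s,\ a_{0}=a\right],
\qquad
A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s)
```

where V^π(s) is the state-value function. Methods listed with the "Q-value" operator learn Q directly, while those listed with the "Advantage" operator build their policy updates around estimates of A^π, i.e. how much better an action is than the policy's average behaviour in that state.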

References

  1. Sutton, Richard S.; Barto, Andrew G. (November 13, 2018). Reinforcement Learning: An Introduction (PDF) (2nd ed.). A Bradford Book. p. 552. ISBN 0262039249. Retrieved 18 February 2019.