Model-free (reinforcement learning)
In reinforcement learning (RL), a model-free algorithm (as opposed to a model-based one) is an algorithm that does not estimate the transition probability distribution or the reward function of the Markov decision process (MDP)[1] that, in RL, represents the problem to be solved. The transition probability distribution (or transition model) and the reward function are often collectively called the "model" of the environment (or MDP), hence the name "model-free". Instead of learning this model, a model-free algorithm learns a value function or a policy directly from sampled interactions with the environment, so it can be thought of as an "explicit" trial-and-error algorithm.[1] An example of a model-free algorithm is Q-learning.
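As an illustration of the trial-and-error idea, the following is a minimal sketch of tabular Q-learning. The five-state chain environment, the `step` function, and all hyperparameter values are hypothetical choices made for this example only. The learner never builds a table of transition probabilities or rewards; it only calls `step` to sample outcomes and updates its Q-value estimates from them.

```python
import random

N_STATES, N_ACTIONS, GOAL = 5, 2, 4  # hypothetical toy chain: states 0..4


def step(state, action):
    """Hypothetical environment: action 1 moves right, action 0 moves left.

    Reaching the last state yields reward 1 and ends the episode. The learner
    only samples this function; it never inspects or estimates its dynamics.
    """
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy exploration, breaking ties between equal values at random.
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in range(N_ACTIONS) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best next action, no model needed.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy action in each non-terminal state after training.
greedy_policy = [max(range(N_ACTIONS), key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

In this toy problem the greedy policy read off the learned table moves right in every non-terminal state, which is the shortest path to the rewarding state.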
Key model-free reinforcement learning algorithms
Algorithm | Description | Model | Policy | Action Space | State Space | Operator |
---|---|---|---|---|---|---|
DQN | Deep Q Network | Model-Free | Off-policy | Discrete | Continuous | Q-value |
DDPG | Deep Deterministic Policy Gradient | Model-Free | Off-policy | Continuous | Continuous | Q-value |
A3C | Asynchronous Advantage Actor-Critic Algorithm | Model-Free | On-policy | Continuous or Discrete | Continuous | Advantage |
TRPO | Trust Region Policy Optimization | Model-Free | On-policy | Continuous or Discrete | Continuous | Advantage |
PPO | Proximal Policy Optimization | Model-Free | On-policy | Continuous or Discrete | Continuous | Advantage |
TD3 | Twin Delayed Deep Deterministic Policy Gradient | Model-Free | Off-policy | Continuous | Continuous | Q-value |
SAC | Soft Actor-Critic | Model-Free | Off-policy | Continuous | Continuous | Q-value |
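The "Policy" and "Operator" columns can be made concrete with a small sketch. The function names below are hypothetical and not taken from any library. SARSA is not in the table and is used here only as a simple tabular stand-in for an on-policy update. An off-policy target bootstraps from the greedy next action regardless of what the behaviour policy did, an on-policy target bootstraps from the action the current policy actually took, and an advantage measures how much better an action is than the policy's average in that state.

```python
def off_policy_target(reward, next_q_values, gamma=0.99):
    """Q-learning style target: uses the best next action, whatever was actually taken."""
    return reward + gamma * max(next_q_values)


def on_policy_target(reward, next_q_values, next_action, gamma=0.99):
    """SARSA style target: uses the next action chosen by the current policy."""
    return reward + gamma * next_q_values[next_action]


def advantage(q_values, action_probs, action):
    """A(s, a) = Q(s, a) - V(s), with V(s) the policy-weighted average of Q(s, .)."""
    state_value = sum(p * q for p, q in zip(action_probs, q_values))
    return q_values[action] - state_value


# Illustrative numbers only: two actions with estimated next-state values 1.0 and 3.0.
next_q = [1.0, 3.0]
off_target = off_policy_target(0.5, next_q, gamma=0.9)
on_target = on_policy_target(0.5, next_q, next_action=0, gamma=0.9)
adv = advantage(next_q, action_probs=[0.5, 0.5], action=1)
```

When the current policy happens to pick the greedy action, the two targets coincide; they differ only when it explores or otherwise deviates from the greedy choice.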
References
- [1] Sutton, Richard S.; Barto, Andrew G. (November 13, 2018). Reinforcement Learning: An Introduction (Second ed.). A Bradford Book. p. 552. ISBN 0262039249. Retrieved 18 February 2019.