Actor-critic versus direct policy search: a comparison based on sample complexity

de Broissia, Arnaud de Froissard; Sigaud, Olivier

arXiv:1606.09152 (cs)

[Submitted on 29 Jun 2016 (v1), last revised 22 Aug 2016 (this version, v2)]

Authors: Arnaud de Froissard de Broissia, Olivier Sigaud

Abstract: Sample efficiency is a critical property when optimizing policy parameters for the controller of a robot. In this paper, we evaluate two state-of-the-art policy optimization algorithms. One is a recent deep reinforcement learning method based on an actor-critic algorithm, Deep Deterministic Policy Gradient (DDPG), that has been shown to perform well on various control benchmarks. The other one is a direct policy search method, Covariance Matrix Adaptation Evolution Strategy (CMA-ES), a black-box optimization method that is widely used for robot learning. The algorithms are evaluated on a continuous version of the mountain car benchmark problem, so as to compare their sample complexity. From a preliminary analysis, we expect DDPG to be more sample efficient than CMA-ES, which is confirmed by our experimental results.

Comments:	Proceedings JFPDA (Journees Francaises Planification Decision Apprentissage)
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:1606.09152 [cs.LG]
	(or arXiv:1606.09152v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1606.09152

Submission history

From: Olivier Sigaud
[v1] Wed, 29 Jun 2016 15:22:13 UTC (601 KB)
[v2] Mon, 22 Aug 2016 11:07:23 UTC (758 KB)
