Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies

Lesner, Boris; Scherrer, Bruno


arXiv:1304.5610 (math)

[Submitted on 20 Apr 2013]


Authors: Boris Lesner (INRIA Nancy - Grand Est / LORIA), Bruno Scherrer (INRIA Nancy - Grand Est / LORIA)


Abstract: We consider approximate dynamic programming for the infinite-horizon stationary $\gamma$-discounted optimal control problem formalized by Markov Decision Processes (MDPs). In the exact case, it is known that there always exists an optimal policy that is stationary; we show that, when using value function approximation, looking for a non-stationary policy may lead to a better performance guarantee. We define a non-stationary variant of Modified Policy Iteration (MPI) that unifies a broad family of approximate dynamic programming algorithms from the literature. For this algorithm, we provide an error propagation analysis in the form of a performance bound on the resulting policies; this bound can improve the usual performance bound by a factor $O(1-\gamma)$, which is significant when the discount factor $\gamma$ is close to 1. In doing so, our approach unifies recent results for Value Iteration and Policy Iteration. Furthermore, by constructing a specific deterministic MDP, we show that our performance guarantee is tight.
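For readers unfamiliar with the baseline, here is a minimal sketch of standard, exact, stationary Modified Policy Iteration on a toy problem. It is not the non-stationary approximate variant analyzed in the paper. The function name `modified_policy_iteration`, the array layout of `P` and `R`, and the two-state deterministic MDP are all illustrative assumptions and are not taken from the paper. The parameter `m` is the number of partial evaluation steps: `m = 1` gives Value Iteration, and large `m` approaches Policy Iteration.

```python
import numpy as np


def modified_policy_iteration(P, R, gamma, m, iters):
    """Exact stationary MPI (illustrative sketch, not the paper's algorithm).

    P: array of shape (A, S, S), P[a, s, t] = probability of s -> t under a.
    R: array of shape (S, A), R[s, a] = reward for taking a in s.
    """
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    v = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(iters):
        # Greedy step: q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * v[t]
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        policy = q.argmax(axis=1)
        # Partial evaluation: apply the Bellman operator of `policy` m times
        P_pi = P[policy, states]   # row s is P[policy[s], s, :]
        r_pi = R[states, policy]
        for _ in range(m):
            v = r_pi + gamma * P_pi @ v
    return policy, v


# Hypothetical two-state deterministic MDP: action 0 stays, action 1 switches.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # action 0: stay in place
    [[0.0, 1.0], [1.0, 0.0]],   # action 1: move to the other state
])
R = np.array([
    [0.0, 0.0],                 # state 0: no reward for either action
    [1.0, 0.0],                 # state 1: reward 1 for staying
])

policy, v = modified_policy_iteration(P, R, gamma=0.9, m=5, iters=200)
print(policy, v)
```

On this toy problem the greedy policy moves from state 0 to state 1 and then stays there, so the values converge to 9 and 10 for states 0 and 1 respectively.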

Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI)
Cite as: arXiv:1304.5610 [math.OC] (or arXiv:1304.5610v1 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.1304.5610

Submission history

From: Bruno Scherrer [via CCSD proxy]
[v1] Sat, 20 Apr 2013 08:45:37 UTC (892 KB)
