Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning

Li, Gen; Yan, Yuling; Chen, Yuxin; Fan, Jianqing

arXiv:2304.07278v1 (cs)

[Submitted on 14 Apr 2023 (this version), latest version 23 May 2024 (v2)]

Abstract: This paper studies reward-agnostic exploration in reinforcement learning (RL), a scenario where the learner is unaware of the reward functions during the exploration stage, and designs an algorithm that improves over the state of the art. More precisely, consider a finite-horizon non-stationary Markov decision process with $S$ states, $A$ actions, and horizon length $H$, and suppose that there are no more than a polynomial number of given reward functions of interest. By collecting on the order of $\frac{SAH^3}{\varepsilon^2}$ sample episodes (up to log factors) without guidance from the reward information, our algorithm is able to find $\varepsilon$-optimal policies for all these reward functions, provided that $\varepsilon$ is sufficiently small. This forms the first reward-agnostic exploration scheme in this context that achieves provable minimax optimality. Furthermore, once the sample size exceeds $\frac{S^2AH^3}{\varepsilon^2}$ episodes (up to log factors), our algorithm is able to yield $\varepsilon$ accuracy for arbitrarily many reward functions (even when they are adversarially designed), a task commonly dubbed "reward-free exploration." The novelty of our algorithm design draws on insights from offline RL: the exploration scheme attempts to maximize a critical reward-agnostic quantity that dictates the performance of offline RL, while the policy learning paradigm leverages ideas from sample-optimal offline RL paradigms.
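
To give a feel for the gap between the two sample-size regimes in the abstract, the following minimal sketch evaluates only the leading-order terms $\frac{SAH^3}{\varepsilon^2}$ and $\frac{S^2AH^3}{\varepsilon^2}$. The function names and the example values of $S$, $A$, $H$, and $\varepsilon$ are illustrative assumptions, not taken from the paper, and the constants and log factors hidden in the bounds are ignored, so the numbers are orders of magnitude rather than actual episode counts.

```python
def reward_agnostic_episodes(S: int, A: int, H: int, eps: float) -> float:
    """Leading-order term S*A*H^3 / eps^2 (constants and log factors dropped)."""
    return S * A * H**3 / eps**2


def reward_free_episodes(S: int, A: int, H: int, eps: float) -> float:
    """Leading-order term S^2*A*H^3 / eps^2 (constants and log factors dropped)."""
    return S**2 * A * H**3 / eps**2


# Hypothetical problem size, chosen only for illustration.
S, A, H, eps = 10, 5, 20, 0.1

agnostic = reward_agnostic_episodes(S, A, H, eps)
free = reward_free_episodes(S, A, H, eps)

print(f"reward-agnostic (polynomially many rewards): ~{agnostic:.3g} episodes")
print(f"reward-free (arbitrarily many rewards):      ~{free:.3g} episodes")
print(f"ratio reward-free / reward-agnostic:         {free / agnostic:.1f}")
```

Under these assumptions the two leading terms differ by exactly a factor of $S$, which is the extra cost the abstract attributes to handling arbitrarily many (possibly adversarial) reward functions.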

Subjects:	Machine Learning (cs.LG); Information Theory (cs.IT); Systems and Control (eess.SY); Statistics Theory (math.ST); Machine Learning (stat.ML)
Cite as:	arXiv:2304.07278 [cs.LG]
	(or arXiv:2304.07278v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2304.07278

Submission history

From: Yuxin Chen
[v1] Fri, 14 Apr 2023 17:46:49 UTC (97 KB)
[v2] Thu, 23 May 2024 13:16:25 UTC (98 KB)