PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators

Agarwal, Anish; Alomar, Abdullah; Alumootil, Varkey; Shah, Devavrat; Shen, Dennis; Xu, Zhi; Yang, Cindy

arXiv:2102.06961 (cs)

[Submitted on 13 Feb 2021 (v1), last revised 10 Nov 2021 (this version, v4)]

Abstract: We consider offline reinforcement learning (RL) with heterogeneous agents under severe data scarcity, i.e., we only observe a single historical trajectory for every agent under an unknown, potentially sub-optimal policy. We find that the performance of state-of-the-art offline and model-based RL methods degrades significantly given such limited data availability, even for commonly perceived "solved" benchmark settings such as "MountainCar" and "CartPole". To address this challenge, we propose PerSim, a model-based offline RL approach which first learns a personalized simulator for each agent by collectively using the historical trajectories across all agents, prior to learning a policy. We do so by positing that the transition dynamics across agents can be represented as a latent function of latent factors associated with agents, states, and actions; subsequently, we theoretically establish that this function is well-approximated by a "low-rank" decomposition of separable agent, state, and action latent functions. This representation suggests a simple, regularized neural network architecture to effectively learn the transition dynamics per agent, even with scarce, offline data. We perform extensive experiments across several benchmark environments and RL methods. The consistent improvement of our approach, measured in terms of both state dynamics prediction and eventual reward, confirms the efficacy of our framework in leveraging limited historical data to simultaneously learn personalized policies across agents.
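
To make the "low-rank" decomposition of separable agent, state, and action latent functions concrete, the following is a minimal sketch, not the authors' implementation. It assumes, purely for illustration, a finite set of agents, states, and actions and a scalar dynamics quantity per (agent, state, action) triple. The names `U`, `V`, `W`, `rank`, and `low_rank_dynamics` are hypothetical, and random matrices stand in for the learned latent functions, which the abstract describes as being learned by a regularized neural network.

```python
import numpy as np


def low_rank_dynamics(U, V, W):
    """Combine separable latent factors into one dynamics tensor.

    U: (num_agents, rank)  agent latent factors   (hypothetical)
    V: (num_states, rank)  state latent factors   (hypothetical)
    W: (num_actions, rank) action latent factors  (hypothetical)

    Returns T with T[n, s, a] = sum_r U[n, r] * V[s, r] * W[a, r].
    """
    return np.einsum("nr,sr,ar->nsa", U, V, W)


# Illustrative sizes only; none of these values come from the paper.
rng = np.random.default_rng(0)
num_agents, num_states, num_actions, rank = 5, 7, 3, 2

U = rng.normal(size=(num_agents, rank))
V = rng.normal(size=(num_states, rank))
W = rng.normal(size=(num_actions, rank))

T = low_rank_dynamics(U, V, W)

# Flatten states and actions so each row describes one agent. Every row
# is a combination of the same `rank` shared components, which is the
# structure that lets scarce per-agent data be pooled across agents.
agent_unfolding = T.reshape(num_agents, num_states * num_actions)
unfolding_rank = np.linalg.matrix_rank(agent_unfolding)
```

In this sketch each agent is described by only `rank` numbers (its row of `U`), while the state and action factors are shared across all agents, so a single trajectory per agent only needs to pin down that agent's row rather than an entire per-agent dynamics table.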

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2102.06961 [cs.LG]
	(or arXiv:2102.06961v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2102.06961

Submission history

From: Abdullah Alomar
[v1] Sat, 13 Feb 2021 17:16:41 UTC (5,619 KB)
[v2] Wed, 17 Mar 2021 15:54:16 UTC (5,852 KB)
[v3] Fri, 11 Jun 2021 20:00:03 UTC (5,220 KB)
[v4] Wed, 10 Nov 2021 17:39:16 UTC (10,526 KB)