This paper concerns computation of optimal policies in which the one-step reward function contains a cost term that models Kullback-Leibler divergence with respect to nominal dynamics.
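A hedged sketch of the reward structure being described, following Todorov's linearly solvable formulation that this line of work builds on (the symbols here, $\eta$ for the per-state utility, $P_0$ for the nominal transition kernel, and $\check P$ for the controlled kernel, are chosen for illustration and need not match the paper's notation):

$$ r(x, \check P) \;=\; \eta(x) \;-\; D\bigl(\check P(x,\cdot)\,\|\,P_0(x,\cdot)\bigr), \qquad D(\mu\,\|\,\nu) \;=\; \sum_{y}\mu(y)\log\frac{\mu(y)}{\nu(y)}. $$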
This paper introduces a technique to solve a more general class of action-constrained MDPs. The main idea is to solve an entire parameterized family of MDPs, in which the parameter is a scalar weighting the one-step reward function.
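One plausible reading of that parameterized family, in the notation above: a scalar $\kappa \ge 0$ weights the utility term against the fixed KL cost, and an optimal policy is computed for every value of the parameter,

$$ r_{\kappa}(x, \check P) \;=\; \kappa\,\eta(x) \;-\; D\bigl(\check P(x,\cdot)\,\|\,P_0(x,\cdot)\bigr), \qquad \kappa \ge 0. $$

At $\kappa = 0$ the problem reduces to minimizing the KL cost alone (recovering the nominal dynamics in the unconstrained case); increasing $\kappa$ trades nominal behavior for utility.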
Abstract—We consider an online (real-time) control problem that involves an agent performing a discrete-time random walk over a finite state space.
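In the unconstrained formulation, Todorov (2009) showed that the optimality equations for this class reduce to a principal-eigenvector problem. Below is a minimal numpy sketch of that reduction for an average-cost random walk on a finite state space; the kernel P0 and per-state cost q are randomly generated placeholders, not data from any of the papers, and the cost-based sign convention is one common variant.

import numpy as np

# Minimal sketch of Todorov's eigenvector reduction for KL-cost MDPs.
# P0 and q are illustrative placeholders.
rng = np.random.default_rng(0)
n = 5                                        # size of the finite state space
P0 = rng.random((n, n))
P0 /= P0.sum(axis=1, keepdims=True)          # nominal transition kernel
q = rng.random(n)                            # per-state cost

# With total cost q(x) + KL(P(x,.) || P0(x,.)), the desirability z = exp(-V)
# solves the principal-eigenvector problem  lambda * z = diag(exp(-q)) P0 z,
# and -log(lambda) is the optimal average cost.
H = np.diag(np.exp(-q)) @ P0
eigvals, eigvecs = np.linalg.eig(H)
k = np.argmax(eigvals.real)                  # Perron eigenvalue of H
lam = eigvals[k].real
z = np.abs(eigvecs[:, k].real)               # Perron vector (positive up to sign)

# Optimal controlled kernel: P*(x, y) proportional to P0(x, y) * z(y).
Pstar = P0 * z[None, :]
Pstar /= Pstar.sum(axis=1, keepdims=True)

print("optimal average cost:", -np.log(lam))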
Constrained Markov decision processes with uncertain costs
We consider a finite state-action discounted constrained Markov decision process with uncertain running costs and known transition probabilities.
Constrained MDPs have historically been used to address two major limitations of standard discrete MDPs: multiple objectives and limited resources [7]–[14].
The state trajectories, together with the choices of actions (equivalently, the distribution over trajectories), determine the different costs; the standard formulation sketched below makes this explicit. In order to clarify the type ...
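Both excerpts refer to the standard constrained MDP formulation. In a discounted setting with discount factor $\beta \in (0,1)$, reward $r$, cost functions $c_1,\dots,c_m$, and budgets $\theta_1,\dots,\theta_m$ (symbols chosen here for illustration), it reads:

$$ \max_{\pi}\; \mathbb{E}^{\pi}\Bigl[\sum_{t=0}^{\infty}\beta^{t}\, r(x_t,a_t)\Bigr] \quad\text{subject to}\quad \mathbb{E}^{\pi}\Bigl[\sum_{t=0}^{\infty}\beta^{t}\, c_i(x_t,a_t)\Bigr] \le \theta_i, \quad i=1,\dots,m. $$

Each constraint is a functional of the trajectory distribution induced by $\pi$, which is the sense in which the trajectories (or their distribution) determine the different costs.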