Searching papers submitted to ICLR 2019 can be painful.
You might want to know which paper uses technique X, dataset D, or cites author ME.
Unfortunately, search is limited to titles, abstracts, and keywords, missing the actual contents of the paper.
This Frankensteinian search has returned from 2018 to help scour the papers of ICLR by ripping out their souls using
Good luck! Warranty's not included :)
Sanity Disclaimer: As you stare at the continuous stream of ICLR and arXiv papers, don't lose confidence or feel overwhelmed. This isn't a competition, it's a search for knowledge. You and your work are valuable and help carve out the path for progress in our field :)
"constrained markov decision problems" has 1 results
tl;dr Safe Reinforcement Learning Algorithms for Continuous Control (Show abstract)
In many reinforcement learning applications, it is crucial that the agent interacts with the environment only through safe policies, i.e.,~policies that do not take the agent to certain undesirable situations. These problems are often formulated as a constrained Markov decision process (CMDP) in which the agent's goal is to optimize its main objective while not violating a number of safety constraints. In this paper, we propose safe policy optimization algorithms that are based on the Lyapunov approach to CMDPs, an approach that has well-established theoretical guarantees in control engineering. We first show how to generate a set of state-dependent Lyapunov constraints from the original CMDP safety constraints. We then propose safe policy gradient algorithms that train a neural network policy using DDPG or PPO, while guaranteeing near-constraint satisfaction at every policy update by projecting either the policy parameter or the action onto the set of feasible solutions induced by the linearized Lyapunov constraints. Unlike the existing (safe) constrained PG algorithms, ours are more data efficient as they are able to utilize both on-policy and off-policy data. Furthermore, the action-projection version of our algorithms often leads to less conservative policy updates and allows for natural integration into an end-to-end PG training pipeline. We evaluate our algorithms and compare them with CPO and the Lagrangian method on several high-dimensional continuous state and action simulated robot locomotion tasks, in which the agent must satisfy certain safety constraints while minimizing its expected cumulative cost.