On the Search for Feedback in Reinforcement Learning
- URL: http://arxiv.org/abs/2002.09478v6
- Date: Thu, 24 Mar 2022 01:29:19 GMT
- Title: On the Search for Feedback in Reinforcement Learning
- Authors: Ran Wang, Karthikeya S. Parunandi, Aayushman Sharma, Raman Goyal,
Suman Chakravorty
- Abstract summary: We advocate searching over a local feedback representation consisting of an open-loop sequence and an associated optimal linear feedback law completely determined by the open-loop sequence.
We show that this alternate approach results in highly efficient training, that the answers obtained are repeatable and reliable, and that the resulting closed-loop performance is superior to that of global state-of-the-art RL techniques.
- Score: 6.29295842374861
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of Reinforcement Learning (RL) in an unknown nonlinear dynamical
system is equivalent to the search for an optimal feedback law utilizing the
simulations/rollouts of the dynamical system. Most RL techniques search over a
complex global nonlinear feedback parametrization making them suffer from high
training times as well as variance. Instead, we advocate searching over a local
feedback representation consisting of an open-loop sequence, and an associated
optimal linear feedback law completely determined by the open-loop. We show
that this alternate approach results in highly efficient training, the answers
obtained are repeatable and hence reliable, and the resulting closed-loop
performance is superior to that of global state-of-the-art RL techniques. Finally,
replanning whenever required, which is feasible owing to the fast and reliable
local solution, allows us to recover the global optimality of the resulting
feedback law.
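A minimal sketch of the representation described above, under illustrative assumptions: the dynamics f, cost matrices Q and R, horizon, and the naive finite-difference gradient descent used for the open-loop search are all stand-ins rather than the authors' implementation. The point is only to show the structure itself: an open-loop control sequence, plus a time-varying LQR feedback law computed from linearizations about the nominal trajectory that sequence induces.

```python
import numpy as np

def f(x, u):
    # Placeholder pendulum-like dynamics, assumed only for illustration.
    return x + 0.1 * np.array([x[1], -np.sin(x[0]) + u[0]])

def rollout_cost(x0, u_seq, Q, R):
    x, cost = x0.copy(), 0.0
    for u in u_seq:
        cost += x @ Q @ x + u @ R @ u
        x = f(x, u)
    return cost + x @ Q @ x

def linearize(x, u, eps=1e-5):
    # Finite-difference Jacobians A = df/dx, B = df/du about (x, u).
    n, m = len(x), len(u)
    A, B = np.zeros((n, n)), np.zeros((n, m))
    fx = f(x, u)
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        A[:, i] = (f(x + dx, u) - fx) / eps
    for j in range(m):
        du = np.zeros(m); du[j] = eps
        B[:, j] = (f(x, u + du) - fx) / eps
    return A, B

def open_loop_plus_lqr(x0, T, Q, R, iters=200, lr=1e-2, fd=1e-4):
    m = R.shape[0]
    u_bar = np.zeros((T, m))
    # Step 1: open-loop search via naive finite-difference gradient descent
    # on the rollout cost (a stand-in for the paper's open-loop optimizer).
    for _ in range(iters):
        base = rollout_cost(x0, u_bar, Q, R)
        grad = np.zeros_like(u_bar)
        for t in range(T):
            for j in range(m):
                u_pert = u_bar.copy(); u_pert[t, j] += fd
                grad[t, j] = (rollout_cost(x0, u_pert, Q, R) - base) / fd
        u_bar -= lr * grad
    # Nominal state trajectory induced by the optimized open-loop sequence.
    x_bar = [x0]
    for t in range(T):
        x_bar.append(f(x_bar[-1], u_bar[t]))
    # Step 2: time-varying LQR gains from linearizations about the nominal
    # trajectory (backward Riccati recursion); the feedback law is then
    # fully determined by the open-loop solution.
    P, K = Q.copy(), [None] * T
    for t in reversed(range(T)):
        A, B = linearize(x_bar[t], u_bar[t])
        K[t] = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A + A.T @ P @ B @ K[t]
    return u_bar, x_bar, K

# Closed-loop execution applies u_t = u_bar[t] + K[t] @ (x_t - x_bar[t]);
# replanning re-runs open_loop_plus_lqr from the current state when needed.
u_bar, x_bar, K = open_loop_plus_lqr(np.array([np.pi / 2, 0.0]), T=30,
                                     Q=np.eye(2), R=0.1 * np.eye(1))
```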
Related papers
- Umbrella Reinforcement Learning -- computationally efficient tool for hard non-linear problems [0.0]
The approach is realized with neural networks and uses the policy gradient.
In terms of computational efficiency and implementation universality, it outperforms all available state-of-the-art algorithms on hard RL problems with sparse rewards, state traps, and a lack of terminal states.
arXiv Detail & Related papers (2024-11-21T13:34:36Z) - Preference Elicitation for Offline Reinforcement Learning [59.136381500967744]
We propose Sim-OPRL, an offline preference-based reinforcement learning algorithm.
Our algorithm employs a pessimistic approach for out-of-distribution data, and an optimistic approach for acquiring informative preferences about the optimal policy.
arXiv Detail & Related papers (2024-06-26T15:59:13Z) - Hybrid Inverse Reinforcement Learning [34.793570631021005]
The inverse reinforcement learning approach to imitation learning is a double-edged sword.
We propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration.
We derive both model-free and model-based hybrid inverse RL algorithms with strong policy performance guarantees.
arXiv Detail & Related papers (2024-02-13T23:29:09Z) - Posterior Sampling with Delayed Feedback for Reinforcement Learning with
Linear Function Approximation [62.969796245827006]
Delayed-PSVI is an optimistic value-based algorithm that explores the value function space via noise perturbation with posterior sampling.
We show that our algorithm achieves $\widetilde{O}(\sqrt{d^3H^3T} + d^2H^2\mathbb{E}[\tau])$ worst-case regret in the presence of unknown delays.
We incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI.
arXiv Detail & Related papers (2023-10-29T06:12:43Z) - Query-Dependent Prompt Evaluation and Optimization with Offline Inverse
RL [62.824464372594576]
We aim to enhance the arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z) - Query-Policy Misalignment in Preference-Based Reinforcement Learning [21.212703100030478]
We show that the seemingly informative queries selected to improve the overall quality of the reward model may not actually align with the RL agents' interests.
We show that this issue can be effectively addressed via near on-policy query and a specially designed hybrid experience replay.
Our method achieves substantial gains in both human feedback and RL sample efficiency.
arXiv Detail & Related papers (2023-05-27T07:55:17Z) - Provably Efficient Representation Selection in Low-rank Markov Decision
Processes: From Online to Offline RL [84.14947307790361]
We propose an efficient algorithm, called ReLEX, for representation learning in both online and offline reinforcement learning.
We show that the online version of ReLEX, called Re-UCB, always performs no worse than the state-of-the-art algorithm without representation selection.
For the offline counterpart, ReLEX-LCB, we show that the algorithm can find the optimal policy if the representation class can cover the state-action space.
arXiv Detail & Related papers (2021-06-22T17:16:50Z) - Sparse Signal Reconstruction for Nonlinear Models via Piecewise Rational
Optimization [27.080837460030583]
We propose a method to reconstruct signals degraded by a nonlinear distortion and sampled at a limited rate.
Our method is formulated as the sum of a non-exact fitting term and a penalization term.
Simulations illustrate the benefits of the proposed formulation.
arXiv Detail & Related papers (2020-10-29T09:05:19Z) - Dynamics Generalization via Information Bottleneck in Deep Reinforcement
Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z) - Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR); a toy sketch of the weighted-regression idea appears after this list.
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
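For the critic-regularized regression entry above, a toy numpy sketch of the weighted-regression idea, assuming discrete actions, an already-trained critic, and the exponential-advantage weighting variant, might look like:

```python
import numpy as np

def crr_policy_loss(log_pi, q_values, dataset_actions, beta=1.0):
    """Behaviour cloning on dataset actions, reweighted by the critic.

    log_pi: (N, A) log-probabilities under the current policy.
    q_values: (N, A) critic estimates Q(s, a) for every action.
    dataset_actions: (N,) integer actions taken in the offline dataset.
    """
    n = len(dataset_actions)
    v = (np.exp(log_pi) * q_values).sum(axis=1)         # V(s) = E_pi[Q(s, .)]
    adv = q_values[np.arange(n), dataset_actions] - v   # advantage of data action
    weights = np.exp(np.clip(adv / beta, -10.0, 10.0))  # exponential-advantage weight
    # Weighted negative log-likelihood: imitate only well-rated dataset actions.
    return -(weights * log_pi[np.arange(n), dataset_actions]).mean()
```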