Improve the efficiency of deep reinforcement learning through semantic
exploration guided by natural language
- URL: http://arxiv.org/abs/2309.11753v1
- Date: Thu, 21 Sep 2023 03:25:35 GMT
- Title: Improve the efficiency of deep reinforcement learning through semantic
exploration guided by natural language
- Authors: Zhourui Guo, Meng Yao, Yang Yu, Qiyue Yin
- Abstract summary: We propose a novel method for interacting with the oracle in a selective and efficient way, using a retrieval-based approach.
We use a neural network to encode the current state of the agent and the oracle, and retrieve the most relevant question from the corpus to ask the oracle.
We show that our method can significantly improve the efficiency of RL by reducing the number of interactions needed to reach a certain level of performance.
- Score: 10.47685316733524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning is a powerful technique for learning from trial and
error, but it often requires a large number of interactions to achieve good
performance. In some domains, such as sparse-reward tasks, an oracle that can
provide useful feedback or guidance to the agent during the learning process is
especially valuable. However, querying the oracle too frequently may be
costly or impractical, and the oracle may not always have a clear answer for
every situation. Therefore, we propose a novel method for interacting with the
oracle in a selective and efficient way, using a retrieval-based approach. We
assume that the interaction can be modeled as a sequence of templated questions
and answers, and that there is a large corpus of previous interactions
available. We use a neural network to encode the current state of the agent and
the oracle, and retrieve the most relevant question from the corpus to ask the
oracle. We then use the oracle's answer to update the agent's policy and value
function. We evaluate our method on an object manipulation task. We show that
our method can significantly improve the efficiency of RL by reducing the
number of interactions needed to reach a certain level of performance, compared
to baselines that do not use the oracle or use it in a naive way.
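Below is a minimal sketch of the retrieval step described in the abstract: encode the current state, compare it against a pre-embedded corpus of templated questions, and only query the oracle when a sufficiently relevant question is found. The encoder, corpus, and similarity threshold here are illustrative placeholders, not the authors' actual implementation.

```python
import numpy as np

def select_question(state_embedding, question_embeddings, questions, threshold=0.8):
    """Retrieve the corpus question most relevant to the current state.

    Returns (question, score) if the best cosine similarity exceeds the
    threshold, otherwise None, i.e. the agent skips querying the oracle.
    """
    # Cosine similarity between the encoded state and every corpus question.
    q = question_embeddings / np.linalg.norm(question_embeddings, axis=1, keepdims=True)
    s = state_embedding / np.linalg.norm(state_embedding)
    scores = q @ s
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return None  # no sufficiently relevant question: do not query the oracle
    return questions[best], float(scores[best])

# Toy usage with random embeddings standing in for a learned encoder.
rng = np.random.default_rng(0)
corpus = ["Where is the target object?", "Is the gripper aligned?", "Which drawer is open?"]
corpus_emb = rng.normal(size=(len(corpus), 16))
state_emb = corpus_emb[1] + 0.1 * rng.normal(size=16)  # state resembling question 1
print(select_question(state_emb, corpus_emb, corpus, threshold=0.5))
```

In this sketch, a below-threshold best match stands in for "the oracle may not have a clear answer", so the agent proceeds without spending an interaction.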
Related papers
- Oracle problems as communication tasks and optimization of quantum algorithms [0.0]
A key question is how well an algorithm can succeed with a learning task using only a fixed number of queries.
In this work, we propose measuring an algorithm's performance using the mutual information between the output and the actual value.
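As an illustration of that performance measure, a small sketch estimating the mutual information between an algorithm's output and the hidden value it tries to identify; the joint distribution is a toy example, not taken from the paper.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information I(Output; Value) in bits from a joint probability table."""
    joint = joint / joint.sum()
    p_out = joint.sum(axis=1, keepdims=True)
    p_val = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (p_out @ p_val)[nz])).sum())

# Toy joint distribution: rows = algorithm output, columns = actual oracle value.
# A diagonal-heavy table means the output usually matches the hidden value.
joint = np.array([[0.4, 0.05],
                  [0.05, 0.5]])
print(f"I(output; value) = {mutual_information(joint):.3f} bits")
```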
arXiv Detail & Related papers (2024-09-23T21:03:39Z)
- Is Efficient PAC Learning Possible with an Oracle That Responds 'Yes' or 'No'? [26.334900941196082]
We investigate whether the ability to perform ERM, which computes a hypothesis minimizing empirical risk on a given dataset, is necessary for efficient learning.
We show that, in the realizable setting of PAC learning for binary classification, a concept class can be learned using an oracle which only returns a single bit.
Our results extend to the learning setting with a slight strengthening of the oracle, as well as to the partial concept, multiclass and real-valued learning settings.
arXiv Detail & Related papers (2024-06-17T15:50:08Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
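A rough sketch of the reward idea described here: each transition gets a negative reward exactly when the user intervenes and is otherwise neutral, so a standard off-policy learner can consume it. The transition fields and the -1 penalty are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Transition:
    state: list
    action: int
    next_state: list
    intervened: bool      # True if the human took over on this step
    reward: float = 0.0

def relabel_with_intervention_rewards(transitions: List[Transition], penalty: float = -1.0) -> List[Transition]:
    """Use intervention signals themselves as (negative) rewards for off-policy RL."""
    for t in transitions:
        t.reward = penalty if t.intervened else 0.0
    return transitions

# Toy rollout: the expert intervenes on the third step.
rollout = [Transition([0.0], 0, [0.1], False),
           Transition([0.1], 1, [0.2], False),
           Transition([0.2], 1, [0.0], True)]
for t in relabel_with_intervention_rewards(rollout):
    print(t.action, t.intervened, t.reward)
```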
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Cache & Distil: Optimising API Calls to Large Language Models [82.32065572907125]
Large-scale deployment of generative AI tools often depends on costly API calls to a Large Language Model (LLM) to fulfil user queries.
To curtail the frequency of these calls, one can employ a smaller language model -- a student.
This student gradually gains proficiency in independently handling an increasing number of user requests.
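A hedged sketch of the routing loop implied here: answer with the student when it is confident, otherwise pay for an LLM call and keep the pair for distillation. The confidence threshold and the stub models are assumptions, not the paper's actual policy.

```python
def answer_query(query, student, call_llm_api, distill_buffer, confidence_threshold=0.9):
    """Serve a user query with a cheap student model when possible, else an LLM API."""
    answer, confidence = student(query)
    if confidence >= confidence_threshold:
        return answer                           # no API call needed
    llm_answer = call_llm_api(query)            # costly call to the large model
    distill_buffer.append((query, llm_answer))  # later used to fine-tune the student
    return llm_answer

# Toy stand-ins for the models.
def student(query):
    return ("yes", 0.95) if "refund" in query else ("unsure", 0.3)

def call_llm_api(query):
    return f"LLM answer to: {query}"

buffer = []
print(answer_query("Can I get a refund?", student, call_llm_api, buffer))
print(answer_query("Explain your shipping policy", student, call_llm_api, buffer))
print("queued for distillation:", buffer)
```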
arXiv Detail & Related papers (2023-10-20T15:01:55Z)
- Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading [63.93888816206071]
We introduce MemWalker, a method that processes the long context into a tree of summary nodes. Upon receiving a query, the model navigates this tree in search of relevant information, and responds once it gathers sufficient information.
We show that, beyond effective reading, MemWalker enhances explainability by highlighting the reasoning steps as it interactively reads the text, pinpointing the text segments relevant to the query.
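A small sketch of the tree-of-summaries idea: leaves hold text chunks, internal nodes hold summaries, and a query descends toward the most relevant child. Relevance here is a toy keyword overlap; the actual method lets the LLM decide at each node.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    summary: str                       # summary of everything below this node
    text: Optional[str] = None         # raw chunk, only on leaves
    children: List["Node"] = field(default_factory=list)

def relevance(query: str, summary: str) -> int:
    # Toy stand-in for an LLM relevance judgment: keyword overlap.
    return len(set(query.lower().split()) & set(summary.lower().split()))

def navigate(node: Node, query: str) -> str:
    """Walk down the summary tree, following the most relevant child at each step."""
    while node.children:
        node = max(node.children, key=lambda c: relevance(query, c.summary))
    return node.text or node.summary

tree = Node("annual report",
            children=[Node("finances: revenue and costs", text="Revenue grew 12% in Q3."),
                      Node("hiring and staffing plans", text="Headcount will stay flat.")])
print(navigate(tree, "what happened to revenue"))
```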
arXiv Detail & Related papers (2023-10-08T06:18:14Z)
- Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets [73.2096288987301]
We propose a simple approach that uses a small amount of downstream expert data to selectively query relevant behaviors from an offline, unlabeled dataset.
We observe that our method learns to query only the transitions relevant to the task, filtering out sub-optimal or task-irrelevant data.
Our simple querying approach outperforms more complex goal-conditioned methods by 20% across simulated and real robotic manipulation tasks from images.
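A rough sketch of the querying step: embed the small expert set, then keep only the offline transitions whose embeddings are close to it. The embedding function and threshold are placeholders for the learned representation used in the paper.

```python
import numpy as np

def retrieve_relevant(expert_emb, offline_emb, threshold=0.9):
    """Return indices of offline transitions similar to at least one expert transition."""
    e = expert_emb / np.linalg.norm(expert_emb, axis=1, keepdims=True)
    o = offline_emb / np.linalg.norm(offline_emb, axis=1, keepdims=True)
    sims = o @ e.T                      # cosine similarity, offline x expert
    keep = sims.max(axis=1) >= threshold
    return np.nonzero(keep)[0]

rng = np.random.default_rng(1)
expert = rng.normal(size=(5, 8))        # few downstream expert transitions (embedded)
offline = np.vstack([expert + 0.05 * rng.normal(size=(5, 8)),   # task-relevant data
                     rng.normal(size=(20, 8))])                 # unrelated data
print(retrieve_relevant(expert, offline, threshold=0.95))
```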
arXiv Detail & Related papers (2023-04-18T05:42:53Z)
- On Efficient Approximate Queries over Machine Learning Models [30.26180913049285]
We develop a novel unified framework for approximate query answering by leveraging a proxy to minimize the oracle usage.
Our framework uses a judicious combination of invoking the expensive oracle on data samples and applying the cheap proxy on the objects in the DB.
Our algorithms outperform the state-of-the-art and achieve high result quality with provable statistical guarantees.
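A minimal sketch of combining a cheap proxy with an expensive oracle: the proxy answers clear-cut objects, and only objects in an uncertain band are sent to the oracle. The band width and the stub models are assumptions, not the framework's actual allocation strategy, which comes with statistical guarantees.

```python
def approximate_query(objects, proxy_score, oracle, low=0.3, high=0.7):
    """Label objects as matching the query, invoking the oracle only when the proxy is unsure."""
    results, oracle_calls = {}, 0
    for obj in objects:
        score = proxy_score(obj)            # cheap model, returns a probability
        if score >= high:
            results[obj] = True
        elif score <= low:
            results[obj] = False
        else:                               # uncertain band: pay for the oracle
            results[obj] = oracle(obj)
            oracle_calls += 1
    return results, oracle_calls

# Toy setup: the "oracle" is ground truth, the proxy is a noisy guess.
truth = {"a": True, "b": False, "c": True, "d": False}
proxy = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.4}
res, calls = approximate_query(truth, proxy.get, truth.get)
print(res, "oracle calls:", calls)
```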
arXiv Detail & Related papers (2022-06-06T18:35:19Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
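One concrete piece of the "relabeling experience" idea in the title: whenever the reward signal learned from feedback changes, stored transitions are re-scored so off-policy updates stay consistent. The buffer layout and reward model below are illustrative, not the paper's implementation.

```python
import numpy as np

def relabel_replay_buffer(buffer, reward_model):
    """Re-score every stored transition with the current learned reward model."""
    for transition in buffer:
        transition["reward"] = float(reward_model(transition["state"], transition["action"]))
    return buffer

# Toy learned reward: prefers action 1 in states with a positive first feature.
def reward_model(state, action):
    return 1.0 if (state[0] > 0 and action == 1) else 0.0

buffer = [{"state": np.array([0.5, -0.2]), "action": 1, "reward": 0.0},
          {"state": np.array([-0.3, 0.1]), "action": 1, "reward": 0.0}]
print(relabel_replay_buffer(buffer, reward_model))
```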
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- CoDE: Collocation for Demonstration Encoding [31.220899638271856]
We present a data-efficient imitation learning technique called Collocation for Demonstration.
We circumvent problematic back-propagation through time by introducing an auxiliary trajectory, taking inspiration from collocation techniques in optimal control.
We present experiments on a 7-degree-of-freedom robotic manipulator learning behavior shaping policies for efficient tabletop operation.
arXiv Detail & Related papers (2021-05-07T00:34:43Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
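A rough sketch of how such a prior can sit between the RL agent and the environment: the agent picks a latent, and a frozen, state-conditioned prior decodes it into an action, biasing exploration toward behaviors seen in the successful trials. The linear "prior" below is a placeholder for the learned model.

```python
import numpy as np

class BehavioralPrior:
    """Frozen, state-conditioned mapping from latent z to an environment action."""
    def __init__(self, rng):
        self.W = rng.normal(size=(2, 2))   # placeholder for a pre-trained model

    def decode(self, state, z):
        # Actions depend on both the latent chosen by the RL agent and the state,
        # so even a random z yields plausible, data-like behavior.
        return np.tanh(self.W @ z + 0.5 * state)

rng = np.random.default_rng(2)
prior = BehavioralPrior(rng)
state = np.array([0.2, -0.1])
z = rng.normal(size=2)                      # what the RL policy would output
print("environment action:", prior.decode(state, z))
```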
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.