Improve the efficiency of deep reinforcement learning through semantic
exploration guided by natural language
- URL: http://arxiv.org/abs/2309.11753v1
- Date: Thu, 21 Sep 2023 03:25:35 GMT
- Title: Improve the efficiency of deep reinforcement learning through semantic
exploration guided by natural language
- Authors: Zhourui Guo, Meng Yao, Yang Yu, Qiyue Yin
- Abstract summary: We propose a novel method for interacting with the oracle in a selective and efficient way, using a retrieval-based approach.
We use a neural network to encode the current state of the agent and the oracle, and retrieve the most relevant question from the corpus to ask the oracle.
We show that our method can significantly improve the efficiency of RL by reducing the number of interactions needed to reach a certain level of performance.
- Score: 10.47685316733524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning is a powerful technique for learning from trial and
error, but it often requires a large number of interactions to achieve good
performance. In some domains, such as sparse-reward tasks, an oracle that can
provide useful feedback or guidance to the agent during the learning process is
especially valuable. However, querying the oracle too frequently may be
costly or impractical, and the oracle may not always have a clear answer for
every situation. Therefore, we propose a novel method for interacting with the
oracle in a selective and efficient way, using a retrieval-based approach. We
assume that the interaction can be modeled as a sequence of templated questions
and answers, and that there is a large corpus of previous interactions
available. We use a neural network to encode the current state of the agent and
the oracle, and retrieve the most relevant question from the corpus to ask the
oracle. We then use the oracle's answer to update the agent's policy and value
function. We evaluate our method on an object manipulation task. We show that
our method can significantly improve the efficiency of RL by reducing the
number of interactions needed to reach a certain level of performance, compared
to baselines that do not use the oracle or use it in a naive way.
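Below is a minimal sketch of the retrieval step described in the abstract: encode the current state, compare it against a pre-embedded corpus of templated questions, and only query the oracle when a sufficiently relevant question is found. The encoder, corpus, and similarity threshold here are illustrative placeholders, not the authors' actual implementation.

```python
import numpy as np

def select_question(state_embedding, question_embeddings, questions, threshold=0.8):
    """Retrieve the corpus question most relevant to the current state.

    Returns (question, score) if the best cosine similarity exceeds the
    threshold, otherwise None, i.e. the agent skips querying the oracle.
    """
    # Cosine similarity between the encoded state and every corpus question.
    q = question_embeddings / np.linalg.norm(question_embeddings, axis=1, keepdims=True)
    s = state_embedding / np.linalg.norm(state_embedding)
    scores = q @ s
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return None  # no sufficiently relevant question: do not query the oracle
    return questions[best], float(scores[best])

# Toy usage with random embeddings standing in for a learned encoder.
rng = np.random.default_rng(0)
corpus = ["Where is the target object?", "Is the gripper aligned?", "Which drawer is open?"]
corpus_emb = rng.normal(size=(len(corpus), 16))
state_emb = corpus_emb[1] + 0.1 * rng.normal(size=16)  # state resembling question 1
print(select_question(state_emb, corpus_emb, corpus, threshold=0.5))
```

In this sketch, a below-threshold best match stands in for "the oracle may not have a clear answer", so the agent proceeds without spending an interaction.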
Related papers
- Oracle problems as communication tasks and optimization of quantum algorithms [0.0]
A key question is how well an algorithm can succeed with a learning task using only a fixed number of queries.
In this work, we propose measuring an algorithm's performance using the mutual information between the output and the actual value.
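As an illustration of that performance measure, a small sketch estimating the mutual information between an algorithm's output and the hidden value it tries to identify; the joint distribution is a toy example, not taken from the paper.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information I(Output; Value) in bits from a joint probability table."""
    joint = joint / joint.sum()
    p_out = joint.sum(axis=1, keepdims=True)
    p_val = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (p_out @ p_val)[nz])).sum())

# Toy joint distribution: rows = algorithm output, columns = actual oracle value.
# A diagonal-heavy table means the output usually matches the hidden value.
joint = np.array([[0.4, 0.05],
                  [0.05, 0.5]])
print(f"I(output; value) = {mutual_information(joint):.3f} bits")
```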
arXiv Detail & Related papers (2024-09-23T21:03:39Z)
- Is Efficient PAC Learning Possible with an Oracle That Responds 'Yes' or 'No'? [26.334900941196082]
We investigate whether the ability to perform ERM, which computes a hypothesis minimizing empirical risk on a given dataset, is necessary for efficient learning.
We show that, in the realizable setting of PAC learning for binary classification, a concept class can be learned using an oracle which only returns a single bit.
Our results extend to the learning setting with a slight strengthening of the oracle, as well as to the partial concept, multiclass and real-valued learning settings.
arXiv Detail & Related papers (2024-06-17T15:50:08Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
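A rough sketch of the reward idea described here: each transition gets a negative reward exactly when the user intervenes and is otherwise neutral, so a standard off-policy learner can consume it. The transition fields and the -1 penalty are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Transition:
    state: list
    action: int
    next_state: list
    intervened: bool      # True if the human took over on this step
    reward: float = 0.0

def relabel_with_intervention_rewards(transitions: List[Transition], penalty: float = -1.0) -> List[Transition]:
    """Use intervention signals themselves as (negative) rewards for off-policy RL."""
    for t in transitions:
        t.reward = penalty if t.intervened else 0.0
    return transitions

# Toy rollout: the expert intervenes on the third step.
rollout = [Transition([0.0], 0, [0.1], False),
           Transition([0.1], 1, [0.2], False),
           Transition([0.2], 1, [0.0], True)]
for t in relabel_with_intervention_rewards(rollout):
    print(t.action, t.intervened, t.reward)
```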
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Cache & Distil: Optimising API Calls to Large Language Models [82.32065572907125]
Large-scale deployment of generative AI tools often depends on costly API calls to a Large Language Model (LLM) to fulfil user queries.
To curtail the frequency of these calls, one can employ a smaller language model -- a student.
This student gradually gains proficiency in independently handling an increasing number of user requests.
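A hedged sketch of the routing loop implied here: answer with the student when it is confident, otherwise pay for an LLM call and keep the pair for distillation. The confidence threshold and the stub models are assumptions, not the paper's actual policy.

```python
def answer_query(query, student, call_llm_api, distill_buffer, confidence_threshold=0.9):
    """Serve a user query with a cheap student model when possible, else an LLM API."""
    answer, confidence = student(query)
    if confidence >= confidence_threshold:
        return answer                           # no API call needed
    llm_answer = call_llm_api(query)            # costly call to the large model
    distill_buffer.append((query, llm_answer))  # later used to fine-tune the student
    return llm_answer

# Toy stand-ins for the models.
def student(query):
    return ("yes", 0.95) if "refund" in query else ("unsure", 0.3)

def call_llm_api(query):
    return f"LLM answer to: {query}"

buffer = []
print(answer_query("Can I get a refund?", student, call_llm_api, buffer))
print(answer_query("Explain your shipping policy", student, call_llm_api, buffer))
print("queued for distillation:", buffer)
```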
arXiv Detail & Related papers (2023-10-20T15:01:55Z)
- Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading [63.93888816206071]
We introduce MemWalker, a method that processes the long context into a tree of summary nodes. Upon receiving a query, the model navigates this tree in search of relevant information, and responds once it gathers sufficient information.
We show that, beyond effective reading, MemWalker enhances explainability by highlighting the reasoning steps as it interactively reads the text, pinpointing the text segments relevant to the query.
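A small sketch of the tree-of-summaries idea: leaves hold text chunks, internal nodes hold summaries, and a query descends toward the most relevant child. Relevance here is a toy keyword overlap; the actual method lets the LLM decide at each node.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    summary: str                       # summary of everything below this node
    text: Optional[str] = None         # raw chunk, only on leaves
    children: List["Node"] = field(default_factory=list)

def relevance(query: str, summary: str) -> int:
    # Toy stand-in for an LLM relevance judgment: keyword overlap.
    return len(set(query.lower().split()) & set(summary.lower().split()))

def navigate(node: Node, query: str) -> str:
    """Walk down the summary tree, following the most relevant child at each step."""
    while node.children:
        node = max(node.children, key=lambda c: relevance(query, c.summary))
    return node.text or node.summary

tree = Node("annual report",
            children=[Node("finances: revenue and costs", text="Revenue grew 12% in Q3."),
                      Node("hiring and staffing plans", text="Headcount will stay flat.")])
print(navigate(tree, "what happened to revenue"))
```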
arXiv Detail & Related papers (2023-10-08T06:18:14Z)
- Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets [73.2096288987301]
We propose a simple approach that uses a small amount of downstream expert data to selectively query relevant behaviors from an offline, unlabeled dataset.
We observe that our method learns to query only the transitions relevant to the task, filtering out sub-optimal or task-irrelevant data.
Our simple querying approach outperforms more complex goal-conditioned methods by 20% across simulated and real robotic manipulation tasks from images.
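A rough sketch of the querying step: embed the small expert set, then keep only the offline transitions whose embeddings are close to it. The embedding function and threshold are placeholders for the learned representation used in the paper.

```python
import numpy as np

def retrieve_relevant(expert_emb, offline_emb, threshold=0.9):
    """Return indices of offline transitions similar to at least one expert transition."""
    e = expert_emb / np.linalg.norm(expert_emb, axis=1, keepdims=True)
    o = offline_emb / np.linalg.norm(offline_emb, axis=1, keepdims=True)
    sims = o @ e.T                      # cosine similarity, offline x expert
    keep = sims.max(axis=1) >= threshold
    return np.nonzero(keep)[0]

rng = np.random.default_rng(1)
expert = rng.normal(size=(5, 8))        # few downstream expert transitions (embedded)
offline = np.vstack([expert + 0.05 * rng.normal(size=(5, 8)),   # task-relevant data
                     rng.normal(size=(20, 8))])                 # unrelated data
print(retrieve_relevant(expert, offline, threshold=0.95))
```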
arXiv Detail & Related papers (2023-04-18T05:42:53Z)
- On Efficient Approximate Queries over Machine Learning Models [30.26180913049285]
We develop a novel unified framework for approximate query answering by leveraging a proxy to minimize the oracle usage.
Our framework uses a judicious combination of invoking the expensive oracle on data samples and applying the cheap proxy on the objects in the DB.
Our algorithms outperform the state-of-the-art and achieve high result quality with provable statistical guarantees.
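A minimal sketch of combining a cheap proxy with an expensive oracle: the proxy answers clear-cut objects, and only objects in an uncertain band are sent to the oracle. The band width and the stub models are assumptions, not the framework's actual allocation strategy, which comes with statistical guarantees.

```python
def approximate_query(objects, proxy_score, oracle, low=0.3, high=0.7):
    """Label objects as matching the query, invoking the oracle only when the proxy is unsure."""
    results, oracle_calls = {}, 0
    for obj in objects:
        score = proxy_score(obj)            # cheap model, returns a probability
        if score >= high:
            results[obj] = True
        elif score <= low:
            results[obj] = False
        else:                               # uncertain band: pay for the oracle
            results[obj] = oracle(obj)
            oracle_calls += 1
    return results, oracle_calls

# Toy setup: the "oracle" is ground truth, the proxy is a noisy guess.
truth = {"a": True, "b": False, "c": True, "d": False}
proxy = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.4}
res, calls = approximate_query(truth, proxy.get, truth.get)
print(res, "oracle calls:", calls)
```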
arXiv Detail & Related papers (2022-06-06T18:35:19Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
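One concrete piece of the "relabeling experience" idea in the title: whenever the reward signal learned from feedback changes, stored transitions are re-scored so off-policy updates stay consistent. The buffer layout and reward model below are illustrative, not the paper's implementation.

```python
import numpy as np

def relabel_replay_buffer(buffer, reward_model):
    """Re-score every stored transition with the current learned reward model."""
    for transition in buffer:
        transition["reward"] = float(reward_model(transition["state"], transition["action"]))
    return buffer

# Toy learned reward: prefers action 1 in states with a positive first feature.
def reward_model(state, action):
    return 1.0 if (state[0] > 0 and action == 1) else 0.0

buffer = [{"state": np.array([0.5, -0.2]), "action": 1, "reward": 0.0},
          {"state": np.array([-0.3, 0.1]), "action": 1, "reward": 0.0}]
print(relabel_replay_buffer(buffer, reward_model))
```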
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- CoDE: Collocation for Demonstration Encoding [31.220899638271856]
We present a data-efficient imitation learning technique called Collocation for Demonstration.
We circumvent problematic back-propagation through time by introducing an auxiliary trajectory, taking inspiration from collocation techniques in optimal control.
We present experiments on a 7-degree-of-freedom robotic manipulator learning behavior shaping policies for efficient tabletop operation.
arXiv Detail & Related papers (2021-05-07T00:34:43Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
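A rough sketch of how such a prior can sit between the RL agent and the environment: the agent picks a latent, and a frozen, state-conditioned prior decodes it into an action, biasing exploration toward behaviors seen in the successful trials. The linear "prior" below is a placeholder for the learned model.

```python
import numpy as np

class BehavioralPrior:
    """Frozen, state-conditioned mapping from latent z to an environment action."""
    def __init__(self, rng):
        self.W = rng.normal(size=(2, 2))   # placeholder for a pre-trained model

    def decode(self, state, z):
        # Actions depend on both the latent chosen by the RL agent and the state,
        # so even a random z yields plausible, data-like behavior.
        return np.tanh(self.W @ z + 0.5 * state)

rng = np.random.default_rng(2)
prior = BehavioralPrior(rng)
state = np.array([0.2, -0.1])
z = rng.normal(size=2)                      # what the RL policy would output
print("environment action:", prior.decode(state, z))
```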
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.