Balancing Reinforcement Learning Training Experiences in Interactive
Information Retrieval
- URL: http://arxiv.org/abs/2006.03185v2
- Date: Wed, 9 Jun 2021 01:41:34 GMT
- Title: Balancing Reinforcement Learning Training Experiences in Interactive
Information Retrieval
- Authors: Limin Chen, Zhiwen Tang, Grace Hui Yang
- Abstract summary: Interactive Information Retrieval (IIR) and Reinforcement Learning (RL) share many commonalities, including an agent who learns while interacting.
To successfully apply RL methods to IIR, one challenge is to obtain sufficient relevance labels to train the RL agents.
Our paper addresses this issue by using domain randomization to synthesize more relevant documents for the training.
- Score: 19.723551683930776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interactive Information Retrieval (IIR) and Reinforcement Learning (RL) share
many commonalities, including an agent that learns while interacting, a long-term
and complex goal, and an algorithm that explores and adapts. To successfully
apply RL methods to IIR, one challenge is to obtain sufficient relevance labels
to train the RL agents, which are notoriously sample-inefficient. However, in a
text corpus annotated for a given query, it is the irrelevant documents, not the
relevant ones, that predominate. This yields highly unbalanced training
experiences for the agent and prevents it from learning an effective policy. Our
paper addresses this issue by using domain randomization to synthesize more
relevant documents for training. Experimental results on the Text REtrieval
Conference (TREC) Dynamic Domain (DD) 2017 Track show that the proposed method
boosts an RL agent's learning effectiveness by 22% when dealing with unseen
situations.
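The abstract names domain randomization as the rebalancing mechanism but does not spell out the perturbation scheme. As a rough, hypothetical sketch of how extra relevant documents could be synthesized from the few labeled ones, the Python below applies random word dropout and adjacent-word swaps; the function name, perturbation choices, and parameters are illustrative assumptions, not the paper's implementation.

```python
import random

def synthesize_relevant_docs(relevant_docs, n_synthetic, drop_prob=0.1, seed=0):
    """Make perturbed copies of the few labeled relevant documents.

    Hypothetical sketch: the randomization here is word dropout plus
    adjacent-word swaps; the paper's actual scheme may differ.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        words = rng.choice(relevant_docs).split()
        # Randomly drop a fraction of words to vary the surface form.
        kept = [w for w in words if rng.random() > drop_prob] or words
        # Swap a few adjacent words to vary the word order.
        if len(kept) >= 2:
            for _ in range(max(1, len(kept) // 20)):
                i = rng.randrange(len(kept) - 1)
                kept[i], kept[i + 1] = kept[i + 1], kept[i]
        synthetic.append(" ".join(kept))
    return synthetic

# Usage: enlarge the relevant side of the pool an RL search agent trains on.
relevant = [
    "first document labeled relevant for the query ...",
    "second document labeled relevant for the query ...",
]
extra = synthesize_relevant_docs(relevant, n_synthetic=200)
training_pool = relevant + extra  # later mixed with the many irrelevant documents
```

The synthetic documents would then be injected into the collection the agent interacts with during training, so that relevant and irrelevant experiences are better balanced.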
Related papers
- Hybrid Inverse Reinforcement Learning [34.793570631021005]
The inverse reinforcement learning approach to imitation learning is a double-edged sword.
We propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration.
We derive both model-free and model-based hybrid inverse RL algorithms with strong policy performance guarantees.
arXiv Detail & Related papers (2024-02-13T23:29:09Z)
- Flexible Attention-Based Multi-Policy Fusion for Efficient Deep Reinforcement Learning [78.31888150539258]
Reinforcement learning (RL) agents have long sought to approach the efficiency of human learning.
Prior studies in RL have incorporated external knowledge policies to help agents improve sample efficiency.
We present Knowledge-Grounded RL (KGRL), an RL paradigm fusing multiple knowledge policies and aiming for human-like efficiency and flexibility.
arXiv Detail & Related papers (2022-10-07T17:56:57Z)
- FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [52.85536740465277]
FIRE is a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment.
We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function.
We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
arXiv Detail & Related papers (2022-09-28T19:49:39Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning [16.707045765042505]
Current offline RL algorithms are ineffective in multi-agent systems due to the accumulated extrapolation error.
We propose a novel offline RL algorithm, named Implicit Constraint Q-learning (ICQ), which effectively alleviates the extrapolation error.
Experimental results demonstrate that the extrapolation error is reduced to almost zero and insensitive to the number of agents.
arXiv Detail & Related papers (2021-06-07T08:02:31Z)
- Causal-aware Safe Policy Improvement for Task-oriented dialogue [45.88777832381149]
We propose a batch RL framework for task-oriented dialogue policy learning: causal-aware safe policy improvement (CASPI).
We demonstrate the effectiveness of this framework on the dialogue-context-to-text generation and end-to-end dialogue tasks of the MultiWOZ 2.0 dataset.
arXiv Detail & Related papers (2021-03-10T22:34:28Z)
- A novel policy for pre-trained Deep Reinforcement Learning for Speech Emotion Recognition [8.175197257598697]
Reinforcement Learning (RL) is a semi-supervised learning paradigm in which an agent learns by interacting with an environment.
Deep RL has achieved tremendous success in gaming, such as AlphaGo, but its potential has rarely been explored for challenging tasks like Speech Emotion Recognition (SER).
In this paper, we introduce a novel policy, the "Zeta policy", which is tailored for SER, and apply pre-training in deep RL to achieve a faster learning rate.
arXiv Detail & Related papers (2021-01-04T02:13:26Z)