Contextual Information-Directed Sampling
- URL: http://arxiv.org/abs/2205.10895v1
- Date: Sun, 22 May 2022 18:08:42 GMT
- Title: Contextual Information-Directed Sampling
- Authors: Botao Hao, Tor Lattimore, Chao Qin
- Abstract summary: Information-directed sampling (IDS) has recently demonstrated its potential as a data-efficient reinforcement learning algorithm.
We investigate the IDS design through two contextual bandit problems: contextual bandits with graph feedback and sparse linear contextual bandits.
We provably demonstrate the advantage of contextual IDS over conditional IDS and emphasize the importance of considering the context distribution.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Information-directed sampling (IDS) has recently demonstrated its potential
as a data-efficient reinforcement learning algorithm. However, it remains
unclear which form of the information ratio to optimize when contextual
information is available. We investigate the IDS design through two contextual
bandit problems: contextual bandits with graph feedback and sparse linear
contextual bandits. We provably demonstrate the advantage of contextual IDS
over conditional IDS and emphasize the importance of considering the context
distribution. The main message is that an intelligent agent should invest more
in actions that are beneficial for future, unseen contexts, whereas
conditional IDS can be myopic. We further propose a computationally efficient
version of contextual IDS based on Actor-Critic and evaluate it empirically on
a neural network contextual bandit.
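At its core, IDS selects, at each round, the action distribution that minimizes the information ratio: the squared expected regret divided by the expected information gain. The sketch below is a minimal illustration of this idea for a toy linear contextual bandit, not the paper's Actor-Critic implementation; the Monte Carlo posterior samples, the context-scaled features, and the use of posterior variance as an information-gain proxy are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ids_distribution(deltas, gains, grid=101):
    """Minimize (expected regret)^2 / expected information gain over
    action distributions. The minimizer is always supported on at most
    two actions, so a pairwise grid search over mixtures suffices."""
    K = len(deltas)
    best_ratio, best_dist = np.inf, None
    for a in range(K):
        for b in range(K):
            for p in np.linspace(0.0, 1.0, grid):
                regret = p * deltas[a] + (1 - p) * deltas[b]
                gain = p * gains[a] + (1 - p) * gains[b]
                if gain <= 1e-12:
                    continue  # no information to gain from this mixture
                ratio = regret ** 2 / gain
                if ratio < best_ratio:
                    dist = np.zeros(K)
                    dist[a] += p
                    dist[b] += 1 - p
                    best_ratio, best_dist = ratio, dist
    return best_dist / best_dist.sum()  # renormalize against float drift

# Toy linear contextual bandit; the posterior over theta is approximated
# by Monte Carlo samples (a stand-in for a real posterior update).
d, K, M = 3, 4, 500
theta_samples = rng.normal(size=(M, d))   # illustrative posterior samples
action_feats = rng.normal(size=(K, d))    # fixed per-action features
x = rng.normal(size=d)                    # the observed context

feats = action_feats * x                  # context-scaled features (assumed form)
means = feats @ theta_samples.T           # (K, M) sampled mean reward per action
# Expected regret of each action under the posterior samples:
deltas = means.max(axis=0).mean() - means.mean(axis=1)
# Posterior variance of each action's mean reward, a crude info-gain proxy:
gains = means.var(axis=1)

dist = ids_distribution(deltas, gains)
action = rng.choice(K, p=dist)
```

The contextual-vs-conditional distinction in the paper concerns what this minimization is taken over: conditional IDS minimizes the ratio for the current context in isolation, while contextual IDS accounts for the context distribution, so exploration can favor actions informative for future contexts.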
Related papers
- Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions [75.45274978665684]
Vision-Language Understanding (VLU) benchmarks contain samples where answers rely on assumptions unsupported by the provided context.
We collect contextual data for each sample whenever available and train a context selection module to facilitate evidence-based model predictions.
We develop a general-purpose Context-AwaRe Abstention detector to identify samples lacking sufficient context and enhance model accuracy.
arXiv Detail & Related papers (2024-05-18T02:21:32Z)
- LLMs-augmented Contextual Bandit [7.578368459974475]
We propose a novel integration of large language models (LLMs) with the contextual bandit framework.
Preliminary results on synthetic datasets demonstrate the potential of this approach.
arXiv Detail & Related papers (2023-11-03T23:12:57Z)
- On the Powerfulness of Textual Outlier Exposure for Visual OoD Detection [41.277221429527515]
Outlier exposure introduces an additional loss that encourages low-confidence predictions on OoD data during training.
This paper explores the benefits of using textual outliers by replacing real or virtual outliers in the image-domain with textual equivalents.
Our experiments demonstrate that generated textual outliers achieve competitive performance on large-scale OoD and hard OoD benchmarks.
arXiv Detail & Related papers (2023-10-25T09:19:45Z)
- Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts [31.33919659549256]
We present a novel contextual bandit problem with post-serving contexts.
Our algorithm, poLinUCB, achieves tight regret under standard assumptions.
Extensive empirical tests on both synthetic and real-world datasets demonstrate the significant benefit of utilizing post-serving contexts.
arXiv Detail & Related papers (2023-09-25T06:22:28Z)
- Revisiting the Roles of "Text" in Text Games [102.22750109468652]
This paper investigates the roles of text in the face of different reinforcement learning challenges.
We propose a simple scheme to extract relevant contextual information into an approximate state hash.
Such a lightweight plug-in achieves competitive performance with state-of-the-art text agents.
arXiv Detail & Related papers (2022-10-15T21:52:39Z)
- InfoCSE: Information-aggregated Contrastive Learning of Sentence Embeddings [61.77760317554826]
This paper proposes an information-aggregated contrastive learning framework for learning unsupervised sentence embeddings, termed InfoCSE.
We evaluate the proposed InfoCSE on several benchmark datasets for the semantic textual similarity (STS) task.
Experimental results show that InfoCSE outperforms SimCSE by an average Spearman correlation of 2.60% on BERT-base, and 1.77% on BERT-large.
arXiv Detail & Related papers (2022-10-08T15:53:19Z)
- Out of Context: A New Clue for Context Modeling of Aspect-based Sentiment Analysis [54.735400754548635]
ABSA aims to predict the sentiment expressed in a review with respect to a given aspect.
The given aspect should be considered as a new clue out of context in the context modeling process.
We design several aspect-aware context encoders based on different backbones.
arXiv Detail & Related papers (2021-06-21T02:26:03Z)
- Towards Accurate Scene Text Recognition with Semantic Reasoning Networks [52.86058031919856]
We propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition.
GSRM is introduced to capture global semantic context through multi-way parallel transmission.
Results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method.
arXiv Detail & Related papers (2020-03-27T09:19:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.