Contextual Information-Directed Sampling
- URL: http://arxiv.org/abs/2205.10895v1
- Date: Sun, 22 May 2022 18:08:42 GMT
- Title: Contextual Information-Directed Sampling
- Authors: Botao Hao, Tor Lattimore, Chao Qin
- Abstract summary: Information-directed sampling (IDS) has recently demonstrated its potential as a data-efficient reinforcement learning algorithm.
We investigate the IDS design through two contextual bandit problems: contextual bandits with graph feedback and sparse linear contextual bandits.
We provably demonstrate the advantage of contextual IDS over conditional IDS and emphasize the importance of considering the context distribution.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Information-directed sampling (IDS) has recently demonstrated its potential
as a data-efficient reinforcement learning algorithm. However, it remains
unclear which form of the information ratio to optimize when contextual
information is available. We investigate the IDS design through two contextual
bandit problems: contextual bandits with graph feedback and sparse linear
contextual bandits. We provably demonstrate the advantage of contextual IDS
over conditional IDS and emphasize the importance of considering the context
distribution. The main message is that an intelligent agent should invest more
in actions that are beneficial for future, unseen contexts, whereas
conditional IDS can be myopic. We further propose a computationally efficient
version of contextual IDS based on Actor-Critic and evaluate it empirically on
a neural network contextual bandit.
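At its core, IDS selects, at each round, the action distribution that minimizes the information ratio: the squared expected regret divided by the expected information gain. The sketch below is a minimal illustration of this idea for a toy linear contextual bandit, not the paper's Actor-Critic implementation; the Monte Carlo posterior samples, the context-scaled features, and the use of posterior variance as an information-gain proxy are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ids_distribution(deltas, gains, grid=101):
    """Minimize (expected regret)^2 / expected information gain over
    action distributions. The minimizer is always supported on at most
    two actions, so a pairwise grid search over mixtures suffices."""
    K = len(deltas)
    best_ratio, best_dist = np.inf, None
    for a in range(K):
        for b in range(K):
            for p in np.linspace(0.0, 1.0, grid):
                regret = p * deltas[a] + (1 - p) * deltas[b]
                gain = p * gains[a] + (1 - p) * gains[b]
                if gain <= 1e-12:
                    continue  # no information to gain from this mixture
                ratio = regret ** 2 / gain
                if ratio < best_ratio:
                    dist = np.zeros(K)
                    dist[a] += p
                    dist[b] += 1 - p
                    best_ratio, best_dist = ratio, dist
    return best_dist / best_dist.sum()  # renormalize against float drift

# Toy linear contextual bandit; the posterior over theta is approximated
# by Monte Carlo samples (a stand-in for a real posterior update).
d, K, M = 3, 4, 500
theta_samples = rng.normal(size=(M, d))   # illustrative posterior samples
action_feats = rng.normal(size=(K, d))    # fixed per-action features
x = rng.normal(size=d)                    # the observed context

feats = action_feats * x                  # context-scaled features (assumed form)
means = feats @ theta_samples.T           # (K, M) sampled mean reward per action
# Expected regret of each action under the posterior samples:
deltas = means.max(axis=0).mean() - means.mean(axis=1)
# Posterior variance of each action's mean reward, a crude info-gain proxy:
gains = means.var(axis=1)

dist = ids_distribution(deltas, gains)
action = rng.choice(K, p=dist)
```

The contextual-vs-conditional distinction in the paper concerns what this minimization is taken over: conditional IDS minimizes the ratio for the current context in isolation, while contextual IDS accounts for the context distribution, so exploration can favor actions informative for future contexts.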
Related papers
- Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions [75.45274978665684]
Vision-Language Understanding (VLU) benchmarks contain samples where answers rely on assumptions unsupported by the provided context.
We collect contextual data for each sample whenever available and train a context selection module to facilitate evidence-based model predictions.
We develop a general-purpose Context-AwaRe Abstention detector to identify samples lacking sufficient context and enhance model accuracy.
arXiv Detail & Related papers (2024-05-18T02:21:32Z)
- LLMs-augmented Contextual Bandit [7.578368459974475]
We propose a novel integration of large language models (LLMs) with the contextual bandit framework.
Preliminary results on synthetic datasets demonstrate the potential of this approach.
arXiv Detail & Related papers (2023-11-03T23:12:57Z)
- On the Powerfulness of Textual Outlier Exposure for Visual OoD Detection [41.277221429527515]
Outlier exposure introduces an additional loss that encourages low-confidence predictions on OoD data during training.
This paper explores the benefits of using textual outliers by replacing real or virtual outliers in the image-domain with textual equivalents.
Our experiments demonstrate that generated textual outliers achieve competitive performance on large-scale OoD and hard OoD benchmarks.
arXiv Detail & Related papers (2023-10-25T09:19:45Z)
- Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts [31.33919659549256]
We present a novel contextual bandit problem with post-serving contexts.
Our algorithm, poLinUCB, achieves tight regret under standard assumptions.
Extensive empirical tests on both synthetic and real-world datasets demonstrate the significant benefit of utilizing post-serving contexts.
arXiv Detail & Related papers (2023-09-25T06:22:28Z)
- Revisiting the Roles of "Text" in Text Games [102.22750109468652]
This paper investigates the roles of text in the face of different reinforcement learning challenges.
We propose a simple scheme to extract relevant contextual information into an approximate state hash.
Such a lightweight plug-in achieves competitive performance with state-of-the-art text agents.
arXiv Detail & Related papers (2022-10-15T21:52:39Z)
- InfoCSE: Information-aggregated Contrastive Learning of Sentence Embeddings [61.77760317554826]
This paper proposes an information-aggregated contrastive learning framework for learning unsupervised sentence embeddings, termed InfoCSE.
We evaluate the proposed InfoCSE on several benchmark datasets for the semantic textual similarity (STS) task.
Experimental results show that InfoCSE outperforms SimCSE by an average Spearman correlation of 2.60% on BERT-base, and 1.77% on BERT-large.
arXiv Detail & Related papers (2022-10-08T15:53:19Z)
- Out of Context: A New Clue for Context Modeling of Aspect-based Sentiment Analysis [54.735400754548635]
ABSA aims to predict the sentiment expressed in a review with respect to a given aspect.
The given aspect should be considered as a new clue out of context in the context modeling process.
We design several aspect-aware context encoders based on different backbones.
arXiv Detail & Related papers (2021-06-21T02:26:03Z)
- Towards Accurate Scene Text Recognition with Semantic Reasoning Networks [52.86058031919856]
We propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition.
GSRM is introduced to capture global semantic context through multi-way parallel transmission.
Results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method.
arXiv Detail & Related papers (2020-03-27T09:19:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.