Double-Linear Thompson Sampling for Context-Attentive Bandits
- URL: http://arxiv.org/abs/2010.09473v1
- Date: Thu, 15 Oct 2020 13:01:19 GMT
- Title: Double-Linear Thompson Sampling for Context-Attentive Bandits
- Authors: Djallel Bouneffouf, Raphaël Féraud, Sohini Upadhyay, Yasaman Khazaeni and Irina Rish
- Abstract summary: We analyze and extend an online learning framework known as Context-Attentive Bandit, motivated by various practical applications.
We derive a novel algorithm, called Context-Attentive Thompson Sampling (CATS), which builds upon the Linear Thompson Sampling approach, adapting it to the Context-Attentive Bandit setting.
- Score: 27.786695164493562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we analyze and extend an online learning framework known as
Context-Attentive Bandit, motivated by various practical applications, from
medical diagnosis to dialog systems, where due to observation costs only a
small subset of a potentially large number of context variables can be observed
at each iteration; however, the agent has the freedom to choose which variables to
observe. We derive a novel algorithm, called Context-Attentive Thompson
Sampling (CATS), which builds upon the Linear Thompson Sampling approach,
adapting it to the Context-Attentive Bandit setting. We provide a theoretical
regret analysis and an extensive empirical evaluation demonstrating advantages
of the proposed approach over several baseline methods on a variety of
real-life datasets.
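The abstract names Linear Thompson Sampling as the method CATS builds on. As a rough illustration of that base method only (not CATS itself, which additionally chooses which context variables to observe), a minimal pure-Python sketch might look like the following. The class name `LinearTS`, the per-arm model layout, the identity prior, and the exploration scale `v` are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a symmetric positive-definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_lower(low, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i, row in enumerate(low):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y


def solve_lower_transposed(low, b):
    """Solve L^T y = b by back substitution."""
    n = len(low)
    y = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * y[k] for k in range(i + 1, n))
        y[i] = (b[i] - s) / low[i][i]
    return y


class LinearTS:
    """Hypothetical per-arm Linear Thompson Sampling agent (illustrative only)."""

    def __init__(self, n_arms, dim, v=0.5, seed=0):
        self.rng = random.Random(seed)
        self.v = v  # assumed exploration scale
        # One precision matrix B (identity prior) and one vector f per arm.
        self.B = [[[float(i == j) for j in range(dim)] for i in range(dim)]
                  for _ in range(n_arms)]
        self.f = [[0.0] * dim for _ in range(n_arms)]

    def sample_theta(self, arm):
        """Draw theta ~ N(B^-1 f, v^2 B^-1) using the Cholesky factor of B."""
        low = cholesky(self.B[arm])
        mean = solve_lower_transposed(low, solve_lower(low, self.f[arm]))
        z = [self.rng.gauss(0.0, 1.0) for _ in mean]
        noise = solve_lower_transposed(low, z)  # covariance of noise is B^-1
        return [m + self.v * e for m, e in zip(mean, noise)]

    def select(self, x):
        """Pick the arm whose sampled parameter gives the highest predicted reward."""
        scores = []
        for arm in range(len(self.B)):
            theta = self.sample_theta(arm)
            scores.append(sum(t * xi for t, xi in zip(theta, x)))
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, x, reward):
        """Rank-one update: B += x x^T and f += reward * x for the played arm."""
        for i, xi in enumerate(x):
            self.f[arm][i] += reward * xi
            for j, xj in enumerate(x):
                self.B[arm][i][j] += xi * xj


# Toy usage: arm 1 always pays 1, arm 0 always pays 0.
agent = LinearTS(n_arms=2, dim=1)
for _ in range(50):
    chosen = agent.select([1.0])
    agent.update(chosen, [1.0], float(chosen))
```

In the context-attentive setting described above, the agent would also have to decide which entries of `x` to observe before calling `select`; that selection step is deliberately left out of this sketch.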
Related papers
- Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation [60.493180081319785]
We propose a systematic way to estimate the intrinsic capacity of a truncation sampling method by considering the trade-off between diversity and risk at each decoding step.
Our work provides a comprehensive comparison between existing truncation sampling methods, as well as their recommended parameters as a guideline for users.
arXiv Detail & Related papers (2024-08-24T14:14:32Z)
- Detecting Statements in Text: A Domain-Agnostic Few-Shot Solution [1.3654846342364308]
State-of-the-art approaches usually involve fine-tuning models on large annotated datasets, which are costly to produce.
We propose and release a qualitative and versatile few-shot learning methodology as a common paradigm for any claim-based textual classification task.
We illustrate this methodology in the context of three tasks: climate change contrarianism detection, topic/stance classification and depression-related symptoms detection.
arXiv Detail & Related papers (2024-05-09T12:03:38Z)
- Resilient Multiple Choice Learning: A learned scoring scheme with application to audio scene analysis [8.896068269039452]
We introduce Resilient Multiple Choice Learning (rMCL) for conditional distribution estimation in regression settings.
rMCL is a simple framework to tackle multimodal density estimation, using the Winner-Takes-All (WTA) loss for a set of hypotheses.
arXiv Detail & Related papers (2023-11-02T07:54:03Z)
- Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits [1.8275108630751844]
We propose a Thompson Sampling algorithm for partially observable contextual multi-armed bandits.
We show that the regret of the presented policy scales logarithmically with time and the number of arms, and linearly with the dimension.
arXiv Detail & Related papers (2021-10-23T08:51:49Z)
- An Investigation of Replay-based Approaches for Continual Learning [79.0660895390689]
Continual learning (CL) is a major challenge of machine learning (ML) and describes the ability to learn several tasks sequentially without catastrophic forgetting (CF).
Several solution classes have been proposed, of which so-called replay-based approaches seem very promising due to their simplicity and robustness.
We empirically investigate replay-based approaches of continual learning and assess their potential for applications.
arXiv Detail & Related papers (2021-08-15T15:05:02Z)
- On Sampling-Based Training Criteria for Neural Language Modeling [97.35284042981675]
We consider Monte Carlo sampling, importance sampling, a novel method we call compensated partial summation, and noise contrastive estimation.
We show that all these sampling methods can perform equally well, as long as we correct for the intended class posterior probabilities.
Experimental results in language modeling and automatic speech recognition on Switchboard and LibriSpeech support our claim.
arXiv Detail & Related papers (2021-04-21T12:55:52Z)
- A Discussion on Practical Considerations with Sparse Regression Methodologies [0.0]
Two papers published in Statistical Science study the comparative performance of several sparse regression methodologies.
We summarize and compare the two studies and aim to provide clarity and value to users.
arXiv Detail & Related papers (2020-11-18T15:58:35Z)
- Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We find the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
- Online Active Model Selection for Pre-trained Classifiers [72.84853880948894]
We design an online selective sampling approach that actively selects informative examples to label and outputs the best model with high probability at any round.
Our algorithm can be used for online prediction tasks for both adversarial and stochastic streams.
arXiv Detail & Related papers (2020-10-19T19:53:15Z)
- SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
- Sparse Methods for Automatic Relevance Determination [0.0]
We first review automatic relevance determination (ARD) and analytically demonstrate the need for additional regularization or thresholding to achieve sparse models.
We then discuss two classes of methods, regularization based and thresholding based, which build on ARD to learn parsimonious solutions to linear problems.
arXiv Detail & Related papers (2020-05-18T14:08:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.