Diffusion Models Meet Contextual Bandits with Large Action Spaces
- URL: http://arxiv.org/abs/2402.10028v1
- Date: Thu, 15 Feb 2024 15:48:55 GMT
- Title: Diffusion Models Meet Contextual Bandits with Large Action Spaces
- Authors: Imad Aouali
- Abstract summary: In contextual bandits, the rewards of actions are often correlated and this can be leveraged to explore them efficiently.
In this work, we capture such correlations using pre-trained diffusion models, upon which we design diffusion Thompson sampling (dTS).
- Score: 1.0878040851638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient exploration is a key challenge in contextual bandits due to the
large size of their action space, where uninformed exploration can result in
computational and statistical inefficiencies. Fortunately, the rewards of
actions are often correlated and this can be leveraged to explore them
efficiently. In this work, we capture such correlations using pre-trained
diffusion models; upon which we design diffusion Thompson sampling (dTS). Both
theoretical and algorithmic foundations are developed for dTS, and empirical
evaluation also shows its favorable performance.
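To make the recipe concrete, here is a minimal, illustrative sketch of Thompson sampling for a linear contextual bandit in which a pre-trained generative model (the hypothetical `prior_sampler` below, standing in for the diffusion prior) supplies correlated per-action reward parameters. This is not the paper's dTS algorithm; the dimensions, conjugate Gaussian posteriors, and reward model are all assumptions made for the sketch.

```python
# Illustrative sketch only: Thompson sampling for a linear contextual bandit
# where a pre-trained generative model (the stand-in `prior_sampler`) supplies
# a correlated prior over the per-action reward parameters.
import numpy as np

rng = np.random.default_rng(0)
d, K, T, noise_var = 5, 20, 500, 0.25   # context dim, actions, horizon, reward noise


def prior_sampler():
    """Hypothetical stand-in for a pre-trained diffusion prior: correlated action parameters."""
    latent = rng.normal(size=(d, 3))
    return latent @ rng.normal(size=(3, K))       # (d, K) parameter matrix


true_theta = prior_sampler()                      # unknown environment parameters

# Per-action Gaussian posterior, initialised from draws of the prior.
prior_mean = np.mean([prior_sampler() for _ in range(50)], axis=0)   # (d, K)
prec = np.tile(np.eye(d)[None], (K, 1, 1))        # (K, d, d) posterior precisions
b = np.einsum('kij,jk->ik', prec, prior_mean)     # (d, K) precision-weighted means

for t in range(T):
    x = rng.normal(size=d)                        # observe a context
    # Thompson step: sample one parameter vector per action from its posterior.
    sampled = np.column_stack([
        rng.multivariate_normal(np.linalg.solve(prec[k], b[:, k]),
                                np.linalg.inv(prec[k]))
        for k in range(K)
    ])
    a = int(np.argmax(x @ sampled))               # act greedily on the sampled parameters
    r = x @ true_theta[:, a] + rng.normal(scale=np.sqrt(noise_var))
    # Conjugate Bayesian update for the chosen action only.
    prec[a] += np.outer(x, x) / noise_var
    b[:, a] += r * x / noise_var
```

The key step mirrors generic Thompson sampling: draw one plausible parameter matrix, act greedily against the draw, and update only the played action's posterior.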
Related papers
- Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Model [22.39558434131574]
Existing data attribution methods for diffusion models typically quantify the contribution of a training sample.
We argue that directly using the diffusion loss cannot represent this contribution accurately, because of how the diffusion loss is computed.
We instead aim to compare the predicted distributions directly, via an attribution score, to analyse the importance of training samples.
arXiv Detail & Related papers (2024-10-24T10:58:17Z)
- Bigger is not Always Better: Scaling Properties of Latent Diffusion Models [46.52780730073693]
We study the scaling properties of latent diffusion models (LDMs) with an emphasis on their sampling efficiency.
We conduct an in-depth investigation into how model size influences sampling efficiency across varying sampling steps.
Our findings unveil a surprising trend: when operating under a given inference budget, smaller models frequently outperform their larger equivalents in generating high-quality results.
arXiv Detail & Related papers (2024-04-01T17:59:48Z)
- Bayesian Off-Policy Evaluation and Learning for Large Action Spaces [14.203316003782604]
In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation and learning.
We introduce a unified Bayesian framework to capture these correlations through structured and informative priors.
We propose sDM, a generic Bayesian approach for OPE and OPL, grounded in both algorithmic and theoretical foundations.
arXiv Detail & Related papers (2024-02-22T16:09:45Z)
- Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps, rather than the instantaneous input-output relationships of earlier settings.
We present Diffusion-TracIn, which incorporates these temporal dynamics, and observe that samples' loss gradient norms are highly dependent on the timestep.
We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
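As a rough sketch of the general idea (not the paper's Diffusion-TracIn/ReTrac estimators), the snippet below computes a TracIn-style influence score with a per-timestep re-normalisation of the training gradient; `model` and `diffusion_loss(model, x, t)` are hypothetical placeholders for a trained diffusion network and its per-timestep loss.

```python
# Rough sketch of a TracIn-style influence score for a diffusion model with a
# per-timestep gradient re-normalisation. Not the paper's exact estimator;
# `model` and `diffusion_loss(model, x, t)` are hypothetical placeholders.
import torch


def flat_grad(loss, params):
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])


def influence(model, diffusion_loss, x_train, x_test, timesteps):
    params = [p for p in model.parameters() if p.requires_grad]
    score = 0.0
    for t in timesteps:
        g_train = flat_grad(diffusion_loss(model, x_train, t), params)
        g_test = flat_grad(diffusion_loss(model, x_test, t), params)
        # Re-normalise the training gradient per timestep so that timesteps
        # with large gradient norms do not dominate the influence score.
        score += torch.dot(g_train / (g_train.norm() + 1e-8), g_test).item()
    return score
```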
arXiv Detail & Related papers (2024-01-17T07:58:18Z)
- Zero-Inflated Bandits [11.60342504007264]
We study zero-inflated bandits, where the reward is modeled using a classic semi-parametric distribution known as the zero-inflated distribution.
We develop algorithms based on the Upper Confidence Bound and Thompson Sampling frameworks for this specific structure.
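A toy sketch of Thompson sampling under a zero-inflated reward follows, where each arm pays Bernoulli(p_k) times a Gaussian positive part; the Beta/Normal posteriors and all constants are illustrative assumptions, not the paper's algorithms.

```python
# Toy sketch: Thompson sampling when the reward of arm k is
# Bernoulli(p_k) * Normal(mu_k, 1), i.e. a zero-inflated reward.
# Not the paper's algorithm; priors and constants are illustrative.
import numpy as np

rng = np.random.default_rng(1)
K, T = 5, 5000
p_true = rng.uniform(0.05, 0.3, size=K)     # probability of a nonzero reward
mu_true = rng.uniform(0.5, 2.0, size=K)     # mean of the nonzero part

alpha, beta = np.ones(K), np.ones(K)        # Beta posterior for the zero-inflation part
sum_r, n_pos = np.zeros(K), np.zeros(K)     # stats for the nonzero-mean posterior

for t in range(T):
    p_sample = rng.beta(alpha, beta)
    # N(0, 1) prior and unit observation noise -> conjugate Normal posterior.
    mu_sample = rng.normal(sum_r / (n_pos + 1.0), 1.0 / np.sqrt(n_pos + 1.0))
    a = int(np.argmax(p_sample * mu_sample))          # sample, then maximise
    nonzero = rng.random() < p_true[a]
    r = rng.normal(mu_true[a], 1.0) if nonzero else 0.0
    alpha[a] += nonzero                               # update zero/nonzero posterior
    beta[a] += 1 - nonzero
    if nonzero:                                       # update nonzero-mean posterior
        n_pos[a] += 1.0
        sum_r[a] += r
```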
arXiv Detail & Related papers (2023-12-25T03:13:21Z)
- An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization [58.88327181933151]
In this paper, we propose an efficient query-based membership inference attack (MIA).
Experimental results indicate that the proposed method can achieve competitive performance with only two queries on both discrete-time and continuous-time diffusion models.
To the best of our knowledge, this work is the first to study the robustness of diffusion models to MIA in the text-to-speech task.
arXiv Detail & Related papers (2023-05-26T16:38:48Z)
- Be Your Own Neighborhood: Detecting Adversarial Example by the Neighborhood Relations Built on Self-Supervised Learning [64.78972193105443]
This paper presents a novel AE detection framework aimed at trustworthy predictions.
It performs detection by distinguishing an AE's abnormal relations with its augmented versions.
An off-the-shelf Self-Supervised Learning (SSL) model is used to extract the representation and predict the label.
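A minimal sketch of this "compare an input to its own augmented neighbourhood" idea follows (not the paper's detector): `ssl_encoder` and `augment` are hypothetical placeholders for a pre-trained self-supervised encoder and an augmentation pipeline, and the threshold is illustrative.

```python
# Minimal sketch: flag an input as adversarial when its representation is
# unusually far from the representations of its own augmented views.
# `ssl_encoder` and `augment` are hypothetical placeholders; the threshold
# is illustrative, not taken from the paper.
import torch
import torch.nn.functional as F


def detect_adversarial(x, ssl_encoder, augment, n_views=8, threshold=0.7):
    """x: a single image tensor of shape (C, H, W)."""
    with torch.no_grad():
        z = F.normalize(ssl_encoder(x.unsqueeze(0)), dim=-1)        # (1, d)
        views = torch.stack([augment(x) for _ in range(n_views)])   # (n_views, C, H, W)
        z_views = F.normalize(ssl_encoder(views), dim=-1)           # (n_views, d)
        mean_sim = (z_views @ z.T).mean().item()   # mean cosine similarity to the views
    return mean_sim < threshold                    # low similarity -> abnormal relation
```

In practice the encoder could be, for example, a SimCLR-style backbone, and `augment` a random crop plus colour jitter.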
arXiv Detail & Related papers (2022-08-31T08:18:44Z)
- How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be used as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- Deep Stable Learning for Out-Of-Distribution Generalization [27.437046504902938]
Approaches based on deep neural networks have achieved striking performance when testing data and training data share a similar distribution.
Eliminating the impact of distribution shifts between training and testing data is crucial for building performance-promising deep models.
We propose to address this problem by removing the dependencies between features via learning weights for training samples.
arXiv Detail & Related papers (2021-04-16T03:54:21Z)
- Generalization Properties of Optimal Transport GANs with Latent Distribution Learning [52.25145141639159]
We study how the interplay between the latent distribution and the complexity of the pushforward map affects performance.
Motivated by our analysis, we advocate learning the latent distribution as well as the pushforward map within the GAN paradigm.
arXiv Detail & Related papers (2020-07-29T07:31:33Z)