Diffusion Models Meet Contextual Bandits with Large Action Spaces
- URL: http://arxiv.org/abs/2402.10028v1
- Date: Thu, 15 Feb 2024 15:48:55 GMT
- Title: Diffusion Models Meet Contextual Bandits with Large Action Spaces
- Authors: Imad Aouali
- Abstract summary: In contextual bandits, the rewards of actions are often correlated and this can be leveraged to explore them efficiently.
In this work, we capture such correlations using pre-trained diffusion models, upon which we design diffusion Thompson sampling (dTS).
- Score: 1.0878040851638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient exploration is a key challenge in contextual bandits due to the
large size of their action space, where uninformed exploration can result in
computational and statistical inefficiencies. Fortunately, the rewards of
actions are often correlated and this can be leveraged to explore them
efficiently. In this work, we capture such correlations using pre-trained
diffusion models; upon which we design diffusion Thompson sampling (dTS). Both
theoretical and algorithmic foundations are developed for dTS, and empirical
evaluation also shows its favorable performance.
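As a rough illustration of the dTS idea, here is a minimal sketch of Thompson sampling in which action parameters are drawn from a pretrained generative prior; the `GenerativePrior` class, the linear reward model, and the absence of posterior conditioning on past data are simplifying assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

class GenerativePrior:
    """Stand-in for a pretrained diffusion prior over action parameters;
    here it simply draws parameters from a fixed Gaussian."""
    def __init__(self, n_actions, dim):
        self.mean = rng.normal(size=(n_actions, dim))
        self.scale = 0.5

    def sample(self):
        # A real diffusion prior would run its reverse (denoising) process here.
        return self.mean + self.scale * rng.normal(size=self.mean.shape)

def dts_round(prior, context):
    """One Thompson-sampling round: sample plausible action parameters from
    the prior, then act greedily under a posited linear reward model."""
    theta = prior.sample()            # (n_actions, dim)
    scores = theta @ context          # expected reward per action
    return int(np.argmax(scores))

prior = GenerativePrior(n_actions=100, dim=8)
print(dts_round(prior, rng.normal(size=8)))
```

In the full algorithm, the sampled parameters would come from the model's posterior given the interaction history rather than from the prior alone.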
Related papers
- Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update [70.38810219913593]
We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function.
GLBs are widely applicable to real-world scenarios, but their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency.
We propose a jointly efficient algorithm that attains a nearly optimal regret bound with $\mathcal{O}(1)$ time and space complexities per round.
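To make the one-pass idea concrete, the sketch below shows a constant-per-round update for a logistic reward model (one GLB instance); it is a generic online-gradient baseline written for illustration, not the paper's algorithm.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnePassLogisticGLB:
    """Illustrative one-pass estimator for a logistic (generalized linear)
    reward model: a single stochastic-gradient step per round, so time and
    memory per round stay constant in the horizon."""
    def __init__(self, dim, lr=0.1):
        self.theta = np.zeros(dim)
        self.lr = lr

    def predict(self, x):
        # Link function applied to the linear score.
        return sigmoid(x @ self.theta)

    def update(self, x, reward):
        # Gradient of the logistic loss for one (feature, reward) pair.
        grad = (self.predict(x) - reward) * x
        self.theta -= self.lr * grad
```

The per-round cost is a single gradient evaluation, independent of how many rounds have already been played.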
arXiv Detail & Related papers (2025-07-16T02:24:21Z) - Improved Diffusion-based Generative Model with Better Adversarial Robustness [65.38540020916432]
Diffusion Probabilistic Models (DPMs) have achieved significant success in generative tasks.
During the denoising process, the input data distributions differ between the training and inference stages.
arXiv Detail & Related papers (2025-02-24T12:29:16Z) - Exploratory Diffusion Model for Unsupervised Reinforcement Learning [28.413426177336703]
Unsupervised reinforcement learning (URL) aims to pre-train agents by exploring diverse states or skills in reward-free environments.
Existing methods design intrinsic rewards to model the explored data and encourage further exploration.
We propose the Exploratory Diffusion Model (ExDM), which leverages the strong expressive ability of diffusion models to fit the explored data.
arXiv Detail & Related papers (2025-02-11T05:48:51Z) - Adaptive Non-uniform Timestep Sampling for Accelerating Diffusion Model Training [6.694752081172194]
As data distributions grow more complex, training diffusion models to convergence becomes increasingly compute-intensive.
We introduce a non-uniform timestep sampling method that prioritizes the timesteps most critical to training.
Our method shows robust performance across various datasets, scheduling strategies, and diffusion architectures.
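A minimal sketch of non-uniform timestep sampling is given below, weighting timesteps by a running estimate of their loss; the loss-based weighting and the `loss_ema` bookkeeping are illustrative assumptions rather than the paper's exact criterion.

```python
import torch

def sample_timesteps(loss_ema, batch_size, temperature=1.0):
    """Draw training timesteps with probability proportional to a softmax of
    each timestep's running loss estimate, so harder timesteps are visited
    more often."""
    weights = torch.softmax(loss_ema / temperature, dim=0)
    return torch.multinomial(weights, batch_size, replacement=True)

T = 1000
loss_ema = torch.ones(T)               # running per-timestep loss estimates
t = sample_timesteps(loss_ema, batch_size=64)
# After computing the diffusion loss at these timesteps, the estimates would
# be refreshed, e.g. loss_ema[t] = 0.9 * loss_ema[t] + 0.1 * new_loss.
```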
arXiv Detail & Related papers (2024-11-15T07:12:18Z) - Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Model [22.39558434131574]
Existing data attribution methods for diffusion models typically quantify the contribution of a training sample.
We argue that directly using the diffusion loss cannot accurately represent such a contribution, due to how the diffusion loss is computed.
We instead propose an attribution score based on a direct comparison between predicted distributions, in order to analyse the importance of training samples.
arXiv Detail & Related papers (2024-10-24T10:58:17Z) - Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been believed to be a challenging property to encode for neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - Amortized Posterior Sampling with Diffusion Prior Distillation [55.03585818289934]
Amortized Posterior Sampling is a novel variational inference approach for efficient posterior sampling in inverse problems.
Our method trains a conditional flow model to minimize the divergence between the variational distribution and the posterior distribution implicitly defined by the diffusion model.
Unlike existing methods, our approach is unsupervised, requires no paired training data, and is applicable to both Euclidean and non-Euclidean domains.
arXiv Detail & Related papers (2024-07-25T09:53:12Z) - Bigger is not Always Better: Scaling Properties of Latent Diffusion Models [46.52780730073693]
We study the scaling properties of latent diffusion models (LDMs) with an emphasis on their sampling efficiency.
We conduct an in-depth investigation into how model size influences sampling efficiency across varying sampling steps.
Our findings unveil a surprising trend: when operating under a given inference budget, smaller models frequently outperform their larger equivalents in generating high-quality results.
arXiv Detail & Related papers (2024-04-01T17:59:48Z) - Bayesian Off-Policy Evaluation and Learning for Large Action Spaces [14.203316003782604]
In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation and learning.
We introduce a unified Bayesian framework to capture these correlations through structured and informative priors.
We propose sDM, a generic Bayesian approach for OPE and OPL, grounded in both algorithmic and theoretical foundations.
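As a toy illustration of how a structured prior helps, the sketch below computes a Gaussian posterior over per-action mean rewards from logged data and uses it for a direct-method value estimate; the Gaussian model and the `direct_method_value` helper are assumptions made for exposition, not the sDM method itself.

```python
import numpy as np

def posterior_action_means(rewards_by_action, prior_mean, prior_cov, noise_var=1.0):
    """Gaussian posterior over per-action mean rewards; a correlated prior
    covariance lets observations for one action inform similar actions."""
    counts = np.array([len(r) for r in rewards_by_action], dtype=float)
    sums = np.array([sum(r) for r in rewards_by_action], dtype=float)
    prior_prec = np.linalg.inv(prior_cov)
    post_prec = prior_prec + np.diag(counts / noise_var)   # prior + data precision
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (prior_prec @ prior_mean + sums / noise_var)
    return post_mean, post_cov

def direct_method_value(policy_probs, post_mean):
    """Direct-method estimate: the target policy's expected posterior mean reward."""
    return float(policy_probs @ post_mean)

# Toy usage: two correlated actions, logged rewards only for action 0.
prior_mean = np.zeros(2)
prior_cov = np.array([[1.0, 0.9], [0.9, 1.0]])
mean, _ = posterior_action_means([[0.8, 1.1, 0.9], []], prior_mean, prior_cov)
print(direct_method_value(np.array([0.0, 1.0]), mean))     # action 1 borrows strength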
arXiv Detail & Related papers (2024-02-22T16:09:45Z) - Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps, rather than the instantaneous input-output relationships considered in previous settings.
We present Diffusion-TracIn, which incorporates these temporal dynamics, and observe that samples' loss gradient norms are highly dependent on the timestep.
We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
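The renormalization idea can be sketched as a TracIn-style score in which each per-timestep training gradient is normalized before taking inner products, so large-gradient timesteps do not dominate; this is an illustrative simplification, not the exact Diffusion-ReTrac estimator.

```python
import numpy as np

def renormalized_influence(train_grads, test_grads, eps=1e-12):
    """TracIn-style influence summed over timesteps, with each training
    gradient scaled by its norm so that timesteps with large gradient norms
    do not dominate the score. Inputs are lists of per-timestep gradient
    vectors for one training sample and one test sample."""
    score = 0.0
    for g_tr, g_te in zip(train_grads, test_grads):
        score += float(np.dot(g_tr, g_te)) / (np.linalg.norm(g_tr) + eps)
    return score
```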
arXiv Detail & Related papers (2024-01-17T07:58:18Z) - Zero-Inflated Bandits [11.60342504007264]
We study zero-inflated bandits, where the reward is modeled using a classic semi-parametric distribution known as the zero-inflated distribution.
We develop algorithms based on the Upper Confidence Bound and Thompson Sampling frameworks for this specific structure.
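A minimal Thompson-sampling sketch for one zero-inflated arm is shown below, with a Beta posterior on the zero-inflation probability and a Normal posterior on the nonzero reward mean; the specific conjugate choices and unit observation noise are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

class ZeroInflatedTSArm:
    """Thompson sampling for one arm with a zero-inflated reward,
    reward = Bernoulli(p) * Y: Beta posterior on p, Normal posterior on E[Y]."""
    def __init__(self):
        self.alpha, self.beta = 1.0, 1.0      # Beta prior on p
        self.mu, self.tau = 0.0, 1.0          # Normal prior on E[Y] (mean, precision)

    def sample_mean_reward(self):
        p = rng.beta(self.alpha, self.beta)
        m = rng.normal(self.mu, 1.0 / np.sqrt(self.tau))
        return p * m

    def update(self, reward):
        if reward == 0.0:
            self.beta += 1.0                  # treat a zero as Bernoulli failure
        else:
            self.alpha += 1.0
            # Conjugate Normal update with unit observation noise.
            self.mu = (self.tau * self.mu + reward) / (self.tau + 1.0)
            self.tau += 1.0

# Each round, play the arm with the highest posterior sample.
arms = [ZeroInflatedTSArm() for _ in range(5)]
choice = int(np.argmax([a.sample_mean_reward() for a in arms]))
```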
arXiv Detail & Related papers (2023-12-25T03:13:21Z) - DSCom: A Data-Driven Self-Adaptive Community-Based Framework for
Influence Maximization in Social Networks [3.97535858363999]
We reformulate the problem on the attributed network and leverage the node attributes to estimate the closeness between connected nodes.
Specifically, we propose a machine learning-based framework, named DSCom, to address this problem.
Compared to the previous theoretical works, we carefully designed empirical experiments with parameterized diffusion models based on real-world social networks.
arXiv Detail & Related papers (2023-11-18T14:03:43Z) - Towards Accelerated Model Training via Bayesian Data Selection [45.62338106716745]
Recent work has proposed a more reasonable data selection principle by examining the data's impact on the model's generalization loss, but applying this principle in practice remains problematic.
This work addresses these problems by leveraging a lightweight Bayesian treatment and incorporating off-the-shelf zero-shot predictors built on large-scale pre-trained models.
arXiv Detail & Related papers (2023-08-21T07:58:15Z) - An Efficient Membership Inference Attack for the Diffusion Model by
Proximal Initialization [58.88327181933151]
In this paper, we propose an efficient query-based membership inference attack (MIA)
Experimental results indicate that the proposed method can achieve competitive performance with only two queries on both discrete-time and continuous-time diffusion models.
To the best of our knowledge, this work is the first to study the robustness of diffusion models to MIA in the text-to-speech task.
arXiv Detail & Related papers (2023-05-26T16:38:48Z) - Federated Learning for Heterogeneous Bandits with Unobserved Contexts [0.0]
We study the problem of federated multi-armed contextual bandits with unknown contexts.
We propose an elimination-based algorithm and prove the regret bound for linearly parametrized reward functions.
arXiv Detail & Related papers (2023-03-29T22:06:24Z) - An Operational Perspective to Fairness Interventions: Where and How to
Intervene [9.833760837977222]
We present a holistic framework for evaluating and contextualizing fairness interventions.
We demonstrate our framework with a case study on predictive parity.
We find predictive parity is difficult to achieve without using group data.
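For reference, predictive parity asks that the positive predictive value be equal across groups; the small helper below measures the gap between groups, and is only a generic check rather than part of the paper's framework.

```python
import numpy as np

def predictive_parity_gap(y_true, y_pred, group):
    """Predictive parity compares the positive predictive value (PPV),
    i.e. P(y=1 | prediction=1), across groups; a smaller gap means the
    criterion is closer to being satisfied."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    ppvs = []
    for g in np.unique(group):
        mask = (group == g) & (y_pred == 1)
        if mask.any():
            ppvs.append(y_true[mask].mean())
    return max(ppvs) - min(ppvs)
```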
arXiv Detail & Related papers (2023-02-03T07:04:33Z) - Towards Robust Dataset Learning [90.2590325441068]
We propose a principled tri-level optimization formulation of the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z) - Be Your Own Neighborhood: Detecting Adversarial Example by the
Neighborhood Relations Built on Self-Supervised Learning [64.78972193105443]
This paper presents a novel adversarial example (AE) detection framework for trustworthy predictions.
It performs detection by distinguishing an AE's abnormal relations with its augmented versions.
An off-the-shelf Self-Supervised Learning (SSL) model is used to extract the representation and predict the label.
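A minimal sketch of the detection idea follows: compare an input's SSL representation with those of its augmentations and flag inputs with unusually low agreement. The `encode` and `augment` callables and the fixed threshold are placeholders, not the paper's exact procedure.

```python
import numpy as np

def is_adversarial(x, encode, augment, n_aug=16, threshold=0.8):
    """Flag an input whose representation agrees unusually poorly with the
    representations of its augmented versions. `encode` is a frozen SSL
    encoder and `augment` a stochastic augmentation (both placeholders)."""
    z = encode(x)
    sims = []
    for _ in range(n_aug):
        z_aug = encode(augment(x))
        sims.append(np.dot(z, z_aug) / (np.linalg.norm(z) * np.linalg.norm(z_aug)))
    return float(np.mean(sims)) < threshold
```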
arXiv Detail & Related papers (2022-08-31T08:18:44Z) - Contextual Bandits with Large Action Spaces: Made Practical [48.28690486203131]
We present the first efficient, general-purpose algorithm for contextual bandits with continuous, linearly structured action spaces.
Our algorithm makes use of computational oracles for (i) supervised learning and (ii) optimization over the action space, and achieves sample complexity, runtime, and memory independent of the size of the action space.
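The two-oracle structure can be sketched as below, where a supervised-learning oracle predicts rewards and an optimization oracle maximizes that prediction over the action space; the interfaces and the toy grid-search oracle are assumptions, and the paper's exploration scheme is omitted.

```python
from typing import Callable
import numpy as np

def bandit_round(context: np.ndarray,
                 reward_oracle: Callable[[np.ndarray, np.ndarray], float],
                 argmax_oracle: Callable[[Callable[[np.ndarray], float]], np.ndarray]) -> np.ndarray:
    """One schematic round: the supervised-learning oracle scores actions,
    the optimization oracle maximizes that score over the action space."""
    return argmax_oracle(lambda action: reward_oracle(context, action))

# Toy usage: a linear reward model and a grid-search optimization oracle.
theta = np.array([1.0, -0.5])
reward_oracle = lambda ctx, a: float((ctx * theta) @ a)
grid = [np.array([x, y]) for x in np.linspace(-1, 1, 21) for y in np.linspace(-1, 1, 21)]
argmax_oracle = lambda f: max(grid, key=f)
print(bandit_round(np.array([0.3, 0.7]), reward_oracle, argmax_oracle))
```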
arXiv Detail & Related papers (2022-07-12T21:01:48Z) - How Much is Enough? A Study on Diffusion Times in Score-based Generative
Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z) - Obtaining Better Static Word Embeddings Using Contextual Embedding
Models [53.86080627007695]
Our proposed distillation method is a simple extension of CBOW-based training.
As a side-effect, our approach also allows a fair comparison of both contextual and static embeddings.
arXiv Detail & Related papers (2021-06-08T12:59:32Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z) - An Effective Baseline for Robustness to Distributional Shift [5.627346969563955]
Refraining from confidently predicting when faced with categories of inputs different from those seen during training is an important requirement for the safe deployment of deep learning systems.
We present a simple, but highly effective approach to deal with out-of-distribution detection that uses the principle of abstention.
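A common abstention baseline, shown here only to illustrate the principle (not necessarily the paper's exact method), refuses to predict whenever the maximum softmax probability falls below a threshold.

```python
import numpy as np

def predict_with_abstention(logits, threshold=0.9):
    """Return the predicted class, or -1 (abstain) when the maximum softmax
    probability is below the threshold; low-confidence inputs are treated as
    potentially out-of-distribution."""
    logits = np.asarray(logits, dtype=float)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    pred = int(np.argmax(probs))
    return pred if probs[pred] >= threshold else -1

print(predict_with_abstention([2.0, 0.1, -1.0]))   # confident: returns class 0
print(predict_with_abstention([0.2, 0.1, 0.0]))    # uncertain: returns -1
```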
arXiv Detail & Related papers (2021-05-15T00:46:11Z) - Deep Stable Learning for Out-Of-Distribution Generalization [27.437046504902938]
Approaches based on deep neural networks have achieved striking performance when the testing data and training data share a similar distribution.
Eliminating the impact of distribution shifts between training and testing data is crucial for building performance-promising deep models.
We propose to address this problem by removing the dependencies between features via learning weights for training samples.
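A simplified sketch of the reweighting idea is given below: per-sample weights are learned to shrink the off-diagonal entries of the weighted feature covariance. The squared-covariance objective and optimizer settings are illustrative choices, not the paper's full procedure.

```python
import torch

def learn_decorrelation_weights(X, steps=500, lr=0.05):
    """Learn per-sample weights that minimize the squared off-diagonal entries
    of the weighted feature covariance, i.e. reduce dependencies between
    features by reweighting the training samples."""
    n, d = X.shape
    logits = torch.zeros(n, requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        w = torch.softmax(logits, dim=0)           # weights sum to one
        mean = (w[:, None] * X).sum(dim=0)
        Xc = X - mean
        cov = (w[:, None] * Xc).T @ Xc             # weighted covariance
        off_diag = cov - torch.diag(torch.diag(cov))
        loss = (off_diag ** 2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.softmax(logits, dim=0).detach()

weights = learn_decorrelation_weights(torch.randn(256, 5))
```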
arXiv Detail & Related papers (2021-04-16T03:54:21Z) - Generalization Properties of Optimal Transport GANs with Latent
Distribution Learning [52.25145141639159]
We study how the interplay between the latent distribution and the complexity of the pushforward map affects performance.
Motivated by our analysis, we advocate learning the latent distribution as well as the pushforward map within the GAN paradigm.
arXiv Detail & Related papers (2020-07-29T07:31:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.