Empirical Bayesian Multi-Bandit Learning
- URL: http://arxiv.org/abs/2510.26284v2
- Date: Thu, 06 Nov 2025 00:56:55 GMT
- Title: Empirical Bayesian Multi-Bandit Learning
- Authors: Xia Jiang, Rong J. B. Zhu,
- Abstract summary: Multi-task learning in contextual bandits has attracted significant research interest.<n>We propose a novel hierarchical Bayesian framework for learning in various bandit instances.<n>We show that our algorithms achieve lower cumulative regret compared to existing techniques.
- Score: 8.980876474818153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-task learning in contextual bandits has attracted significant research interest due to its potential to enhance decision-making across multiple related tasks by leveraging shared structures and task-specific heterogeneity. In this article, we propose a novel hierarchical Bayesian framework for learning in various bandit instances. This framework captures both the heterogeneity and the correlations among different bandit instances through a hierarchical Bayesian model, enabling effective information sharing while accommodating instance-specific variations. Unlike previous methods that overlook the learning of the covariance structure across bandits, we introduce an empirical Bayesian approach to estimate the covariance matrix of the prior distribution. This enhances both the practicality and flexibility of learning across multi-bandits. Building on this approach, we develop two efficient algorithms: ebmTS (Empirical Bayesian Multi-Bandit Thompson Sampling) and ebmUCB (Empirical Bayesian Multi-Bandit Upper Confidence Bound), both of which incorporate the estimated prior into the decision-making process. We provide the frequentist regret upper bounds for the proposed algorithms, thereby filling a research gap in the field of multi-bandit problems. Extensive experiments on both synthetic and real-world datasets demonstrate the superior performance of our algorithms, particularly in complex environments. Our methods achieve lower cumulative regret compared to existing techniques, highlighting their effectiveness in balancing exploration and exploitation across multi-bandits.
Related papers
- BayesRAG: Probabilistic Mutual Evidence Corroboration for Multimodal Retrieval-Augmented Generation [33.53566598271416]
BayesRAG is a novel multimodal retrieval framework grounded in Bayesian inference and Dempster-Shafer evidence theory.<n>We show that BayesRAG significantly outperforms state-of-the-art (SOTA) methods on challenging multimodal benchmarks.
arXiv Detail & Related papers (2026-01-12T08:53:14Z) - Multi-Armed Sampling Problem and the End of Exploration [14.824891788575417]
This paper introduces the framework of multi-armed sampling, as the sampling counterpart to the optimization problem of multi-arm bandits.<n>We propose an algorithm that achieves plausible notions of regret for this framework and establish corresponding lower bounds.<n>Our work sheds light on the need for exploration and the convergence properties of algorithm for entropy-regularized reinforcement learning.
arXiv Detail & Related papers (2025-07-14T20:50:51Z) - Context Attribution with Multi-Armed Bandit Optimization [11.715006981206844]
We propose a novel framework that formulates context attribution as a multi-armed bandit (CMAB) problem.<n>We employ Combinatorial Thompson Sampling (CTS) to efficiently explore the exponentially large space of context subsets under a limited query budget.<n>Our method defines a reward function based on normalized token likelihoods, capturing how well a subset of segments supports the original model response.
arXiv Detail & Related papers (2025-06-24T19:47:27Z) - Demystifying Online Clustering of Bandits: Enhanced Exploration Under Stochastic and Smoothed Adversarial Contexts [27.62165569135504]
A line of research, known as online clustering of bandits, extends contextual MAB by grouping similar users into clusters.<n>Existing algorithms, which rely on the upper confidence bound (UCB) strategy, struggle to gather adequate statistical information to accurately identify unknown user clusters.<n>We propose two novel algorithms, UniCLUB and PhaseUniCLUB, which incorporate enhanced exploration mechanisms to accelerate cluster identification.
arXiv Detail & Related papers (2025-01-01T16:38:29Z) - Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching [53.05954114863596]
We propose a brand-new Deep Boosting Learning (DBL) algorithm for image-text matching.
An anchor branch is first trained to provide insights into the data properties.
A target branch is concurrently tasked with more adaptive margin constraints to further enlarge the relative distance between matched and unmatched samples.
arXiv Detail & Related papers (2024-04-28T08:44:28Z) - Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning
under Distribution Shifts [11.765000124617186]
We study the robustness of deep reinforcement learning algorithms against distribution shifts within contextual multi-stage optimization problems.
We show that our algorithm is superior to risk-neutral Soft Actor-Critic as well as to two benchmark approaches for robust deep reinforcement learning.
arXiv Detail & Related papers (2024-02-15T14:55:38Z) - A Fast Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [55.2480439325792]
We study the real-valued pure exploration problem in the multi-armed bandit (R-CPE-MAB)<n>We introduce an algorithm named the gap-based exploration (CombGapE) algorithm, whose sample complexity upper bound matches the lower bound up to a problem-dependent constant factor.<n>We numerically show that the CombGapE algorithm outperforms existing methods significantly in both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-06-15T15:37:31Z) - Synergies between Disentanglement and Sparsity: Generalization and
Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z) - Rethinking Clustering-Based Pseudo-Labeling for Unsupervised
Meta-Learning [146.11600461034746]
Method for unsupervised meta-learning, CACTUs, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z) - A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman minimization divergence.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z) - Instance-Dependent Complexity of Contextual Bandits and Reinforcement
Learning: A Disagreement-Based Perspective [104.67295710363679]
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds.
We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case.
arXiv Detail & Related papers (2020-10-07T01:33:06Z) - Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in
Cooperative Tasks [11.480994804659908]
Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria.
We provide a systematic evaluation and comparison of three different classes of MARL algorithms.
Our experiments serve as a reference for the expected performance of algorithms across different learning tasks.
arXiv Detail & Related papers (2020-06-14T11:22:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.