Related papers: Meta Clustering of Neural Bandits

URL: http://arxiv.org/abs/2408.05586v2
Date: Fri, 27 Sep 2024 03:38:36 GMT
Title: Meta Clustering of Neural Bandits
Authors: Yikun Ban, Yunzhe Qi, Tianxin Wei, Lihui Liu, Jingrui He
Abstract summary: We study a new problem, Clustering of Neural Bandits, by extending previous work to the arbitrary reward function. We propose a novel algorithm called M-CNB, which utilizes a meta-learner to represent and rapidly adapt to dynamic clusters. In extensive experiments conducted in both recommendation and online classification scenarios, M-CNB outperforms SOTA baselines.
Score: 45.77505279698894
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The contextual bandit has been identified as a powerful framework to formulate the recommendation process as a sequential decision-making process, where each item is regarded as an arm and the objective is to minimize the regret of $T$ rounds. In this paper, we study a new problem, Clustering of Neural Bandits, by extending previous work to the arbitrary reward function, to strike a balance between user heterogeneity and user correlations in the recommender system. To solve this problem, we propose a novel algorithm called M-CNB, which utilizes a meta-learner to represent and rapidly adapt to dynamic clusters, along with an informative Upper Confidence Bound (UCB)-based exploration strategy. We provide an instance-dependent performance guarantee for the proposed algorithm that withstands the adversarial context, and we further prove the guarantee is at least as good as state-of-the-art (SOTA) approaches under the same assumptions. In extensive experiments conducted in both recommendation and online classification scenarios, M-CNB outperforms SOTA baselines. This shows the effectiveness of the proposed approach in improving online recommendation and online classification performance.
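For orientation, the regret of $T$ rounds mentioned in the abstract is commonly formalized as below. This is the standard pseudo-regret from the contextual bandit literature, not a formula quoted from the paper; the symbols $h$ (expected reward function), $x_{t,i}$ (context of arm $i$ in round $t$), and $i_t$ (arm chosen in round $t$) are notation assumed here for illustration.

$$
R_T = \sum_{t=1}^{T} \left( \max_{i} h(x_{t,i}) - h(x_{t,i_t}) \right)
$$

Under this reading, an algorithm with sublinear $R_T$ selects a near-optimal arm in all but a vanishing fraction of rounds.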

Related papers

Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits [15.700062892888084]
We introduce a novel probing framework that strategically gathers information about selected arms before allocation. In the offline setting, where reward distributions are known, we leverage submodular properties to design a greedy probing algorithm with a provable performance bound. For the more complex online setting, we develop an algorithm that achieves sublinear regret while maintaining fairness.
arXiv Detail & Related papers (2025-06-17T21:43:21Z)
Online Clustering of Dueling Bandits [59.09590979404303]
We introduce the first "clustering of dueling bandit algorithms" to enable collaborative decision-making based on preference feedback. We propose two novel algorithms: (1) Clustering of Linear Dueling Bandits (COLDB) which models the user reward functions as linear functions of the context vectors, and (2) Clustering of Neural Dueling Bandits (CONDB) which uses a neural network to model complex, non-linear user reward functions.
arXiv Detail & Related papers (2025-02-04T07:55:41Z)
Demystifying Online Clustering of Bandits: Enhanced Exploration Under Stochastic and Smoothed Adversarial Contexts [27.62165569135504]
A line of research, known as online clustering of bandits, extends contextual MAB by grouping similar users into clusters. Existing algorithms, which rely on the upper confidence bound (UCB) strategy, struggle to gather adequate statistical information to accurately identify unknown user clusters. We propose two novel algorithms, UniCLUB and PhaseUniCLUB, which incorporate enhanced exploration mechanisms to accelerate cluster identification.
arXiv Detail & Related papers (2025-01-01T16:38:29Z)
Improving Portfolio Optimization Results with Bandit Networks [0.0]
We introduce and evaluate novel Bandit algorithms designed for non-stationary environments. First, we present the Adaptive Discounted Thompson Sampling (ADTS) algorithm. We then extend this approach to the Portfolio Optimization problem by introducing the Combinatorial Adaptive Discounted Thompson Sampling (CADTS) algorithm.
arXiv Detail & Related papers (2024-10-05T16:17:31Z)
Unfolding ADMM for Enhanced Subspace Clustering of Hyperspectral Images [43.152314090830174]
We introduce an innovative clustering architecture for hyperspectral images (HSI) by unfolding an iterative solver based on the Alternating Direction Method of Multipliers (ADMM) for sparse subspace clustering. Our approach captures well the structural characteristics of HSI data by employing the K nearest neighbors algorithm as part of a structure preservation module.
arXiv Detail & Related papers (2024-04-10T15:51:46Z)
Contextual Restless Multi-Armed Bandits with Application to Demand Response Decision-Making [10.054978663965533]
This paper introduces a novel multi-armed bandits framework, Contextual Restless Bandits (CRB), for complex online decision-making. CRB incorporates the core features of contextual bandits and restless bandits, so that it can model both the internal state transitions of each arm and the influence of external global environmental contexts.
arXiv Detail & Related papers (2024-03-22T22:35:07Z)
Graph Neural Bandits [49.85090929163639]
We propose a framework named Graph Neural Bandits (GNB) to leverage the collaborative nature among users empowered by graph neural networks (GNNs). To refine the recommendation strategy, we utilize separate GNN-based models on estimated user graphs for exploitation and adaptive exploration.
arXiv Detail & Related papers (2023-08-21T15:57:57Z)
Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting. We introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling. This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data. We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
Federated Online Clustering of Bandits [35.21933787486559]
Contextual multi-armed bandit (MAB) is an important sequential decision-making problem in recommendation systems. We study the federated online clustering of bandit (FCLUB) problem, which aims to minimize the total regret while satisfying privacy and communication considerations.
arXiv Detail & Related papers (2022-08-31T13:46:02Z)
Local Clustering in Contextual Multi-Armed Bandits [44.11480686973274]
We study identifying user clusters in contextual multi-armed bandits (MAB). We propose a bandit algorithm, LOCB, embedded with a local clustering procedure. We evaluate the proposed algorithm from various aspects, and it outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2021-02-26T21:59:29Z)
CRACT: Cascaded Regression-Align-Classification for Robust Visual Tracking [97.84109669027225]
We introduce an improved proposal refinement module, Cascaded Regression-Align-Classification (CRAC). CRAC yields new state-of-the-art performance on many benchmarks. In experiments on seven benchmarks including OTB-2015, UAV123, NfS, VOT-2018, TrackingNet, GOT-10k and LaSOT, our CRACT exhibits very promising results in comparison with state-of-the-art competitors.
arXiv Detail & Related papers (2020-11-25T02:18:33Z)
Selective Classification via One-Sided Prediction [54.05407231648068]
One-sided prediction (OSP) based relaxation yields an SC scheme that attains near-optimal coverage in the practically relevant high target accuracy regime. We theoretically derive generalization bounds for SC and OSP, and empirically we show that our scheme strongly outperforms state-of-the-art methods in coverage at small error levels.
arXiv Detail & Related papers (2020-10-15T16:14:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.