Unifying Clustered and Non-stationary Bandits
- URL: http://arxiv.org/abs/2009.02463v1
- Date: Sat, 5 Sep 2020 04:58:06 GMT
- Title: Unifying Clustered and Non-stationary Bandits
- Authors: Chuanhao Li, Qingyun Wu and Hongning Wang
- Abstract summary: Non-stationary bandits and online clustering of bandits lift the restrictive assumptions in contextual bandits.
We propose a test of homogeneity, which seamlessly addresses change detection for non-stationary bandits and cluster identification for online clustering of bandits.
Rigorous regret analysis and extensive empirical evaluations demonstrate the value of our proposed solution.
- Score: 50.12992652938055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-stationary bandits and online clustering of bandits lift the restrictive
assumptions in contextual bandits and provide solutions to many important
real-world scenarios. Though the essence in solving these two problems overlaps
considerably, they have been studied independently. In this paper, we connect
these two strands of bandit research under the notion of a test of homogeneity,
which seamlessly addresses change detection for non-stationary bandits and
cluster identification for online clustering of bandits in a unified solution
framework. Rigorous regret analysis and extensive empirical evaluations
demonstrate the value of our proposed solution, especially its flexibility in
handling various environment assumptions.
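The abstract's key device is a single statistical primitive: a test of homogeneity between two sets of observations that are assumed to follow linear reward models. A minimal sketch of such a test is given below, assuming linear rewards with Gaussian noise and comparing two ridge estimates against a chi-square threshold; the function names, the noise variance, and the particular statistic are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy import stats

def ridge_fit(X, y, lam=1.0):
    """Ridge estimate of a linear reward model and its regularized design matrix."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    theta = np.linalg.solve(A, X.T @ y)
    return theta, A

def reject_homogeneity(X1, y1, X2, y2, lam=1.0, sigma2=0.25, alpha=0.05):
    """Return True if the two observation sets are unlikely to share a single
    linear reward parameter (illustrative chi-square test on the difference
    of the two ridge estimates)."""
    theta1, A1 = ridge_fit(X1, y1, lam)
    theta2, A2 = ridge_fit(X2, y2, lam)
    diff = theta1 - theta2
    # Approximate covariance of the difference of the two estimates.
    cov = sigma2 * (np.linalg.inv(A1) + np.linalg.inv(A2))
    stat = float(diff @ np.linalg.solve(cov, diff))
    return stat > stats.chi2.ppf(1 - alpha, df=X1.shape[1])
```

Used on a single user's earlier versus recent observations, a rejection signals a change point (non-stationarity); used on two users' observation histories, a non-rejection licenses pooling their models into one cluster. This dual use is the unification the abstract describes.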
Related papers
- Finite-Sample and Distribution-Free Fair Classification: Optimal Trade-off Between Excess Risk and Fairness, and the Cost of Group-Blindness [14.421493372559762]
We quantify the impact of enforcing algorithmic fairness and group-blindness in binary classification under group fairness constraints.
We propose a unified framework for fair classification that provides distribution-free and finite-sample fairness guarantees with controlled excess risk.
arXiv Detail & Related papers (2024-10-21T20:04:17Z)
- A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations [108.89353070722497]
We study the offline contextual bandit problem, where we aim to acquire an optimal policy using observational data.
We present a new algorithm called Causal-Adjusted Pessimistic (CAP) policy learning, which forms the reward function as the solution of an integral equation system.
arXiv Detail & Related papers (2023-03-20T15:17:31Z)
- A Definition of Non-Stationary Bandits [12.643821787548154]
Existing work lacks a formal definition of non-stationarity that can consistently distinguish non-stationary bandits from stationary ones.
We show that the prevailing definition can ambiguously classify the same bandit as both stationary and non-stationary.
We introduce a formal definition of non-stationary bandits that resolves these issues.
arXiv Detail & Related papers (2023-02-23T17:55:11Z)
- On the pitfalls of entropy-based uncertainty for multi-class semi-supervised segmentation [8.464487190628395]
Semi-supervised learning has emerged as an appealing strategy to train deep models with limited supervision.
We demonstrate in this work that this strategy leads to suboptimal results in a multi-class context.
We propose an alternative way to compute uncertainty in a multi-class setting, based on divergence distances, which accounts for inter-class overlap.
arXiv Detail & Related papers (2022-03-07T18:35:17Z)
- Max-Min Grouped Bandits [48.62520520818357]
We introduce a multi-armed bandit problem termed max-min grouped bandits.
The goal is to find a group whose worst arm has the highest mean reward.
This problem is of interest to applications such as recommendation systems.
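For concreteness, the max-min objective stated above reduces to a one-line selection rule once per-arm mean estimates are available. The sketch below only illustrates that objective (the data layout is hypothetical); the paper itself studies how to identify this group from noisy pulls.

```python
from typing import Dict, List

def best_max_min_group(group_means: Dict[str, List[float]]) -> str:
    """Return the group whose worst (minimum-mean) arm has the highest mean.

    Example: {"g1": [0.4, 0.9], "g2": [0.6, 0.7]} -> "g2",
    since min(0.6, 0.7) = 0.6 > min(0.4, 0.9) = 0.4.
    """
    return max(group_means, key=lambda g: min(group_means[g]))
```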
arXiv Detail & Related papers (2021-11-17T01:59:15Z)
- Statistical Consequences of Dueling Bandits [0.0]
Multi-Armed-Bandit frameworks have often been used to assess educational interventions.
Recent work has shown that it is more beneficial for a student to provide qualitative feedback through preference elicitation.
We compare traditional uniform sampling to a dueling bandit algorithm and find that dueling bandit algorithms perform well at cumulative regret minimisation, but lead to inflated Type-I error rates and reduced power under certain circumstances.
arXiv Detail & Related papers (2021-10-16T23:48:43Z)
- Consensus-Guided Correspondence Denoising [67.35345850146393]
We propose to denoise correspondences with a local-to-global consensus learning framework that robustly identifies true correspondences.
A novel "pruning" block is introduced to distill reliable candidates from initial matches according to their consensus scores estimated by dynamic graphs from local to global regions.
Our method outperforms state-of-the-art methods on robust line fitting, wide-baseline image matching and image localization benchmarks by noticeable margins.
arXiv Detail & Related papers (2021-01-03T09:10:00Z)
- Contextual Bandit with Missing Rewards [27.066965426355257]
We consider a novel variant of the contextual bandit problem where the reward associated with each context-based decision may not always be observed.
This new problem is motivated by certain online settings including clinical trial and ad recommendation applications.
We propose to combine the standard contextual bandit approach with an unsupervised learning mechanism such as clustering.
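As an illustration of the combination described above, the sketch below pairs a LinUCB-style linear bandit with k-means clustering over contexts and imputes a missing reward with the average observed reward of the nearest cluster. The class name, the imputation rule, and all parameter choices are assumptions made for illustration; they are not the paper's algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

class LinUCBWithImputation:
    """LinUCB-style learner that imputes unobserved rewards from a context
    clustering (an illustrative reading of 'contextual bandit + clustering')."""

    def __init__(self, d, n_clusters=5, alpha=1.0, lam=1.0):
        self.A = lam * np.eye(d)      # regularized design matrix
        self.b = np.zeros(d)          # accumulated reward-weighted contexts
        self.alpha = alpha
        self.n_clusters = n_clusters
        self.contexts, self.rewards = [], []

    def score(self, x):
        """Upper-confidence score of a candidate arm with feature vector x."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return float(x @ theta + self.alpha * np.sqrt(x @ A_inv @ x))

    def update(self, x, reward=None):
        """Update with an observed reward, or impute a missing one with the
        mean reward of the nearest context cluster seen so far."""
        if reward is None and len(self.contexts) >= self.n_clusters:
            kmeans = KMeans(n_clusters=self.n_clusters, n_init=10)
            labels = kmeans.fit_predict(np.array(self.contexts))
            label = kmeans.predict(x.reshape(1, -1))[0]
            members = [r for r, l in zip(self.rewards, labels) if l == label]
            reward = float(np.mean(members)) if members else None
        if reward is not None:
            self.A += np.outer(x, x)
            self.b += reward * x
            self.contexts.append(x)
            self.rewards.append(reward)
```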
arXiv Detail & Related papers (2020-07-13T13:29:51Z)
- Towards Robust Fine-grained Recognition by Maximal Separation of Discriminative Features [72.72840552588134]
We identify the proximity of the latent representations of different classes in fine-grained recognition networks as a key factor in the success of adversarial attacks.
We introduce an attention-based regularization mechanism that maximally separates the discriminative latent features of different classes.
arXiv Detail & Related papers (2020-06-10T18:34:45Z)
- Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
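To make the mean-versus-mode distinction above concrete, here is a minimal, hypothetical sketch that estimates each arm's mode from its observed samples with a simple histogram and selects the arm with the largest estimated mode; the histogram estimator and function names are assumptions, not the paper's estimators or guarantees.

```python
import numpy as np

def histogram_mode(samples, bins=20):
    """Crude mode estimate: midpoint of the fullest histogram bin."""
    counts, edges = np.histogram(samples, bins=bins)
    i = int(np.argmax(counts))
    return 0.5 * (edges[i] + edges[i + 1])

def pick_arm_by_mode(reward_history, bins=20):
    """Select the arm whose estimated mode (not mean) of observed rewards is largest.

    `reward_history` is a list with one array of observed rewards per arm.
    """
    return int(np.argmax([histogram_mode(s, bins) for s in reward_history]))
```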
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.