Generalized Chernoff Sampling for Active Testing, Active Regression and
Structured Bandit Algorithms
- URL: http://arxiv.org/abs/2012.08073v2
- Date: Sat, 27 Feb 2021 09:26:45 GMT
- Title: Generalized Chernoff Sampling for Active Testing, Active Regression and
Structured Bandit Algorithms
- Authors: Subhojyoti Mukherjee, Ardhendu Tripathy, Robert Nowak
- Abstract summary: This paper studies active learning and best-arm identification in structured bandit settings.
We obtain a novel sample complexity bound for Chernoff's original active testing procedure.
- Score: 16.19565714525819
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Active learning and structured stochastic bandit problems are intimately
related to the classical problem of sequential experimental design. This paper
studies active learning and best-arm identification in structured bandit
settings from the viewpoint of active sequential hypothesis testing, a
framework initiated by Chernoff (1959). We obtain a novel sample complexity
bound for Chernoff's original active testing procedure by uncovering
non-asymptotic terms that reduce in significance as the allowed error
probability $\delta \rightarrow 0$. Initially proposed for testing among
finitely many hypotheses, we obtain the analogue of Chernoff sampling for the
case when the hypotheses belong to a compact space. This allows us to directly
apply it to active learning and structured bandit problems, where the unknown
parameter specifying the arm means is often assumed to be an element of
Euclidean space. Empirically, we demonstrate the potential of our proposed
approach for active learning of neural network models and in linear and
non-linear bandit settings, where we observe that our general-purpose approach
compares favorably to state-of-the-art methods.
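As an illustration of the active testing framework described in the abstract, the following is a minimal sketch of the certainty-equivalent Chernoff sampling rule for finitely many hypotheses with Bernoulli observations: at each step, take the maximum-likelihood hypothesis, pull the experiment that best separates it from the nearest alternative in KL divergence, and stop once the log-likelihood ratio exceeds log(1/delta). All names and the Bernoulli setup are illustrative assumptions, not the paper's exact procedure (which also covers compact hypothesis spaces).

```python
import numpy as np

def kl_bernoulli(p, q, eps=1e-9):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def chernoff_test(true_means, hypotheses, delta=0.01, rng=None, max_steps=100_000):
    """Active testing among finitely many Bernoulli hypotheses (sketch).
    hypotheses: (H, K) array; each row is a candidate mean vector over K arms."""
    rng = np.random.default_rng(rng)
    H, K = hypotheses.shape
    loglik = np.zeros(H)                         # log-likelihood of each hypothesis
    for t in range(max_steps):
        h_hat = int(np.argmax(loglik))           # current MLE hypothesis
        # choose the arm that is most informative against the nearest alternative
        gaps = np.array([
            min(kl_bernoulli(hypotheses[h_hat, a], hypotheses[h, a])
                for h in range(H) if h != h_hat)
            for a in range(K)
        ])
        a = int(np.argmax(gaps))
        x = rng.random() < true_means[a]         # observe a Bernoulli outcome
        loglik += np.log(np.where(x, hypotheses[:, a], 1 - hypotheses[:, a]))
        # stop when the MLE dominates every alternative by log(1/delta)
        rest = np.delete(loglik, h_hat)
        if loglik[h_hat] - rest.max() >= np.log(1.0 / delta):
            return h_hat, t + 1
    return int(np.argmax(loglik)), max_steps
```

With well-separated hypotheses such as `[[0.9, 0.1], [0.1, 0.9]]`, the procedure typically identifies the true hypothesis in a handful of pulls; the error probability of the stopping rule is controlled at roughly `delta`.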
Related papers
- In-Context Learning for Pure Exploration in Continuous Spaces [26.001092687873125]
In active sequential testing, also termed pure exploration, a learner is tasked with adaptively acquiring information. We introduce C-ICPE-TS, an algorithm that meta-trains deep neural policies to map observation histories to the next continuous query action. At inference time, C-ICPE-TS actively gathers evidence on previously unseen tasks and infers the true hypothesis without parameter updates or explicit hand-crafted information models.
arXiv Detail & Related papers (2026-02-20T04:20:47Z) - Multi-Armed Sampling Problem and the End of Exploration [14.824891788575417]
This paper introduces the framework of multi-armed sampling as the sampling counterpart to the optimization problem of multi-armed bandits. We propose an algorithm that achieves plausible notions of regret for this framework and establish corresponding lower bounds. Our work sheds light on the need for exploration and the convergence properties of algorithms for entropy-regularized reinforcement learning.
arXiv Detail & Related papers (2025-07-14T20:50:51Z) - Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget [55.938644481736446]
We introduce a novel algorithm for best feasible arm identification that guarantees an exponential decay in the error probability. We validate our algorithm through comprehensive empirical evaluations across various problem instances with different levels of complexity.
arXiv Detail & Related papers (2025-06-03T02:56:26Z) - On the Adversarial Robustness of Benjamini Hochberg [0.06144680854063938]
The Benjamini-Hochberg (BH) procedure is widely used to control the false discovery rate (FDR) in multiple testing.
Because this control could be relied upon in critical safety/security contexts, we investigate its adversarial robustness.
arXiv Detail & Related papers (2025-01-06T21:41:53Z) - Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Adaptive Experimentation When You Can't Experiment [55.86593195947978]
This paper introduces the confounded pure exploration transductive linear bandit (CPET-LB) problem.
Online services can employ a properly randomized encouragement that incentivizes users toward a specific treatment.
arXiv Detail & Related papers (2024-06-15T20:54:48Z) - Neural Active Learning Beyond Bandits [69.99592173038903]
We study both stream-based and pool-based active learning with neural network approximations.
We propose two algorithms based on the newly designed exploitation and exploration neural networks for stream-based and pool-based active learning.
arXiv Detail & Related papers (2024-04-18T21:52:14Z) - Evasive Active Hypothesis Testing with Deep Neuroevolution: The Single- and Multi-Agent Cases [28.516240952627083]
In this paper, we focus on one centralized and one decentralized problem of active hypothesis testing in the presence of an eavesdropper. For the centralized problem, we present a new framework based on deep NeuroEvolution (NE), whereas, for the decentralized problem, we develop a novel NE-based method for solving collaborative multi-agent tasks. To further reduce the computational complexity of the latter scheme, a novel multi-agent joint NE and pruning framework is also designed.
arXiv Detail & Related papers (2024-03-15T08:55:56Z) - Hypothesis Testing for Class-Conditional Noise Using Local Maximum
Likelihood [1.8798171797988192]
In supervised learning, automatically assessing the quality of the labels before any learning takes place remains an open research question.
In this paper we show how similar procedures can be followed when the underlying model is a product of Local Maximum Likelihood Estimation.
This different view allows for wider applicability of the tests by offering users access to a richer model class.
arXiv Detail & Related papers (2023-12-15T22:14:58Z) - Large-scale Pre-trained Models are Surprisingly Strong in Incremental Novel Class Discovery [76.63807209414789]
We challenge the status quo in class-iNCD and propose a learning paradigm where class discovery occurs continuously and in a truly unsupervised manner.
We propose simple baselines, composed of a frozen PTM backbone and a learnable linear classifier, that are not only simple to implement but also resilient under longer learning scenarios.
arXiv Detail & Related papers (2023-03-28T13:47:16Z) - Active hypothesis testing in unknown environments using recurrent neural
networks and model free reinforcement learning [0.0]
We make no assumptions about the prior probability, the action and observation sets, and the observation generating process.
Our method can be used in any environment even if it has continuous observations or actions, and performs competitively and sometimes better than the Chernoff test.
arXiv Detail & Related papers (2023-03-19T10:32:25Z) - Shortcomings of Top-Down Randomization-Based Sanity Checks for
Evaluations of Deep Neural Network Explanations [67.40641255908443]
We identify limitations of model-randomization-based sanity checks for the purpose of evaluating explanations.
Top-down model randomization preserves scales of forward pass activations with high probability.
arXiv Detail & Related papers (2022-11-22T18:52:38Z) - Hypothesis Transfer in Bandits by Weighted Models [8.759884299087835]
We consider the problem of contextual multi-armed bandits in the setting of hypothesis transfer learning.
We present a re-weighting scheme for which we show a reduction in regret over classic Linear UCB when transfer is desired.
We further extend this method to an arbitrary amount of source models, where the algorithm decides which model is preferred at each time step.
arXiv Detail & Related papers (2022-11-14T14:13:02Z) - Provably and Practically Efficient Neural Contextual Bandits [16.0251555430107]
We propose an algorithm with a provably sublinear regret bound that is also efficient in the finite regime.
The non-asymptotic error bounds may be of broader interest as a tool to establish the relation between the smoothness of the activation functions in neural contextual bandits and the smoothness of the kernels in kernel bandits.
arXiv Detail & Related papers (2022-05-31T20:16:55Z) - Unknown Face Presentation Attack Detection via Localised Learning of
Multiple Kernels [15.000818334408802]
The paper studies face spoofing, a.k.a. presentation attack detection (PAD) in the demanding scenarios of unknown types of attack.
We formulate a convex localised multiple kernel learning algorithm by imposing a joint matrix-norm constraint on the collection of local kernel weights.
arXiv Detail & Related papers (2022-04-22T12:43:25Z) - A Unifying Theory of Thompson Sampling for Continuous Risk-Averse
Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z) - A Low Rank Promoting Prior for Unsupervised Contrastive Learning [108.91406719395417]
We construct a novel probabilistic graphical model that effectively incorporates the low rank promoting prior into the framework of contrastive learning.
Our hypothesis explicitly requires that all the samples belonging to the same instance class lie on the same subspace with small dimension.
Empirical evidence shows that the proposed algorithm clearly surpasses the state-of-the-art approaches on multiple benchmarks.
arXiv Detail & Related papers (2021-08-05T15:58:25Z) - Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than the number of actions.
arXiv Detail & Related papers (2020-06-15T19:24:02Z)
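The latent bandit setting in the entry above can be sketched as follows with Thompson sampling: the agent knows the arm means conditioned on each latent state, maintains a posterior over states, samples a state, and plays that state's best arm. The Bernoulli rewards and the `mu` matrix are illustrative assumptions, not the paper's exact algorithms.

```python
import numpy as np

def latent_bandit_ts(mu, true_state, T=2000, rng=None):
    """Thompson sampling over a known conditional reward model (sketch).
    mu: (S, K) Bernoulli arm means per latent state, known to the agent;
    true_state: index of the environment's actual latent state."""
    rng = np.random.default_rng(rng)
    S, K = mu.shape
    log_post = np.zeros(S)                     # uniform prior over latent states
    total_reward = 0.0
    for t in range(T):
        p = np.exp(log_post - log_post.max()) # numerically stable softmax
        p /= p.sum()
        s = int(rng.choice(S, p=p))           # sample a latent state from the posterior
        a = int(np.argmax(mu[s]))             # play the best arm under that state
        r = float(rng.random() < mu[true_state, a])
        total_reward += r
        # Bayes update of the state posterior from the observed reward
        log_post += np.log(np.where(r > 0.5, mu[:, a], 1 - mu[:, a]))
    return total_reward / T, int(np.argmax(log_post))
```

When the latent states induce well-separated arm means (e.g. `[[0.9, 0.2], [0.2, 0.9]]`), the posterior concentrates on the true state after a few pulls, after which the policy plays the optimal arm almost exclusively.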
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.