Exponentially Consistent Statistical Classification of Continuous Sequences with Distribution Uncertainty
- URL: http://arxiv.org/abs/2410.21799v1
- Date: Tue, 29 Oct 2024 07:06:40 GMT
- Title: Exponentially Consistent Statistical Classification of Continuous Sequences with Distribution Uncertainty
- Authors: Lina Zhu, Lin Zhou
- Abstract summary: We study multiple classification for continuous sequences with distribution uncertainty.
We propose distribution-free tests and prove that the error probabilities of our tests decay exponentially fast for three different test designs.
- Score: 9.017367466798312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multiple classification, one aims to determine whether a testing sequence is generated from the same distribution as one of the M training sequences or not. Unlike most existing studies that focus on discrete-valued sequences with perfect distribution match, we study multiple classification for continuous sequences with distribution uncertainty, where the generating distributions of the testing and training sequences deviate even under the true hypothesis. In particular, we propose distribution-free tests and prove that the error probabilities of our tests decay exponentially fast for three different test designs: fixed-length, sequential, and two-phase tests. We first consider the simple case without the null hypothesis, where the testing sequence is known to be generated from a distribution close to the generating distribution of one of the training sequences. Subsequently, we generalize our results to a more general case with the null hypothesis by allowing the testing sequence to be generated from a distribution that is vastly different from the generating distributions of all training sequences.
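As a rough illustration of the task described in the abstract (not the tests proposed in the paper), the sketch below classifies a continuous test sequence by its minimum empirical distance to the training sequences, with a rejection threshold standing in for the null hypothesis. The Kolmogorov-Smirnov statistic and the threshold value are assumptions made for this example only.

```python
import numpy as np

def ks_distance(x, y):
    """Two-sample Kolmogorov-Smirnov statistic between samples x and y."""
    data = np.concatenate([x, y])
    cdf_x = np.searchsorted(np.sort(x), data, side="right") / len(x)
    cdf_y = np.searchsorted(np.sort(y), data, side="right") / len(y)
    return np.max(np.abs(cdf_x - cdf_y))

def classify(test_seq, train_seqs, reject_threshold=None):
    """Index of the closest training distribution, or None (null hypothesis)
    if even the closest one exceeds reject_threshold."""
    dists = [ks_distance(test_seq, tr) for tr in train_seqs]
    best = int(np.argmin(dists))
    if reject_threshold is not None and dists[best] > reject_threshold:
        return None  # no training distribution is close: declare the null
    return best

rng = np.random.default_rng(0)
train = [rng.normal(0, 1, 500), rng.normal(3, 1, 500)]
test = rng.normal(0.1, 1, 400)  # slight mismatch, as in the uncertainty model
print(classify(test, train, reject_threshold=0.3))  # -> 0
```

Note that the test sequence is deliberately drawn from a slightly shifted distribution, mimicking the paper's setting where the testing and training distributions deviate even under the true hypothesis.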
Related papers
- Multi-Distribution Robust Conformal Prediction [15.5300376981723]
We study the problem of constructing a conformal prediction set that is uniformly valid across multiple, heterogeneous distributions.
We first propose a max-p aggregation scheme that delivers finite-sample, multi-distribution coverage.
We discuss how our framework relates to group-wise distributionally robust optimization, sub-population shift, fairness, and multi-source learning.
arXiv Detail & Related papers (2026-01-06T13:22:13Z)
- Limitations of Using Identical Distributions for Training and Testing When Learning Boolean Functions [1.3537117504260623]
We study whether it is always optimal for the training distribution to be identical to the test distribution when the learner is allowed to be optimally adapted to the training distribution.
We also show that when certain regularities are imposed on the target functions, the standard conclusion is recovered in the case of the uniform distribution.
arXiv Detail & Related papers (2025-11-30T09:06:07Z)
- Replicable Distribution Testing [38.76577965182225]
Given independent samples, the goal is to characterize the sample complexity of replicably testing natural properties of the underlying distributions.
On the algorithmic front, we develop new algorithms for testing closeness and independence of discrete distributions.
On the lower bound front, we develop a new methodology for proving sample complexity lower bounds for replicable testing.
arXiv Detail & Related papers (2025-07-03T17:27:11Z)
- Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional dependence for general score-mismatched diffusion samplers.
We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.
This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z)
- Transductive conformal inference with adaptive scores [3.591224588041813]
We consider the transductive setting, where decisions are made on a test sample of $m$ new points.
We show that their joint distribution follows a Pólya urn model, and establish a concentration inequality for their empirical distribution function.
We demonstrate the usefulness of these theoretical results through uniform, in-probability guarantees for two machine learning tasks.
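For orientation, the sketch below computes standard split-conformal p-values for a test sample of m points. It illustrates why the p-values share one calibration set and are therefore dependent, which is the dependence the Pólya urn analysis characterizes; it does not implement the paper's adaptive-score results.

```python
import numpy as np

def conformal_pvalues(calib_scores, test_scores):
    """Conformal p-value for each test point: the rank of its nonconformity
    score among the calibration scores, with the +1 finite-sample correction."""
    n = len(calib_scores)
    # p_i = (1 + #{calibration scores >= test score}) / (n + 1)
    return np.array([(1 + np.sum(calib_scores >= s)) / (n + 1)
                     for s in test_scores])

rng = np.random.default_rng(1)
calib = rng.exponential(1.0, 200)  # nonconformity scores on calibration data
test = rng.exponential(1.0, 5)     # scores on m = 5 new test points
print(conformal_pvalues(calib, test))
```

Because every p-value is a rank against the same 200 calibration scores, the m p-values are exchangeable but not independent, which is why transductive (joint) guarantees require the kind of analysis the paper provides.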
arXiv Detail & Related papers (2023-10-27T12:48:30Z)
- Distribution Shift Inversion for Out-of-Distribution Prediction [57.22301285120695]
We propose a portable Distribution Shift Inversion algorithm for Out-of-Distribution (OoD) prediction.
We show that our method provides a general performance gain when plugged into a wide range of commonly used OoD algorithms.
arXiv Detail & Related papers (2023-06-14T08:00:49Z)
- Sequential Predictive Two-Sample and Independence Testing [114.4130718687858]
We study the problems of sequential nonparametric two-sample and independence testing.
We build upon the principle of (nonparametric) testing by betting.
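A minimal sketch of the betting principle for a two-sample location alternative (a toy instance, not the paper's method): under the null, the sign of each paired difference is a fair coin, so the bettor's wealth is a nonnegative martingale, and Ville's inequality controls the type-I error when we reject at wealth 1/alpha.

```python
import numpy as np

def sequential_test_by_betting(xs, ys, alpha=0.05, lam=0.2):
    """Toy sequential two-sample test by betting. Under H0 (both streams share
    a continuous distribution), sign(x - y) is +1 or -1 with probability 1/2,
    so the wealth process is a nonnegative martingale with mean 1."""
    wealth = 1.0
    for t, (x, y) in enumerate(zip(xs, ys), start=1):
        wealth *= 1.0 + lam * np.sign(x - y)  # bet a fixed fraction on x > y
        if wealth >= 1.0 / alpha:
            return t        # reject H0 at time t (type-I error <= alpha)
    return None             # never rejected

rng = np.random.default_rng(0)
same = sequential_test_by_betting(rng.normal(0, 1, 2000), rng.normal(0, 1, 2000))
diff = sequential_test_by_betting(rng.normal(0.5, 1, 2000), rng.normal(0, 1, 2000))
print(same, diff)  # under the shift alternative, rejection comes quickly
```

The fixed bet fraction `lam` is a simplification; the betting literature typically tunes or mixes over it to optimize the wealth growth rate.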
arXiv Detail & Related papers (2023-04-29T01:30:33Z)
- Near-Optimal Non-Parametric Sequential Tests and Confidence Sequences with Possibly Dependent Observations [44.71254888821376]
We provide the first type-I-error and expected-rejection-time guarantees under general nonparametric data-generating processes.
We show how to apply our results to inference on parameters defined by estimating equations, such as average treatment effects.
arXiv Detail & Related papers (2022-12-29T18:37:08Z)
- Robust Calibration with Multi-domain Temperature Scaling [86.07299013396059]
We develop a systematic calibration model to handle distribution shifts by leveraging data from multiple domains.
Our proposed method, multi-domain temperature scaling, uses the heterogeneity in the domains to improve calibration robustness under distribution shift.
arXiv Detail & Related papers (2022-06-06T17:32:12Z)
- Kernel Robust Hypothesis Testing [20.78285964841612]
In this paper, uncertainty sets are constructed in a data-driven manner using kernel methods.
The goal is to design a test that performs well under the worst-case distributions over the uncertainty sets.
For the Neyman-Pearson setting, the goal is to minimize the worst-case probability of missed detection subject to a constraint on the worst-case probability of false alarm.
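In symbols (notation assumed here for illustration, not taken from the paper's abstract): with uncertainty sets $\mathcal{P}_0$ and $\mathcal{P}_1$ for the two hypotheses and a randomized test $\phi$, the minimax Neyman-Pearson problem reads

```latex
\min_{\phi}\; \sup_{P_1 \in \mathcal{P}_1} \mathbb{E}_{P_1}\!\left[1 - \phi\right]
\quad \text{subject to} \quad
\sup_{P_0 \in \mathcal{P}_0} \mathbb{E}_{P_0}\!\left[\phi\right] \le \alpha,
```

where the objective is the worst-case probability of missed detection and the constraint caps the worst-case false-alarm probability at level $\alpha$.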
arXiv Detail & Related papers (2022-03-23T23:59:03Z)
- As Easy as ABC: Adaptive Binning Coincidence Test for Uniformity Testing [13.028716493611789]
We propose a sequential test that adapts to the unknown distribution under the alternative hypothesis.
We establish the sample complexity of the proposed tests as well as a lower bound.
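A toy fixed-binning variant of a coincidence (collision) test for uniformity, for intuition only; the paper's contribution is the adaptive binning and its sample-complexity analysis, which this sketch does not implement. The bin count and rejection rule here are assumptions.

```python
import numpy as np

def collision_uniformity_test(samples, n_bins, z=3.0):
    """Crude collision-based uniformity test on [0, 1): bin the samples and
    compare the number of pairwise collisions to its expectation under the
    uniform distribution (fixed binning, not the paper's adaptive scheme)."""
    counts = np.bincount((samples * n_bins).astype(int), minlength=n_bins)
    collisions = np.sum(counts * (counts - 1)) / 2
    n = len(samples)
    expected = n * (n - 1) / (2 * n_bins)  # E[collisions] under uniformity
    std = np.sqrt(expected)                # rough Poisson-style scale
    return collisions > expected + z * std # True -> reject uniformity

rng = np.random.default_rng(2)
print(collision_uniformity_test(rng.uniform(0, 1, 2000), 256))  # uniform data
print(collision_uniformity_test(rng.beta(5, 5, 2000), 256))     # concentrated
```

Any non-uniform distribution inflates the collision probability above 1/n_bins, so excess collisions are evidence against uniformity; the adaptive version refines the binning to track the unknown alternative.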
arXiv Detail & Related papers (2021-10-12T20:19:57Z)
- Test-Agnostic Long-Tailed Recognition by Test-Time Aggregating Diverse Experts with Self-Supervision [85.07855130048951]
We study a more practical task setting, called test-agnostic long-tailed recognition, where the training class distribution is long-tailed.
We propose a new method, called Test-time Aggregating Diverse Experts (TADE), that trains diverse experts to excel at handling different test distributions.
We theoretically show that our method has provable ability to simulate unknown test class distributions.
arXiv Detail & Related papers (2021-07-20T04:10:31Z)
- Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv Detail & Related papers (2020-07-24T05:18:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.