AutoML Two-Sample Test
- URL: http://arxiv.org/abs/2206.08843v1
- Date: Fri, 17 Jun 2022 15:41:07 GMT
- Title: AutoML Two-Sample Test
- Authors: Jonas M. Kübler, Vincent Stimper, Simon Buchholz, Krikamol Muandet, Bernhard Schölkopf
- Abstract summary: We use a simple test that takes the mean discrepancy of a witness function as the test statistic and prove that minimizing a squared loss leads to a witness with optimal testing power.
We provide an implementation of the AutoML two-sample test in the Python package autotst.
- Score: 13.468660785510945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Two-sample tests are important in statistics and machine learning, both as
tools for scientific discovery as well as to detect distribution shifts. This
led to the development of many sophisticated test procedures going beyond the
standard supervised learning frameworks, whose usage can require specialized
knowledge about two-sample testing. We use a simple test that takes the mean
discrepancy of a witness function as the test statistic and prove that
minimizing a squared loss leads to a witness with optimal testing power. This
allows us to leverage recent advancements in AutoML. Without any user input
about the problems at hand, and using the same method for all our experiments,
our AutoML two-sample test achieves competitive performance on a diverse
distribution shift benchmark as well as on challenging two-sample testing
problems.
We provide an implementation of the AutoML two-sample test in the Python
package autotst.
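The abstract describes a simple recipe: learn a witness function by minimizing a squared loss on one half of the data, then use the mean discrepancy of that witness between the two samples on the held-out half as the test statistic. The sketch below illustrates this recipe with a plain scikit-learn regressor standing in for the AutoML learner; it is an assumption-laden illustration, not the autotst API, and the function and variable names are hypothetical.

```python
# Illustrative witness-based two-sample test (a sketch, not the autotst package API).
# Labels: +1 for samples from P, -1 for samples from Q; the witness is fit by
# minimizing a squared loss, and the test statistic is its mean discrepancy on
# held-out data, calibrated with a permutation test.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor  # stand-in for an AutoML regressor


def witness_two_sample_test(X, Y, n_permutations=1000, seed=0):
    rng = np.random.default_rng(seed)

    # Split each sample into a training half (to fit the witness)
    # and a held-out half (to evaluate the statistic).
    X_tr, X_te = np.array_split(X, 2)
    Y_tr, Y_te = np.array_split(Y, 2)

    # Fit the witness by least-squares regression of the labels +1 / -1.
    Z_tr = np.vstack([X_tr, Y_tr])
    labels = np.concatenate([np.ones(len(X_tr)), -np.ones(len(Y_tr))])
    witness = GradientBoostingRegressor().fit(Z_tr, labels)

    # Test statistic: difference of the witness means on the held-out halves.
    h_x = witness.predict(X_te)
    h_y = witness.predict(Y_te)
    stat = h_x.mean() - h_y.mean()

    # Permutation test: reshuffle the held-out witness values across groups
    # to obtain the null distribution of the statistic.
    pooled = np.concatenate([h_x, h_y])
    n_x = len(h_x)
    count = 0
    for _ in range(n_permutations):
        perm = rng.permutation(pooled)
        if perm[:n_x].mean() - perm[n_x:].mean() >= stat:
            count += 1
    p_value = (count + 1) / (n_permutations + 1)
    return stat, p_value
```

Splitting the data before fitting keeps the held-out mean discrepancy an honest statistic, and the permutation step calibrates the test without distributional assumptions; the paper's contribution is showing that the squared-loss witness is power-optimal and that the learner can be chosen automatically via AutoML.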
Related papers
- Context-Aware Testing: A New Paradigm for Model Testing with Large Language Models [49.06068319380296]
We introduce context-aware testing (CAT) which uses context as an inductive bias to guide the search for meaningful model failures.
We instantiate the first CAT system, SMART Testing, which employs large language models to hypothesize relevant and likely failures.
arXiv Detail & Related papers (2024-10-31T15:06:16Z) - Model Equality Testing: Which Model Is This API Serving? [59.005869726179455]
We formalize detecting such distortions as Model Equality Testing, a two-sample testing problem.
A test built on a simple string kernel achieves a median of 77.4% power against a range of distortions.
We then apply this test to commercial inference APIs for four Llama models, finding that 11 out of 31 endpoints serve different distributions than reference weights released by Meta.
arXiv Detail & Related papers (2024-10-26T18:34:53Z) - STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay [76.06127233986663]
Test-time adaptation (TTA) aims to address the distribution shift between the training and test data with only unlabeled data at test time.
This paper addresses the problem of performing both sample recognition and outlier rejection during inference when outliers are present.
We propose a new approach called STAble Memory rePlay (STAMP), which performs optimization over a stable memory bank instead of the risky mini-batch.
arXiv Detail & Related papers (2024-07-22T16:25:41Z) - Deep anytime-valid hypothesis testing [29.273915933729057]
We propose a general framework for constructing powerful, sequential hypothesis tests for nonparametric testing problems.
We develop a principled approach of leveraging the representation capability of machine learning models within the testing-by-betting framework.
Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines.
arXiv Detail & Related papers (2023-10-30T09:46:19Z) - Active Sequential Two-Sample Testing [18.99517340397671]
We consider the two-sample testing problem in a new scenario where sample measurements are inexpensive to access.
We devise the first active sequential two-sample testing framework, which queries sample measurements not only sequentially but also actively.
In practice, we introduce an instantiation of our framework and evaluate it using several experiments.
arXiv Detail & Related papers (2023-01-30T02:23:49Z) - E-Valuating Classifier Two-Sample Tests [11.248868528186332]
Our test combines ideas from existing work on anytime-valid split likelihood ratio tests and predictive independence tests.
The resulting E-values are suitable for anytime sequential two-sample tests.
arXiv Detail & Related papers (2022-10-24T08:18:36Z) - Model-Free Sequential Testing for Conditional Independence via Testing
by Betting [8.293345261434943]
The proposed test allows researchers to analyze an incoming i.i.d. data stream with any arbitrary dependency structure.
We allow the processing of data points online as soon as they arrive and stop data acquisition once significant results are detected.
arXiv Detail & Related papers (2022-10-01T20:05:33Z) - Test-Time Training with Masked Autoencoders [54.983147122777574]
Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision.
In this paper, we use masked autoencoders for this one-sample learning problem.
arXiv Detail & Related papers (2022-09-15T17:59:34Z) - TTAPS: Test-Time Adaption by Aligning Prototypes using Self-Supervision [70.05605071885914]
We propose a novel modification of the self-supervised training algorithm SwAV that adds the ability to adapt to single test samples.
We show the success of our method on the common benchmark dataset CIFAR10-C.
arXiv Detail & Related papers (2022-05-18T05:43:06Z) - TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning
Tasks [14.547623982073475]
Deep learning systems are notoriously difficult to test and debug.
To reduce test cost, it is essential to perform test selection and label only those selected "high quality" bug-revealing test inputs.
We propose a novel test prioritization technique that brings order into the unlabeled test instances according to their bug-revealing capabilities, namely TestRank.
arXiv Detail & Related papers (2021-05-21T03:41:10Z) - Two-Sample Testing on Ranked Preference Data and the Role of Modeling
Assumptions [57.77347280992548]
In this paper, we design two-sample tests for pairwise comparison data and ranking data.
Our test requires essentially no assumptions on the distributions.
By applying our two-sample test on real-world pairwise comparison data, we conclude that ratings and rankings provided by people are indeed distributed differently.
arXiv Detail & Related papers (2020-06-21T20:51:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.