Significance tests of feature relevance for a blackbox learner
- URL: http://arxiv.org/abs/2103.04985v1
- Date: Tue, 2 Mar 2021 00:59:19 GMT
- Title: Significance tests of feature relevance for a blackbox learner
- Authors: Ben Dai, Xiaotong Shen, Wei Pan
- Abstract summary: We derive two consistent tests for the feature relevance of a blackbox learner.
The first evaluates a loss difference with perturbation on an inference sample.
The second splits the inference sample into two but does not require data perturbation.
- Score: 6.72450543613463
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: An exciting recent development is the uptake of deep learning in many
scientific fields, where the objective is seeking novel scientific insights and
discoveries. To interpret a learning outcome, researchers perform hypothesis
testing for explainable features to advance scientific domain knowledge. In
such a situation, testing for a blackbox learner poses a severe challenge
because of intractable models, unknown limiting distributions of parameter
estimates, and high computational constraints. In this article, we derive two
consistent tests for the feature relevance of a blackbox learner. The first one
evaluates a loss difference with perturbation on an inference sample, which is
independent of an estimation sample used for parameter estimation in model
fitting. The second further splits the inference sample into two but does not
require data perturbation. Also, we develop their combined versions by
aggregating the order statistics of the $p$-values based on repeated sample
splitting. To estimate the splitting ratio and the perturbation size, we
develop adaptive splitting schemes for suitably controlling the Type I
error subject to computational constraints. By deflating the
bias-sd-ratio, we establish asymptotic null distributions of the test
statistics and their consistency in terms of statistical power. Our theoretical
power analysis and simulations indicate that the one-split test is more
powerful than the two-split test, though the latter is easier to apply for
large datasets. Moreover, the combined tests are more stable while compensating
for a power loss by repeated sample splitting. Numerically, we demonstrate the
utility of the proposed tests on two benchmark examples. Accompanying this
paper is our Python library dnn-inference
(https://dnn-inference.readthedocs.io/en/latest/) that implements the proposed
tests.
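To make the split-and-compare idea concrete, the sketch below illustrates the flavor of a split-based relevance test: per-sample losses from a full model and from a model with the tested features removed are compared by a one-sided z-test on an inference split, and p-values from repeated splits are aggregated through an order statistic. The function names, the z-test form, and the aggregation rule are illustrative assumptions, not the dnn-inference API or the paper's exact one-split/two-split statistics.

```python
# Illustrative sketch only (assumptions): this is NOT the dnn-inference API
# and not the paper's exact procedure; it shows the general idea of testing
# feature relevance by comparing losses on a held-out inference split.
import numpy as np
from scipy import stats


def loss_difference_pvalue(loss_full, loss_reduced):
    """One-sided z-test on per-sample loss differences from an inference split.

    H0: removing (or masking) the tested features does not increase the
    prediction loss; H1: it does, i.e. the features are relevant.
    """
    diff = np.asarray(loss_reduced) - np.asarray(loss_full)  # > 0 if features help
    n = diff.size
    z = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))
    return 1.0 - stats.norm.cdf(z)  # reject H0 for large positive z


def combine_split_pvalues(pvalues, gamma=0.5):
    """Aggregate p-values from repeated sample splitting via an order
    statistic (the empirical gamma-quantile), loosely in the spirit of
    quantile-based p-value combination; the result is capped at 1."""
    p = np.sort(np.asarray(pvalues))
    k = max(1, int(np.ceil(gamma * p.size)))
    return min(1.0, p[k - 1] / gamma)


# Hypothetical usage: loss_full / loss_reduced are per-sample losses on the
# inference split from models fitted with and without the features of interest.
# p_single = loss_difference_pvalue(loss_full, loss_reduced)
# p_combined = combine_split_pvalues([p_single, ...])  # over repeated splits
```

The actual tests, including the adaptive choice of splitting ratio and perturbation size, are provided by the dnn-inference library linked above.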
Related papers
- Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z)
- Deep anytime-valid hypothesis testing [29.273915933729057]
We propose a general framework for constructing powerful, sequential hypothesis tests for nonparametric testing problems.
We develop a principled approach of leveraging the representation capability of machine learning models within the testing-by-betting framework.
Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines.
arXiv Detail & Related papers (2023-10-30T09:46:19Z)
- Active Sequential Two-Sample Testing [18.99517340397671]
We consider the two-sample testing problem in a new scenario where sample measurements are inexpensive to access.
We devise the first active sequential two-sample testing framework that queries sample measurements not only sequentially but also actively.
In practice, we introduce an instantiation of our framework and evaluate it using several experiments.
arXiv Detail & Related papers (2023-01-30T02:23:49Z)
- Model-Free Sequential Testing for Conditional Independence via Testing by Betting [8.293345261434943]
The proposed test allows researchers to analyze an incoming i.i.d. data stream with any arbitrary dependency structure.
We allow the processing of data points online as soon as they arrive and stop data acquisition once significant results are detected.
arXiv Detail & Related papers (2022-10-01T20:05:33Z)
- Learning to Increase the Power of Conditional Randomization Tests [8.883733362171032]
The model-X conditional randomization test is a generic framework for conditional independence testing.
We introduce novel model-fitting schemes that are designed to explicitly improve the power of model-X tests.
arXiv Detail & Related papers (2022-07-03T12:29:25Z)
- Nonparametric Conditional Local Independence Testing [69.31200003384122]
Conditional local independence is an independence relation among continuous time processes.
No nonparametric test of conditional local independence has been available.
We propose such a nonparametric test based on double machine learning.
arXiv Detail & Related papers (2022-03-25T10:31:02Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Two-Sample Testing on Ranked Preference Data and the Role of Modeling Assumptions [57.77347280992548]
In this paper, we design two-sample tests for pairwise comparison data and ranking data.
Our test requires essentially no assumptions on the distributions.
By applying our two-sample test on real-world pairwise comparison data, we conclude that ratings and rankings provided by people are indeed distributed differently.
arXiv Detail & Related papers (2020-06-21T20:51:09Z)
- Double Generative Adversarial Networks for Conditional Independence Testing [8.359770027722275]
High-dimensional conditional independence testing is a key building block in statistics and machine learning.
We propose an inferential procedure based on double generative adversarial networks (GANs).
arXiv Detail & Related papers (2020-06-03T16:14:15Z)
- An Investigation of Why Overparameterization Exacerbates Spurious Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior.
We show how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt.
arXiv Detail & Related papers (2020-05-09T01:59:13Z)