Independence Tests Without Ground Truth for Noisy Learners
- URL: http://arxiv.org/abs/2010.15662v1
- Date: Wed, 28 Oct 2020 13:03:26 GMT
- Title: Independence Tests Without Ground Truth for Noisy Learners
- Authors: Andrés Corrada-Emmanuel, Edward Pantridge, Eddie Zahrebelski, Aditya Chaganti, Simeon Simeonov
- Abstract summary: We discuss the exact solution for independent binary classifiers.
Its practical applicability is hampered by its sole remaining assumption.
A similar conjecture for the ground truth invariant system for scalar regressors is solvable.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exact ground truth invariant polynomial systems can be written for
arbitrarily correlated binary classifiers. Their solutions give estimates for
sample statistics that require knowledge of the ground truth of the correct
labels in the sample. Of these polynomial systems, only a few have been solved
in closed form. Here we discuss the exact solution for independent binary
classifiers - resolving an outstanding problem that has been presented at this
conference and others. Its practical applicability is hampered by its sole
remaining assumption - the classifiers need to be independent in their sample
errors. We discuss how to use the closed form solution to create a
self-consistent test that can validate the independence assumption itself
absent the correct labels ground truth. It can be cast as an algebraic geometry
conjecture for binary classifiers that remains unsolved. A similar conjecture
for the ground truth invariant algebraic system for scalar regressors is
solvable, and we present the solution here. We also discuss experiments on the
Penn ML Benchmark classification tasks that provide further evidence that the
conjecture may be true for the polynomial system of binary classifiers.
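The self-consistency idea in the abstract can be illustrated with a toy simulation (a hypothetical sketch under simplifying assumptions, not the paper's actual polynomial system). Assume label-symmetric errors: for classifier $k$ with accuracy $a_k$, write $c_k = 2a_k - 1$. When sample errors are independent, the rescaled pairwise agreement $g_{ij} = 2\,P(\text{$i$ and $j$ agree}) - 1$ factorizes as $c_i c_j$. With four classifiers, the six pairwise statistics over-determine the four unknowns, so the three "tetrad" products $g_{12}g_{34}$, $g_{13}g_{24}$, $g_{14}g_{23}$ must coincide; a large spread between them rejects the independence assumption without ever seeing the ground truth labels:

```python
import random

def simulate(n, accs, corr_pair=None):
    """Simulate binary predictions for len(accs) classifiers on n items.
    Errors are independent across classifiers unless corr_pair is given,
    in which case the second named classifier copies the first one's
    error half the time (a hypothetical way to inject correlation)."""
    preds = []
    for _ in range(n):
        truth = random.random() < 0.5
        errs = [random.random() > a for a in accs]  # True = classifier errs here
        if corr_pair is not None:
            i, j = corr_pair
            if random.random() < 0.5:
                errs[j] = errs[i]  # couple the errors of classifiers i and j
        preds.append([truth != e for e in errs])  # predicted label per classifier
    return preds

def pair_corr(preds, i, j):
    """g_ij = 2 * (pairwise agreement rate) - 1.
    Under independent, label-symmetric errors g_ij = c_i * c_j."""
    agree = sum(p[i] == p[j] for p in preds)
    return 2 * agree / len(preds) - 1

def tetrads(preds):
    """The three tetrad products g_12*g_34, g_13*g_24, g_14*g_23.
    With independent errors all three equal c_1*c_2*c_3*c_4, so a large
    spread between them is evidence against the independence assumption."""
    g = {(i, j): pair_corr(preds, i, j) for i in range(4) for j in range(i + 1, 4)}
    return (g[0, 1] * g[2, 3], g[0, 2] * g[1, 3], g[0, 3] * g[1, 2])

random.seed(0)
accs = [0.85, 0.8, 0.75, 0.9]
ind = tetrads(simulate(200_000, accs))
dep = tetrads(simulate(200_000, accs, corr_pair=(0, 1)))
print("independent:", [round(t, 3) for t in ind])  # three nearly equal products
print("correlated: ", [round(t, 3) for t in dep])  # first product inflated
```

The factorization follows because two classifiers agree exactly when both are correct or both are wrong: $P(\text{agree}) = a_i a_j + (1-a_i)(1-a_j) = (1 + c_i c_j)/2$.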
Related papers
- Towards Self-Supervised Covariance Estimation in Deep Heteroscedastic Regression [102.24287051757469]
We study self-supervised covariance estimation in deep heteroscedastic regression.
We derive an upper bound on the 2-Wasserstein distance between normal distributions.
Experiments over a wide range of synthetic and real datasets demonstrate that the proposed 2-Wasserstein bound coupled with pseudo label annotations results in a computationally cheaper yet accurate deep heteroscedastic regression.
arXiv Detail & Related papers (2025-02-14T22:37:11Z)
- Bisimulation Learning [55.859538562698496]
We compute finite bisimulations of state transition systems with large, possibly infinite state space.
Our technique yields faster verification results than alternative state-of-the-art tools in practice.
arXiv Detail & Related papers (2024-05-24T17:11:27Z)
- When Noisy Labels Meet Long Tail Dilemmas: A Representation Calibration Method [40.25499257944916]
Real-world datasets are both noisily labeled and class-imbalanced.
We propose a representation calibration method RCAL.
We derive theoretical results to discuss the effectiveness of our representation calibration.
arXiv Detail & Related papers (2022-11-20T11:36:48Z)
- Testing Independence of Exchangeable Random Variables [19.973896010415977]
Given well-shuffled data, can we determine whether the data items are statistically (in)dependent?
We will show that this is possible and develop tests that can confidently reject the null hypothesis that data is independent and identically distributed.
One potential application is in Deep Learning, where data is often scraped from across the internet and duplications abound.
arXiv Detail & Related papers (2022-10-22T08:55:48Z)
- Nonparametric Conditional Local Independence Testing [69.31200003384122]
Conditional local independence is an independence relation among continuous time processes.
No nonparametric test of conditional local independence has been available.
We propose such a nonparametric test based on double machine learning.
arXiv Detail & Related papers (2022-03-25T10:31:02Z)
- Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying the integrity metric, i.e., the empirical model error equals the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z)
- Deterministic Certification to Adversarial Attacks via Bernstein Polynomial Approximation [5.392822954974537]
Randomized smoothing has established state-of-the-art provable robustness against $\ell$ norm adversarial attacks with high probability.
We come up with a question, "Is it possible to construct a smoothed classifier without randomization while maintaining natural accuracy?"
Our method provides a deterministic algorithm for decision boundary smoothing.
We also introduce a distinctive approach of norm-independent certified robustness via numerical solutions of nonlinear systems of equations.
arXiv Detail & Related papers (2020-11-28T08:27:42Z)
- Tractable Inference in Credal Sentential Decision Diagrams [116.6516175350871]
Probabilistic sentential decision diagrams are logic circuits where the inputs of disjunctive gates are annotated by probability values.
We develop the credal sentential decision diagrams, a generalisation of their probabilistic counterpart that allows for replacing the local probabilities with credal sets of mass functions.
For a first empirical validation, we consider a simple application based on noisy seven-segment display images.
arXiv Detail & Related papers (2020-08-19T16:04:34Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Algebraic Ground Truth Inference: Non-Parametric Estimation of Sample Errors by AI Algorithms [0.0]
Non-parametric estimators of performance are an attractive solution in autonomous settings.
We show that the accuracy estimators in the experiments where we have ground truth are better than one part in a hundred.
The practical utility of the method is illustrated on a real-world dataset from an online advertising campaign and a sample of common classification benchmarks.
arXiv Detail & Related papers (2020-06-15T12:04:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.