Testing Noise Assumptions of Learning Algorithms
- URL: http://arxiv.org/abs/2501.09189v1
- Date: Wed, 15 Jan 2025 22:33:55 GMT
- Title: Testing Noise Assumptions of Learning Algorithms
- Authors: Surbhi Goel, Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan,
- Abstract summary: We pose a fundamental question in computational learning theory.
Can we efficiently test whether a training set satisfies the assumptions of a given noise model?
We present the first efficient algorithm to test various noise assumptions on the training data.
- Score: 20.28174893000374
- License:
- Abstract: We pose a fundamental question in computational learning theory: can we efficiently test whether a training set satisfies the assumptions of a given noise model? This question has remained unaddressed despite decades of research on learning in the presence of noise. In this work, we show that this task is tractable and present the first efficient algorithm to test various noise assumptions on the training data. To model this question, we extend the recently proposed testable learning framework of Rubinfeld and Vasilyan (2023) and require a learner to run an associated test that satisfies the following two conditions: (1) whenever the test accepts, the learner outputs a classifier along with a certificate of optimality, and (2) the test must pass for any dataset drawn according to a specified modeling assumption on both the marginal distribution and the noise model. We then consider the problem of learning halfspaces over Gaussian marginals with Massart noise (where each label can be flipped with probability less than $1/2$ depending on the input features), and give a fully-polynomial time testable learning algorithm. We also show a separation between the classical setting of learning in the presence of structured noise and testable learning. In fact, for the simple case of random classification noise (where each label is flipped with fixed probability $\eta = 1/2$), we show that testable learning requires super-polynomial time while classical learning is trivial.
Related papers
- ROG$_{PL}$: Robust Open-Set Graph Learning via Region-Based Prototype
Learning [52.60434474638983]
We propose a unified framework named ROG$_PL$ to achieve robust open-set learning on complex noisy graph data.
The framework consists of two modules, i.e., denoising via label propagation and open-set prototype learning via regions.
To the best of our knowledge, the proposed ROG$_PL$ is the first robust open-set node classification method for graph data with complex noise.
arXiv Detail & Related papers (2024-02-28T17:25:06Z) - Testable Learning with Distribution Shift [9.036777309376697]
We define a new model called testable learning with distribution shift.
We obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution.
We give several positive results for learning concept classes such as halfspaces, intersections of halfspaces, and decision trees.
arXiv Detail & Related papers (2023-11-25T23:57:45Z) - Noisy Pair Corrector for Dense Retrieval [59.312376423104055]
We propose a novel approach called Noisy Pair Corrector (NPC)
NPC consists of a detection module and a correction module.
We conduct experiments on text-retrieval benchmarks Natural Question and TriviaQA, code-search benchmarks StaQC and SO-DS.
arXiv Detail & Related papers (2023-11-07T08:27:14Z) - Efficient Testable Learning of Halfspaces with Adversarial Label Noise [44.32410859776227]
In the recently introduced testable learning model, one is required to produce a tester-learner such that if the data passes the tester, then one can trust the output of the robust learner on the data.
Our tester-learner runs in time $poly(d/eps)$ and outputs a halfspace with misclassification error $O(opt)+eps$, where $opt$ is the 0-1 error of the best fitting halfspace.
arXiv Detail & Related papers (2023-03-09T18:38:46Z) - An Efficient Tester-Learner for Halfspaces [13.13131953359806]
We give the first efficient algorithm for learning halfspaces in the testable learning model recently defined by Rubinfeld and Vasilyan (2023)
In this model, a learner certifies that the accuracy of its output hypothesis is near optimal whenever the training set passes a test.
arXiv Detail & Related papers (2023-02-28T18:51:55Z) - A Moment-Matching Approach to Testable Learning and a New
Characterization of Rademacher Complexity [15.746613321517282]
We give a powerful new approach for developing algorithms for testable learning using tools from moment matching and metric agnostic in probability.
Surprisingly, we show that the information-theoretic complexity of testable learning is tightly characterized by the Rademacher complexity of the concept class.
arXiv Detail & Related papers (2022-11-23T21:29:51Z) - The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from this assumption can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z) - Smoothed Embeddings for Certified Few-Shot Learning [63.68667303948808]
We extend randomized smoothing to few-shot learning models that map inputs to normalized embeddings.
Our results are confirmed by experiments on different datasets.
arXiv Detail & Related papers (2022-02-02T18:19:04Z) - Robust Learning under Strong Noise via SQs [5.9256596453465225]
We show that every SQ learnable class admits an efficient learning algorithm with OPT + $epsilon misilon misclassification error for a broad class of noise models.
This setting substantially generalizes the widely-studied problem classification under RCN with known noise probabilities.
arXiv Detail & Related papers (2020-10-18T21:02:26Z) - A Polynomial Time Algorithm for Learning Halfspaces with Tsybakov Noise [55.45544638410858]
We study the problem of PAC learning homogeneous halfspaces in the presence of Tsybakov noise.
Our algorithm learns the true halfspace within any accuracy $epsilon$.
arXiv Detail & Related papers (2020-10-04T22:19:06Z) - Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.