Related papers: Efficient Discrepancy Testing for Learning with Distribution Shift

Efficient Discrepancy Testing for Learning with Distribution Shift

URL: http://arxiv.org/abs/2406.09373v1
Date: Thu, 13 Jun 2024 17:51:10 GMT
Title: Efficient Discrepancy Testing for Learning with Distribution Shift
Authors: Gautam Chandrasekaran, Adam R. Klivans, Vasilis Kontonis, Konstantinos Stavropoulos, Arsen Vasilyan,
Abstract summary: We provide the first set of provably efficient algorithms for testing localized discrepancy distance. Results imply a broad set of new, efficient learning algorithms in the recently introduced model of Testable Learning with Distribution Shift.
Score: 17.472049019016524
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A fundamental notion of distance between train and test distributions from the field of domain adaptation is discrepancy distance. While in general hard to compute, here we provide the first set of provably efficient algorithms for testing localized discrepancy distance, where discrepancy is computed with respect to a fixed output classifier. These results imply a broad set of new, efficient learning algorithms in the recently introduced model of Testable Learning with Distribution Shift (TDS learning) due to Klivans et al. (2023). Our approach generalizes and improves all prior work on TDS learning: (1) we obtain universal learners that succeed simultaneously for large classes of test distributions, (2) achieve near-optimal error rates, and (3) give exponential improvements for constant depth circuits. Our methods further extend to semi-parametric settings and imply the first positive results for low-dimensional convex sets. Additionally, we separate learning and testing phases and obtain algorithms that run in fully polynomial time at test time.

Related papers

Scaling Test-Time Compute Without Verification or RL is Suboptimal [70.28430200655919]
We show that finetuning LLMs with verifier-based (VB) methods based on RL or search is far superior to verifier-free (VF) approaches based on distilling or cloning search traces, given a fixed amount of compute/data budget. We corroborate our theory empirically on both didactic and math reasoning problems with 3/8B-sized pre-trained LLMs, where we find verification is crucial for scaling test-time compute.
arXiv Detail & Related papers (2025-02-17T18:43:24Z)
Tolerant Algorithms for Learning with Arbitrary Covariate Shift [18.37181965815327]
We study the problem of learning under arbitrary distribution shift, where the learner is trained on a labeled set from one distribution but evaluated on a different, potentially adversarially generated test distribution. We focus on two frameworks: PQ learning [Goldwasser, A. Kalai, Y. Kalai, Montasser NeurIPS 2020], allowing abstention on adversarially generated parts of the test distribution, and TDS learning [Klivans, Stavropoulos, Vasilyan COLT 2024], permitting abstention on the entire test distribution if distribution shift is detected.
arXiv Detail & Related papers (2024-06-04T19:50:05Z)
Enhancing Out-of-Distribution Detection with Multitesting-based Layer-wise Feature Fusion [11.689517005768046]
Out-of-distribution samples may exhibit shifts in local or global features compared to the training distribution. We propose a novel framework, Multitesting-based Layer-wise Out-of-Distribution (OOD) Detection. Our scheme effectively enhances the performance of out-of-distribution detection when compared to baseline methods.
arXiv Detail & Related papers (2024-03-16T04:35:04Z)
Testable Learning with Distribution Shift [9.036777309376697]
We define a new model called testable learning with distribution shift. We obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution. We give several positive results for learning concept classes such as halfspaces, intersections of halfspaces, and decision trees.
arXiv Detail & Related papers (2023-11-25T23:57:45Z)
Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data. In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on momentum-based variance reduced technique in cross-silo FL.
arXiv Detail & Related papers (2022-12-02T05:07:50Z)
A Moment-Matching Approach to Testable Learning and a New Characterization of Rademacher Complexity [15.746613321517282]
We give a powerful new approach for developing algorithms for testable learning using tools from moment matching and metric agnostic in probability. Surprisingly, we show that the information-theoretic complexity of testable learning is tightly characterized by the Rademacher complexity of the concept class.
arXiv Detail & Related papers (2022-11-23T21:29:51Z)
Intersection of Parallels as an Early Stopping Criterion [64.8387564654474]
We propose a method to spot an early stopping point in the training iterations without the need for a validation set. For a wide range of learning rates, our method, called Cosine-Distance Criterion (CDC), leads to better generalization on average than all the methods that we compare against.
arXiv Detail & Related papers (2022-08-19T19:42:41Z)
Simple Stochastic and Online Gradient DescentAlgorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair instances. Online descent (OGD) is a popular approach to handle streaming data in pairwise learning. In this paper, we propose simple and online descent to methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z)
Cross-validation Confidence Intervals for Test Error [83.67415139421448]
This work develops central limit theorems for crossvalidation and consistent estimators of its variance under weak stability conditions on the learning algorithm. Results are the first of their kind for the popular choice of leave-one-out cross-validation.
arXiv Detail & Related papers (2020-07-24T17:40:06Z)
Adaptive Risk Minimization: Learning to Adapt to Domain Shift [109.87561509436016]
A fundamental assumption of most machine learning algorithms is that the training and test data are drawn from the same underlying distribution. In this work, we consider the problem setting of domain generalization, where the training data are structured into domains and there may be multiple test time shifts. We introduce the framework of adaptive risk minimization (ARM), in which models are directly optimized for effective adaptation to shift by learning to adapt on the training domains.
arXiv Detail & Related papers (2020-07-06T17:59:30Z)
Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network. Our model requires a much less number of communication rounds and still a number of communication rounds in theory. Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.