VC Dimension and Distribution-Free Sample-Based Testing
- URL: http://arxiv.org/abs/2012.03923v1
- Date: Mon, 7 Dec 2020 18:50:46 GMT
- Title: VC Dimension and Distribution-Free Sample-Based Testing
- Authors: Eric Blais, Renato Ferreira Pinto Jr., Nathaniel Harms
- Abstract summary: We consider the problem of determining which classes of functions can be tested more efficiently than they can be learned.
We show that two natural classes of functions, juntas and monotone functions, can be tested with a number of samples that is polynomially smaller than the number of samples required for PAC learning.
- Score: 0.8830479021890576
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of determining which classes of functions can be
tested more efficiently than they can be learned, in the distribution-free
sample-based model that corresponds to the standard PAC learning setting. Our
main result shows that while VC dimension by itself does not always provide
tight bounds on the number of samples required to test a class of functions in
this model, it can be combined with a closely-related variant that we call
"lower VC" (or LVC) dimension to obtain strong lower bounds on this sample
complexity.
We use this result to obtain strong and in many cases nearly optimal lower
bounds on the sample complexity for testing unions of intervals, halfspaces,
intersections of halfspaces, polynomial threshold functions, and decision
trees. Conversely, we show that two natural classes of functions, juntas and
monotone functions, can be tested with a number of samples that is polynomially
smaller than the number of samples required for PAC learning.
Finally, we also use the connection between VC dimension and property testing
to establish new lower bounds for testing radius clusterability and testing
feasibility of linear constraint systems.
Related papers
- MulTi-Wise Sampling: Trading Uniform T-Wise Feature Interaction Coverage for Smaller Samples [4.337339380445765]
Existing t-wise sampling algorithms uniformly cover t-wise feature interactions for all features.
We introduce a novel approach to t-wise feature interaction sampling, questioning the necessity of uniform coverage.
Our approach prioritizes between subsets of critical and non-critical features, considering higher t-values for subsets of critical features.
arXiv Detail & Related papers (2024-06-28T10:16:10Z) - Federated Nonparametric Hypothesis Testing with Differential Privacy Constraints: Optimal Rates and Adaptive Tests [5.3595271893779906]
Federated learning has attracted significant recent attention due to its applicability across a wide range of settings where data is collected and analyzed across disparate locations.
We study federated nonparametric goodness-of-fit testing in the white-noise-with-drift model under distributed differential privacy (DP) constraints.
arXiv Detail & Related papers (2024-06-10T19:25:19Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Deep anytime-valid hypothesis testing [29.273915933729057]
We propose a general framework for constructing powerful, sequential hypothesis tests for nonparametric testing problems.
We develop a principled approach of leveraging the representation capability of machine learning models within the testing-by-betting framework.
Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines.
arXiv Detail & Related papers (2023-10-30T09:46:19Z) - Testing properties of distributions in the streaming model [0.0]
We study distribution testing in the standard access model and the conditional access model.
The goal is to test properties of a distribution using an optimal number of samples subject to a memory constraint.
arXiv Detail & Related papers (2023-09-06T10:53:29Z) - Non-generative Generalized Zero-shot Learning via Task-correlated Disentanglement and Controllable Samples Synthesis [20.34562156468408]
We propose a non-generative model to address these problems.
In addition, we formulate a new ZSL task named 'Few-shot Seen class and Zero-shot Unseen class learning' (FSZU).
arXiv Detail & Related papers (2022-03-10T12:32:26Z) - A Characterization of Semi-Supervised Adversarially-Robust PAC Learnability [57.502573663108535]
We study the problem of learning a predictor that is adversarially robust to test-time attacks in the semi-supervised PAC model.
We show that there is a significant benefit in semi-supervised robust learning even in the worst-case distribution-free model.
arXiv Detail & Related papers (2022-02-11T03:01:45Z) - Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning.
In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner.
We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z) - Bilinear Classes: A Structural Framework for Provable Generalization in RL [119.42509700822484]
Bilinear Classes is a new structural framework which permits generalization in reinforcement learning.
The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable.
Our main result provides an RL algorithm with polynomial sample complexity for Bilinear Classes.
arXiv Detail & Related papers (2021-03-19T16:34:20Z) - Bloom Origami Assays: Practical Group Testing [90.2899558237778]
Group testing is a well-studied problem with several appealing solutions.
Recent biological studies impose practical constraints for COVID-19 that are incompatible with traditional methods.
We develop a new method combining Bloom filters with belief propagation to scale to larger values of n (more than 100) with good empirical results.
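The entry above combines Bloom filters with belief propagation; as background only (this is the standard data structure, not the paper's algorithm), a minimal Bloom filter can be sketched as follows, with illustrative parameter choices.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions indexing into an m-bit array.

    Membership queries may return false positives but never false negatives,
    which is what makes the structure useful for pooled (group) testing.
    """

    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _indexes(self, item):
        # Derive k independent indexes by salting a single hash function.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx] = True

    def __contains__(self, item):
        return all(self.bits[idx] for idx in self._indexes(item))

bf = BloomFilter()
bf.add("sample-17")
print("sample-17" in bf)  # True: inserted items are always reported present
```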
arXiv Detail & Related papers (2020-07-21T19:31:41Z) - Refined bounds for algorithm configuration: The knife-edge of dual class approximability [94.83809668933021]
We investigate how large a training set must be to ensure that a parameter's average performance over the training set is close to its expected future performance.
We show that if this approximation holds under the L-infinity norm, we can provide strong sample complexity bounds.
We empirically evaluate our bounds in the context of integer programming, one of the most powerful tools in computer science.
arXiv Detail & Related papers (2020-06-21T15:32:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.