How Does Independence Help Generalization? Sample Complexity of ERM on
Product Distributions
- URL: http://arxiv.org/abs/2212.06422v1
- Date: Tue, 13 Dec 2022 08:14:32 GMT
- Title: How Does Independence Help Generalization? Sample Complexity of ERM on
Product Distributions
- Authors: Tao Lin
- Abstract summary: We show that, even though product distributions are simpler than correlated distributions, ERM still needs an exponential number of samples to learn on product distributions.
This leads to the conclusion that a product distribution by itself does not make a learning problem easier -- an algorithm designed specifically for product distributions is needed.
- Score: 5.553167334488855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While many classical notions of learnability (e.g., PAC learnability) are
distribution-free, utilizing the specific structures of an input distribution
may improve learning performance. For example, a product distribution on a
multi-dimensional input space has a much simpler structure than a correlated
distribution. A recent paper [GHTZ21] shows that the sample complexity of a
general learning problem on product distributions is polynomial in the input
dimension, which is exponentially smaller than that on correlated
distributions. However, the learning algorithm they use is not the standard
Empirical Risk Minimization (ERM) algorithm. In this note, we characterize the
sample complexity of ERM in a general learning problem on product
distributions. We show that, even though product distributions are simpler than
correlated distributions, ERM still needs an exponential number of samples to
learn on product distributions, rather than a polynomial number. This leads to the
conclusion that a product distribution by itself does not make a learning
problem easier -- an algorithm designed specifically for product distributions
is needed.
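
For concreteness, the two objects the abstract contrasts can be written out in standard notation (assumed here; the paper's own notation may differ): a product distribution factorizes coordinate-wise, and ERM returns the hypothesis with the smallest average loss on the sample.

```latex
% Product distribution on a d-dimensional input space:
% every coordinate is drawn independently.
D \;=\; D_1 \times D_2 \times \cdots \times D_d,
\qquad
x = (x_1, \dots, x_d) \sim D
\iff x_j \sim D_j \text{ independently for each } j.

% Empirical Risk Minimization over a hypothesis class \mathcal{H},
% given i.i.d. samples z_1, \dots, z_n \sim D and a loss \ell:
\hat{h}_{\mathrm{ERM}} \in \arg\min_{h \in \mathcal{H}}
\; \frac{1}{n} \sum_{i=1}^{n} \ell(h, z_i).
```

In these terms, the note's result is that the sample size $n$ at which $\hat{h}_{\mathrm{ERM}}$ is guaranteed to generalize can still be exponential in $d$ even when $D$ factorizes as above, whereas the (non-ERM) algorithm of [GHTZ21] needs only $\mathrm{poly}(d)$ samples.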
Related papers
- Learning Intersections of Two Margin Halfspaces under Factorizable Distributions [56.51474048985742]
Learning intersections of halfspaces is a central problem in Computational Learning Theory. Even for just two halfspaces, it remains a major open question whether learning is possible in polynomial time. We introduce a novel algorithm that provably circumvents the CSQ hardness barrier.
arXiv Detail & Related papers (2025-11-13T00:28:24Z)
- Distribution Prompting: Understanding the Expressivity of Language Models Through the Next-Token Distributions They Can Produce [10.369289331969098]
We show that some distributions are significantly harder to elicit than others. We find that distributions with very low or very high entropy are easier to approximate than those with moderate entropy.
arXiv Detail & Related papers (2025-05-18T05:49:48Z)
- Distributional MIPLIB: a Multi-Domain Library for Advancing ML-Guided MILP Methods [14.819629773624348]
Mixed Integer Linear Programming (MILP) is a fundamental tool for modeling optimization problems.
Despite the increasing popularity of this approach, no common repository provides distributions of similar MILP instances.
We introduce Distributional MIPLIB, a library of problem distributions for advancing ML-guided MILP methods.
arXiv Detail & Related papers (2024-06-11T05:25:38Z)
- Collaborative Learning with Different Labeling Functions [7.228285747845779]
We study a variant of Collaborative PAC Learning, in which we aim to learn an accurate classifier for each of the $n$ data distributions.
We show that, when the data distributions satisfy a weaker realizability assumption, sample-efficient learning is still feasible.
arXiv Detail & Related papers (2024-02-16T04:32:22Z)
- Domain Generalization by Functional Regression [3.209698860006188]
We study domain generalization as a problem of functional regression.
Our concept leads to a new algorithm for learning a linear operator from marginal distributions of inputs to the corresponding conditional distributions of outputs given inputs.
arXiv Detail & Related papers (2023-02-09T16:07:21Z)
- On-Demand Sampling: Learning Optimally from Multiple Distributions [63.20009081099896]
Social and real-world considerations have given rise to multi-distribution learning paradigms.
We establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity.
Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving zero-sum games.
arXiv Detail & Related papers (2022-10-22T19:07:26Z)
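
The On-Demand Sampling entry above credits extensions of online learning techniques for solving zero-sum games. As background only, here is a minimal sketch of that classic connection (multiplicative weights against a best-responding opponent); the function name and parameters are illustrative, and this is not the paper's algorithm:

```python
import numpy as np

def mw_zero_sum(A, T=5000, eta=0.05):
    """Approximate equilibrium play in the zero-sum game with payoff
    matrix A (row player minimizes x^T A y) via multiplicative weights.
    Illustrative sketch; A, T, eta are hypothetical inputs."""
    m, n = A.shape
    x = np.ones(m) / m                  # row player's mixed strategy
    avg_x, avg_y = np.zeros(m), np.zeros(n)
    for _ in range(T):
        y = np.zeros(n)
        y[np.argmax(x @ A)] = 1.0       # column player best-responds
        x = x * np.exp(-eta * (A @ y))  # downweight costly rows
        x /= x.sum()
        avg_x += x
        avg_y += y
    avg_x /= T
    avg_y /= T
    return avg_x, avg_y, avg_x @ A @ avg_y  # time-averaged strategies, value

# Matching pennies: the game value is 0, and the averaged strategies of
# the no-regret dynamics approach the uniform equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y, v = mw_zero_sum(A)
print(np.round(x, 2), np.round(v, 2))  # ~[0.5 0.5], ~0.0
```

The design choice being illustrated is the standard Freund-Schapire observation that the time-averaged strategies of no-regret dynamics converge to an approximate minimax equilibrium, which is the usual lever in such sample-complexity analyses.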
- Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems.
We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
arXiv Detail & Related papers (2021-10-06T16:58:34Z)
- Forster Decomposition and Learning Halfspaces with Noise [60.691817861402676]
A Forster transform is an operation that turns a distribution into one with good anti-concentration properties.
We show that any distribution can be decomposed efficiently as a disjoint mixture of few distributions for which a Forster transform exists and can be computed efficiently.
arXiv Detail & Related papers (2021-07-12T17:00:59Z)
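
For context on the Forster entry above: in one standard formulation from the broader literature (not quoted from this paper), a Forster transform is an invertible linear map $A$ that puts the radially normalized distribution into isotropic position, which is the anti-concentration property alluded to:

```latex
% A is a Forster transform for a distribution P on R^d if the
% radially normalized image  y = Ax / \|Ax\|_2,  x \sim P,  satisfies
\mathbb{E}_{x \sim P}\!\left[
  \frac{(Ax)(Ax)^{\top}}{\|Ax\|_2^{2}}
\right] \;=\; \frac{1}{d}\, I_d .
% Every direction then carries an equal 1/d share of the second
% moment, so no direction is degenerate.
```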
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias of the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
- An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization [53.592597682854944]
We recast generalization over sub-groups as an online game between a player minimizing risk and an adversary presenting new test distributions.
We show that ERM is provably minimax-optimal for both tasks.
arXiv Detail & Related papers (2021-02-25T19:06:48Z)
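
Read in the usual distributionally robust terms (a standard rendering, assumed here rather than quoted from the paper), the player-versus-adversary game in the entry above is a minimax risk problem:

```latex
% Player picks a hypothesis h; the adversary picks a test
% distribution D from a family \mathcal{D} of sub-groups / shifts.
\min_{h \in \mathcal{H}} \; \max_{D \in \mathcal{D}} \; R_D(h),
\qquad
R_D(h) \;=\; \mathbb{E}_{(x,y) \sim D}\, \ell\big(h(x),\, y\big).
```

Under this reading, "ERM is minimax-optimal" asserts that no learner achieves a better worst-case guarantee over $\mathcal{D}$ than empirical risk minimization.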
- SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm [64.13217062232874]
SURF is an algorithm for approximating distributions by piecewise polynomials.
It outperforms state-of-the-art algorithms in experiments.
arXiv Detail & Related papers (2020-02-22T01:03:33Z)