Understanding Generalization via Set Theory
- URL: http://arxiv.org/abs/2311.06545v1
- Date: Sat, 11 Nov 2023 11:47:29 GMT
- Title: Understanding Generalization via Set Theory
- Authors: Shiqi Liu
- Abstract summary: Generalization is at the core of machine learning models.
We employ set theory to introduce the concepts of algorithms, hypotheses, and dataset generalization.
- Score: 1.6475699373210055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization is at the core of machine learning models. However, the
definition of generalization is not entirely clear. We employ set theory to
introduce the concepts of algorithms, hypotheses, and dataset generalization.
We analyze the properties of dataset generalization and prove a theorem on
surrogate generalization procedures. This theorem leads to our generalization
method. Through a generalization experiment on the MNIST dataset, we obtain
13,541 sample bases. When we use the entire training set to evaluate the
model's performance, the models achieve an accuracy of 99.945%. However, if we
shift the sample bases or modify the neural network structure, the performance
experiences a significant decline. We also identify consistently mispredicted
samples and find that they are all challenging examples. The experiments
substantiated the accuracy of the generalization definition and the
effectiveness of the proposed methods. Both the set-theoretic deduction and the
experiments help us better understand generalization.
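The abstract does not spell out how the sample bases are constructed, so the following is only a minimal sketch of the evaluation step it describes: scoring independently trained models on the entire MNIST training set and intersecting their error sets to recover the consistently mispredicted (hard) examples. The `train_model` helper is hypothetical.

```python
# Minimal sketch of the evaluation step described in the abstract.
# `train_model` is a hypothetical helper returning a trained classifier.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=1000, shuffle=False)

def mispredicted_indices(model):
    """Indices of training samples the model gets wrong."""
    model.eval()
    wrong = []
    with torch.no_grad():
        for i, (x, y) in enumerate(loader):
            bad = (model(x).argmax(dim=1) != y).nonzero(as_tuple=True)[0]
            wrong.extend((bad + i * 1000).tolist())
    return set(wrong)

# Samples every independently trained model mispredicts are the
# "consistently mispredicted" examples the abstract identifies as hard.
models = [train_model(seed=s) for s in range(5)]  # hypothetical helper
consistent = set.intersection(*(mispredicted_indices(m) for m in models))
```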
Related papers
- Class-wise Generalization Error: an Information-Theoretic Analysis [22.877440350595222]
We study the class-generalization error, which quantifies the generalization performance of each individual class.
We empirically validate our proposed bounds in different neural networks and show that they accurately capture the complex class-generalization error behavior.
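As a concrete illustration of the quantity being bounded (not of the paper's information-theoretic analysis itself), the per-class test error can be computed directly from predictions:

```python
# Hedged sketch: the per-class test error that the class-generalization
# error quantifies. Illustration only; not the paper's bounds.
import numpy as np

def class_wise_error(y_true, y_pred, num_classes):
    errors = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = (y_true == c)
        if mask.any():
            errors[c] = np.mean(y_pred[mask] != y_true[mask])
    return errors
```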
arXiv Detail & Related papers (2024-01-05T17:05:14Z)
- On the Generalization Properties of Diffusion Models [33.93850788633184]
This work embarks on a comprehensive theoretical exploration of the generalization attributes of diffusion models.
We establish theoretical estimates of the generalization gap that evolves in tandem with the training dynamics of score-based diffusion models.
We extend our quantitative analysis to a data-dependent scenario, wherein target distributions are portrayed as a succession of densities.
arXiv Detail & Related papers (2023-11-03T09:20:20Z)
- SLOG: A Structural Generalization Benchmark for Semantic Parsing [68.19511282584304]
The goal of compositional generalization benchmarks is to evaluate how well models generalize to new complex linguistic expressions.
Existing benchmarks often focus on lexical generalization, the interpretation of novel lexical items in syntactic structures familiar from training, while structural generalization cases are often underrepresented.
We introduce SLOG, a semantic parsing dataset that extends COGS with 17 structural generalization cases.
arXiv Detail & Related papers (2023-10-23T15:39:09Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
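A common per-sample proxy for the local Lipschitz regularity such bounds depend on is the input-gradient norm; the sketch below estimates it for a PyTorch classifier. This is an illustrative proxy, not the paper's exact construction.

```python
# Hedged sketch: per-sample input-gradient norms as a lower-bound proxy
# for the local Lipschitz constant of the learned prediction function.
import torch

def local_lipschitz_estimate(model, x):
    x = x.detach().clone().requires_grad_(True)
    top = model(x).max(dim=1).values.sum()   # each sample's top logit
    grad, = torch.autograd.grad(top, x)
    return grad.flatten(1).norm(dim=1)       # one estimate per sample
```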
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
- More Than a Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize [94.70343385404203]
We find that most theoretical analyses fall short of capturing qualitative phenomena even for kernel regression.
We prove that the classical GCV estimator converges to the generalization risk whenever a local random matrix law holds.
Our findings suggest that random matrix theory may be central to understanding the properties of neural representations in practice.
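The classical GCV estimator referenced here has a standard closed form for ridge regression; a minimal sketch:

```python
# Hedged sketch: the classical generalized cross-validation (GCV)
# estimator for ridge regression, in its standard textbook form.
import numpy as np

def gcv(X, y, lam):
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # hat matrix
    resid = y - H @ y
    return (resid @ resid / n) / (1.0 - np.trace(H) / n) ** 2
```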
arXiv Detail & Related papers (2022-03-11T18:59:01Z)
- Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
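For a ReLU network, an input's activation pattern identifies the linear subfunction (region) it falls in; a minimal sketch with a toy two-layer network:

```python
# Hedged sketch: grouping inputs by ReLU activation pattern, i.e. by the
# linear subfunction they fall in, for a toy two-layer network.
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 4)), rng.normal(size=16)
W2, b2 = rng.normal(size=(3, 16)), rng.normal(size=3)

def activation_pattern(x):
    """Mask of active ReLUs; equal masks => same linear subfunction."""
    return tuple((W1 @ x + b1 > 0).astype(int))

# Group samples by pattern; the full network's empirical error is then
# an expectation over the per-subfunction empirical errors.
regions = defaultdict(list)
for i, x in enumerate(rng.normal(size=(100, 4))):
    regions[activation_pattern(x)].append(i)
```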
arXiv Detail & Related papers (2021-06-15T18:34:41Z)
- Measuring Generalization with Optimal Transport [111.29415509046886]
We develop margin-based generalization bounds, where the margins are normalized with optimal transport costs.
Our bounds robustly predict the generalization error, given training data and network parameters, on large scale datasets.
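As an illustration of the raw ingredient only (the optimal-transport normalization, which is the paper's contribution, is not reproduced), a sample's classification margin is the gap between its true-class score and the best competing score:

```python
# Hedged sketch: unnormalized classification margins. The paper
# normalizes such margins with optimal transport costs; that step is
# not reproduced here.
import numpy as np

def margins(logits, y):
    """True-class score minus the best competing class score."""
    idx = np.arange(len(y))
    true = logits[idx, y]
    rival = logits.copy()
    rival[idx, y] = -np.inf
    return true - rival.max(axis=1)
```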
arXiv Detail & Related papers (2021-06-07T03:04:59Z)
- Out-of-Distribution Generalization in Kernel Regression [21.958028127426196]
We study generalization in kernel regression when the training and test distributions are different.
We identify an overlap matrix that quantifies the mismatch between distributions for a given kernel.
We develop procedures for optimizing training and test distributions for a given data budget to find best and worst case generalizations under the shift.
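A minimal sketch of the setting (not of the paper's overlap-matrix analysis): kernel ridge regression fit on one Gaussian training distribution and evaluated on a mean-shifted test distribution.

```python
# Hedged sketch of the setting: kernel ridge regression under a
# train/test distribution shift. The overlap matrix itself is not
# reproduced here.
import numpy as np

def rbf(A, B, gamma=1.0):
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

rng = np.random.default_rng(0)
X_tr = rng.normal(0.0, 1.0, size=(200, 2))   # training distribution
X_te = rng.normal(1.5, 1.0, size=(200, 2))   # mean-shifted test distribution
f = lambda X: np.sin(X).sum(axis=1)          # target function

alpha = np.linalg.solve(rbf(X_tr, X_tr) + 1e-3 * np.eye(200), f(X_tr))
mse_shifted = np.mean((rbf(X_te, X_tr) @ alpha - f(X_te)) ** 2)
```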
arXiv Detail & Related papers (2021-06-04T04:54:25Z)
- Robustness to Augmentations as a Generalization metric [0.0]
Generalization is the ability of a model to predict on unseen domains.
We propose a method to predict a model's generalization performance, based on the idea that models robust to augmentations are more generalizable than those that are not.
The proposed method was the first runner-up solution in the NeurIPS competition on Predicting Generalization in Deep Learning.
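The core idea can be sketched as the accuracy drop under augmentations; the entry's exact augmentation suite is not given in the summary, so the transforms below are placeholders, assuming x is a batch of image tensors.

```python
# Hedged sketch of the core idea: a small accuracy drop under
# augmentation predicts better generalization. The transforms are
# placeholders, not the competition entry's actual suite.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.3),
])

def accuracy(model, x, y):
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()

def robustness_gap(model, x, y):
    """Accuracy on clean inputs minus accuracy on augmented inputs."""
    return accuracy(model, x, y) - accuracy(model, augment(x), y)
```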
arXiv Detail & Related papers (2021-01-16T15:36:38Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.