A Model of One-Shot Generalization
- URL: http://arxiv.org/abs/2205.14553v1
- Date: Sun, 29 May 2022 01:41:29 GMT
- Title: A Model of One-Shot Generalization
- Authors: Thomas Laurent, James H. von Brecht, and Xavier Bresson
- Abstract summary: One-shot generalization refers to the ability of an algorithm to perform transfer learning within a single task.
We show that the most direct neural network architecture for our data model performs one-shot generalization almost perfectly.
- Score: 6.155604731137828
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We provide a theoretical framework to study a phenomenon that we call
one-shot generalization. This phenomenon refers to the ability of an algorithm
to perform transfer learning within a single task, meaning that it correctly
classifies a test point that has a single exemplar in the training set. We
propose a simple data model and use it to study this phenomenon in two ways.
First, we prove a non-asymptotic baseline -- kernel methods based on
nearest-neighbor classification cannot perform one-shot generalization,
independently of the choice of the kernel and the size of the training set.
Second, we empirically show that the most direct neural network architecture
for our data model performs one-shot generalization almost perfectly. This
stark differential leads us to believe that the one-shot generalization
mechanism is partially responsible for the empirical success of neural
networks.
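To make the baseline concrete, here is a toy stand-in for the setting (not the paper's actual data model, which is defined precisely in the text): class identity is a short motif that can appear at different shifts, the one-shot class is observed at a single shift, and the test point places the same motif elsewhere. For any kernel that decreases with Euclidean distance, nearest-neighbor classification picks the closest raw input, and the lone exemplar carries no usable signal:

```python
# Toy illustration only; the motif/shift construction is an assumption made
# for this sketch, not the paper's data model.
import numpy as np

rng = np.random.default_rng(0)
d, motif_len, n_classes = 32, 8, 5
motifs = rng.standard_normal((n_classes, motif_len))

def sample(c, shift, noise=0.05):
    x = noise * rng.standard_normal(d)
    x[shift:shift + motif_len] += motifs[c]   # plant class-c motif at `shift`
    return x

# Classes 1..4 appear at many shifts; class 0 has a single exemplar at shift 0.
X, y = [sample(0, 0)], [0]
for c in range(1, n_classes):
    for s in range(0, d - motif_len, 2):
        X.append(sample(c, s))
        y.append(c)
X, y = np.stack(X), np.array(y)

# Test point: class 0's motif at a distant shift. Any kernel that is a
# decreasing function of distance yields the same nearest-neighbor decision.
x_test = sample(0, 12)
dists = np.sum((X - x_test) ** 2, axis=1)
print("distance to the one-shot exemplar:", dists[0])
print("closest distance to other classes:", dists[1:].min())
print("1-NN prediction:", y[np.argmin(dists)], "(true class: 0)")
```

A shift-equivariant network (e.g., a small 1-D convolutional model) can instead learn the invariance from the well-represented classes and apply it to the singleton class, which is the kind of within-task transfer the abstract calls one-shot generalization.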
Related papers
- Principled Out-of-Distribution Generalization via Simplicity [16.17883058788714]
We study the compositional generalization abilities of diffusion models in image generation.
We develop a theoretical framework for OOD generalization via simplicity, quantified using a predefined simplicity metric.
We establish the first sharp sample complexity guarantees for learning the true, generalizable, simple model.
arXiv Detail & Related papers (2025-05-28T17:44:10Z)
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- Non-Parametric Representation Learning with Kernels [6.944372188747803]
We introduce and analyze several kernel-based representation learning approaches.
We argue that the classical representer theorems for supervised kernel machines are not always applicable for (self-supervised) representation learning.
We empirically evaluate the performance of these methods in both small data regimes as well as in comparison with neural network based models.
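As a concrete instance of a kernel-based representation learner in this spirit, here is a minimal from-scratch kernel PCA feature extractor; it is a generic example, not one of the paper's specific estimators:

```python
import numpy as np

def rbf_gram(X, gamma):
    # pairwise RBF kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def kernel_pca(X, n_components=2, gamma=0.5):
    K = rbf_gram(X, gamma)
    n = len(K)
    H = np.eye(n) - np.ones((n, n)) / n         # center in feature space
    vals, vecs = np.linalg.eigh(H @ K @ H)      # eigenvalues in ascending order
    vals = vals[::-1][:n_components]
    vecs = vecs[:, ::-1][:, :n_components]
    return vecs * np.sqrt(np.maximum(vals, 0))  # embedded training points

X = np.random.default_rng(1).standard_normal((100, 5))
print(kernel_pca(X).shape)   # (100, 2): learned non-parametric representation
```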
arXiv Detail & Related papers (2023-09-05T08:14:25Z)
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
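A heavily simplified sketch of the idea (collapsing the paper's bi-level problem into a single joint objective, with illustrative sizes and penalty weight): sparse task-specific heads on a shared representation, where the L1 term is what pushes each base-predictor to use few latent coordinates:

```python
import torch
import torch.nn as nn

# Shared encoder + per-task linear heads; the L1 penalty encourages each head
# to depend on few latent coordinates. The paper solves a bi-level problem;
# this joint objective is a simplification for illustration.
encoder = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 8))
heads = nn.ModuleList([nn.Linear(8, 1) for _ in range(4)])
opt = torch.optim.Adam([*encoder.parameters(), *heads.parameters()], lr=1e-3)

x = torch.randn(64, 10)                        # synthetic multi-task batch
targets = [torch.randn(64, 1) for _ in heads]
for _ in range(100):
    z = encoder(x)
    task_loss = sum(nn.functional.mse_loss(h(z), t) for h, t in zip(heads, targets))
    sparsity = sum(h.weight.abs().sum() for h in heads)  # sparse base-predictors
    loss = task_loss + 1e-2 * sparsity
    opt.zero_grad()
    loss.backward()
    opt.step()
```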
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
- Reconciliation of Pre-trained Models and Prototypical Neural Networks in Few-shot Named Entity Recognition [35.34238362639678]
We propose a one-line-code normalization method to reconcile such a mismatch with empirical and theoretical grounds.
Our work also provides an analytical viewpoint for addressing general problems in few-shot named entity recognition.
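The summary mentions a one-line-code normalization; one plausible reading, sketched below with a generic prototypical classifier, is a single normalization of pre-trained embeddings before prototypes are formed (the exact normalization and its placement are the paper's; this is a hypothetical stand-in):

```python
import torch
import torch.nn.functional as F

def proto_logits(support, support_labels, query, n_classes):
    # Hypothesized one-line fix: L2-normalize embeddings so the scale of
    # pre-trained features matches the geometry prototypes assume.
    support, query = F.normalize(support, dim=-1), F.normalize(query, dim=-1)
    protos = torch.stack([support[support_labels == c].mean(0)
                          for c in range(n_classes)])
    return -torch.cdist(query, protos)        # negative distance as logits

support = torch.randn(10, 32)                 # pre-trained-encoder embeddings
labels = torch.arange(10) % 5                 # a 5-way, 2-shot episode
query = torch.randn(3, 32)
print(proto_logits(support, labels, query, 5).shape)   # (3, 5)
```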
arXiv Detail & Related papers (2022-11-07T02:33:45Z)
- With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training and test data.
Deeper layers only minimize training risk and fail to generalize well on test or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z)
- How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis [93.37576644429578]
This work establishes the first theoretical analysis for the known iterative self-training paradigm.
We prove the benefits of unlabeled data in both training convergence and generalization ability.
Experiments from shallow neural networks to deep neural networks are also provided to justify the correctness of our established theoretical insights on self-training.
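For reference, the iterative self-training paradigm the analysis targets looks roughly like the following pseudo-labeling loop (the threshold, model, and schedule here are illustrative choices, not the paper's):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x_lab, y_lab = torch.randn(100, 20), torch.randint(0, 2, (100,))
x_unlab = torch.randn(1000, 20)               # the unlabeled pool

for round_ in range(5):                       # outer self-training iterations
    with torch.no_grad():                     # pseudo-label with current model
        probs = model(x_unlab).softmax(-1)
        conf, y_pseudo = probs.max(-1)
        keep = conf > 0.9                     # illustrative confidence cutoff
    x = torch.cat([x_lab, x_unlab[keep]])
    y = torch.cat([y_lab, y_pseudo[keep]])
    for _ in range(50):                       # inner supervised updates
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
```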
arXiv Detail & Related papers (2022-01-21T02:16:52Z)
- Self-Ensembling GAN for Cross-Domain Semantic Segmentation [107.27377745720243]
This paper proposes a self-ensembling generative adversarial network (SE-GAN) exploiting cross-domain data for semantic segmentation.
In SE-GAN, a teacher network and a student network constitute a self-ensembling model for generating semantic segmentation maps, which together with a discriminator, forms a GAN.
Despite its simplicity, we find SE-GAN can significantly boost the performance of adversarial training and enhance the stability of the model.
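Self-ensembling teacher/student pairs are typically built with an exponential moving average (EMA) of the student's weights; assuming SE-GAN follows that convention, the core update is the few lines below (the GAN discriminator and segmentation losses are omitted):

```python
import copy
import torch
import torch.nn as nn

student = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
teacher = copy.deepcopy(student)              # teacher tracks the student
for p in teacher.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def ema_update(teacher, student, decay=0.99):
    # teacher <- decay * teacher + (1 - decay) * student
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(decay).add_(ps, alpha=1 - decay)

ema_update(teacher, student)                  # call after each student step
```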
arXiv Detail & Related papers (2021-12-15T09:50:25Z)
- Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
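The activation pattern that identifies which linear subfunction a ReLU network applies to an input can be read off directly; grouping test points by this pattern is what lets empirical error be decomposed over subfunctions (a minimal sketch, not the paper's code):

```python
import torch
import torch.nn as nn

# Each on/off configuration of the ReLUs fixes one linear piece
# ("subfunction") of the network.
net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(),
                    nn.Linear(16, 16), nn.ReLU(),
                    nn.Linear(16, 1))

def activation_pattern(net, x):
    pattern, h = [], x
    for layer in net:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            pattern.extend((h > 0).flatten().tolist())
    return tuple(pattern)                     # hashable key for grouping points

x1, x2 = torch.randn(1, 4), torch.randn(1, 4)
print("same subfunction:",
      activation_pattern(net, x1) == activation_pattern(net, x2))
```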
arXiv Detail & Related papers (2021-06-15T18:34:41Z)
- Uniform Convergence, Adversarial Spheres and a Simple Remedy [40.44709296304123]
Previous work has cast doubt on the general framework of uniform convergence and its ability to explain generalization in neural networks.
We provide an extensive theoretical investigation of the previously studied data setting through the lens of infinitely-wide models.
We prove that the Neural Tangent Kernel (NTK) also suffers from the same phenomenon and we uncover its origin.
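The empirical (finite-width) NTK that this infinite-width limit idealizes is just the Gram matrix of output gradients with respect to the parameters, e.g.:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))

def param_grad(x):
    # gradient of the scalar network output w.r.t. all parameters, flattened
    net.zero_grad()
    net(x).sum().backward()
    return torch.cat([p.grad.flatten() for p in net.parameters()])

x1, x2 = torch.randn(1, 3), torch.randn(1, 3)
k12 = torch.dot(param_grad(x1), param_grad(x2))   # empirical NTK entry K(x1, x2)
print("empirical NTK value:", k12.item())
```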
arXiv Detail & Related papers (2021-05-07T20:23:01Z)
- The Gaussian equivalence of generative models for learning with shallow neural networks [30.47878306277163]
We study the performance of neural networks trained on data drawn from pre-trained generative models.
We provide three strands of rigorous, analytical and numerical evidence corroborating this equivalence.
These results open a viable path to the theoretical study of machine learning models with realistic data.
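The equivalence says, informally, that in these settings the generator can be replaced by Gaussian data with matched first and second moments. A minimal sketch of building that surrogate (the generator here is a made-up stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 10))

def generator(z):
    # stand-in for a pre-trained generative model mapping latents to data
    return np.tanh(z @ W)

X = generator(rng.standard_normal((5000, 4)))

# Gaussian surrogate with the same mean and covariance as the generated data;
# per the equivalence, a shallow network trained on it should show the same
# learning curves as one trained on X.
mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
X_gauss = rng.multivariate_normal(mu, cov, size=5000)
print(X.shape, X_gauss.shape)
```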
arXiv Detail & Related papers (2020-06-25T21:20:09Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
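One way such pairs can supervise gradients, sketched below under the assumption that the auxiliary term aligns the input-gradient with the direction from an example to its counterfactual (the paper's exact formulation may differ):

```python
import torch
import torch.nn.functional as F

def gradient_supervision_loss(model, x, x_cf):
    # Penalize misalignment between the output gradient at x and the vector
    # pointing toward the counterfactual x' (minimally different, new label).
    x = x.requires_grad_(True)
    grad = torch.autograd.grad(model(x).sum(), x, create_graph=True)[0]
    direction = (x_cf - x).detach()
    cos = F.cosine_similarity(grad.flatten(1), direction.flatten(1), dim=1)
    return (1 - cos).mean()                   # added to the main task loss

model = torch.nn.Sequential(torch.nn.Linear(6, 1))
x, x_cf = torch.randn(8, 6), torch.randn(8, 6)
print(gradient_supervision_loss(model, x, x_cf).item())
```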
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.