Inversion dynamics of class manifolds in deep learning reveals tradeoffs
underlying generalisation
- URL: http://arxiv.org/abs/2303.05161v2
- Date: Fri, 23 Feb 2024 17:21:40 GMT
- Title: Inversion dynamics of class manifolds in deep learning reveals tradeoffs
underlying generalisation
- Authors: Simone Ciceri, Lorenzo Cassani, Matteo Osella, Pietro Rotondo, Filippo
Valle, Marco Gherardi
- Abstract summary: We report on numerical experiments showing how the optimisation dynamics finds representations that balance opposing tendencies with a non-monotonic trend.
The training error at the inversion is stable under subsampling, and across network initialisations and optimisers, which characterises it as a property solely of the data structure and (very weakly) of the architecture.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To achieve near-zero training error in a classification problem, the layers
of a feed-forward network have to disentangle the manifolds of data points with
different labels, to facilitate the discrimination. However, excessive class
separation can lead to overfitting, since good generalisation requires learning
invariant features, which involve some level of entanglement. We report on
numerical experiments showing how the optimisation dynamics finds
representations that balance these opposing tendencies with a non-monotonic
trend. After a fast segregation phase, a slower rearrangement (conserved across
data sets and architectures) increases the class entanglement. The training
error at the inversion is stable under subsampling, and across network
initialisations and optimisers, which characterises it as a property solely of
the data structure and (very weakly) of the architecture. The inversion is the
manifestation of tradeoffs elicited by well-defined and maximally stable
elements of the training set, coined "stragglers", particularly influential
for generalisation.
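To make the notion of class entanglement in a representation concrete, here is a minimal sketch of one possible proxy: the mean within-class spread divided by the mean separation between class centroids. This ratio, and the names `entanglement_proxy`, `centroid` and `sq_dist`, are assumptions made for illustration; the abstract does not specify the measure used in the paper, and this is not claimed to be it.

```python
# Illustrative proxy only: the ratio below is an assumption made for this
# sketch, not the entanglement measure used in the paper.
from math import fsum


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [fsum(coords) / n for coords in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def entanglement_proxy(features, labels):
    """Mean within-class spread divided by mean between-centroid separation.

    Larger values mean the class clouds overlap more in this representation.
    """
    by_class = {}
    for vec, lab in zip(features, labels):
        by_class.setdefault(lab, []).append(vec)
    if len(by_class) < 2:
        raise ValueError("need at least two classes")
    centroids = {lab: centroid(pts) for lab, pts in by_class.items()}
    # Average squared distance of every point to its own class centroid.
    within = fsum(
        sq_dist(vec, centroids[lab]) for vec, lab in zip(features, labels)
    ) / len(features)
    # Average squared distance over all unordered pairs of class centroids.
    labs = list(centroids)
    pairs = [(a, b) for i, a in enumerate(labs) for b in labs[i + 1:]]
    between = fsum(
        sq_dist(centroids[a], centroids[b]) for a, b in pairs
    ) / len(pairs)
    if between == 0:
        return float("inf")  # coincident centroids: maximally entangled
    return within / between
```

Evaluating such a proxy on the activations of a given layer at successive training epochs would give a curve whose shape can be compared with the non-monotonic trend described above: a fall during the fast segregation phase, followed by a rise during the slower rearrangement.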
Related papers
- Neural Collapse Terminus: A Unified Solution for Class Incremental
Learning and Its Variants [166.916517335816]
In this paper, we offer a unified solution to the misalignment dilemma in the three tasks.
We propose neural collapse terminus that is a fixed structure with the maximal equiangular inter-class separation for the whole label space.
Our method holds the neural collapse optimality in an incremental fashion regardless of data imbalance or data scarcity.
arXiv Detail & Related papers (2023-08-03T13:09:59Z)
- Heterogeneous Target Speech Separation [52.05046029743995]
We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts.
Our proposed heterogeneous separation framework can seamlessly leverage datasets with large distribution shifts.
arXiv Detail & Related papers (2022-04-07T17:14:20Z)
- Soft-margin classification of object manifolds [0.0]
A neural population responding to multiple appearances of a single object defines a manifold in the neural response space.
The ability to classify such manifold is of interest, as object recognition and other computational tasks require a response that is insensitive to variability within a manifold.
Soft-margin classifiers are a larger class of algorithms and provide an additional regularization parameter used in applications to optimize performance outside the training set.
arXiv Detail & Related papers (2022-03-14T12:23:36Z)
- Learning Debiased and Disentangled Representations for Semantic
Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Analyzing Overfitting under Class Imbalance in Neural Networks for Image
Segmentation [19.259574003403998]
In image segmentation, neural networks may overfit to the foreground samples from small structures.
In this study, we provide new insights on the problem of overfitting under class imbalance by inspecting the network behavior.
arXiv Detail & Related papers (2021-02-20T14:57:58Z)
- More Is More -- Narrowing the Generalization Gap by Adding
Classification Heads [8.883733362171032]
We introduce an architecture enhancement for existing neural network models based on input transformations, termed 'TransNet'.
Our model can be employed during training time only and then pruned for prediction, resulting in an equivalent architecture to the base model.
arXiv Detail & Related papers (2021-02-09T16:30:33Z)
- Supercharging Imbalanced Data Learning With Energy-based Contrastive
Representation Transfer [72.5190560787569]
In computer vision, learning from long-tailed datasets is a recurring theme, especially for natural image datasets.
Our proposal posits a meta-distributional scenario, where the data generating mechanism is invariant across the label-conditional feature distributions.
This allows us to leverage a causal data inflation procedure to enlarge the representation of minority classes.
arXiv Detail & Related papers (2020-11-25T00:13:11Z)
- Bi-tuning of Pre-trained Representations [79.58542780707441]
Bi-tuning is a general learning framework to fine-tune both supervised and unsupervised pre-trained representations to downstream tasks.
Bi-tuning generalizes the vanilla fine-tuning by integrating two heads upon the backbone of pre-trained representations.
Bi-tuning achieves state-of-the-art results for fine-tuning tasks of both supervised and unsupervised pre-trained models by large margins.
arXiv Detail & Related papers (2020-11-12T03:32:25Z)
- Learning perturbation sets for robust machine learning [97.6757418136662]
We use a conditional generator that defines the perturbation set over a constrained region of the latent space.
We measure the quality of our learned perturbation sets both quantitatively and qualitatively.
We leverage our learned perturbation sets to train models which are empirically and certifiably robust to adversarial image corruptions and adversarial lighting variations.
arXiv Detail & Related papers (2020-07-16T16:39:54Z)
- Learning Diverse and Discriminative Representations via the Principle of
Maximal Coding Rate Reduction [32.21975128854042]
We propose the principle of Maximal Coding Rate Reduction ($\text{MCR}^2$), an information-theoretic measure that maximizes the coding rate difference between the whole dataset and the sum of each individual class.
We clarify its relationships with most existing frameworks such as cross-entropy, information bottleneck, information gain, contractive and contrastive learning, and provide theoretical guarantees for learning diverse and discriminative features.
arXiv Detail & Related papers (2020-06-15T17:23:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.