Discriminability-enforcing loss to improve representation learning
- URL: http://arxiv.org/abs/2202.07073v1
- Date: Mon, 14 Feb 2022 22:31:37 GMT
- Title: Discriminability-enforcing loss to improve representation learning
- Authors: Florinel-Alin Croitoru, Diana-Nicoleta Grigore, Radu Tudor Ionescu
- Abstract summary: We introduce a new loss term inspired by the Gini impurity to minimize the entropy of individual high-level features.
Although our Gini loss induces highly-discriminative features, it does not ensure that the distribution of the high-level features matches the distribution of the classes.
Our empirical results show that integrating our novel loss terms into the training objective consistently outperforms the models trained with cross-entropy alone.
- Score: 20.4701676109641
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: During the training process, deep neural networks implicitly learn to
represent the input data samples through a hierarchy of features, where the
size of the hierarchy is determined by the number of layers. In this paper, we
focus on enforcing the discriminative power of the high-level representations,
which are typically learned by the deeper layers (closer to the output). To this
end, we introduce a new loss term inspired by the Gini impurity, which is aimed
at minimizing the entropy (increasing the discriminative power) of individual
high-level features with respect to the class labels. Although our Gini loss
induces highly-discriminative features, it does not ensure that the
distribution of the high-level features matches the distribution of the
classes. As such, we introduce another loss term to minimize the
Kullback-Leibler divergence between the two distributions. We conduct
experiments on two image classification data sets (CIFAR-100 and Caltech 101),
considering multiple neural architectures ranging from convolutional networks
(ResNet-17, ResNet-18, ResNet-50) to transformers (CvT). Our empirical results
show that integrating our novel loss terms into the training objective
consistently outperforms the models trained with cross-entropy alone.
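The abstract describes the two auxiliary terms only at a high level, so the following PyTorch sketch is one hedged way to instantiate them rather than the authors' implementation: each feature's batch-level activation mass is soft-assigned to the classes, a Gini-impurity penalty pushes that per-feature distribution toward a single class, and a KL term aligns the marginal feature-mass distribution with the empirical class distribution. The function name, the mass-based estimate, and the weighting hyperparameters are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def gini_and_kl_losses(features, labels, num_classes, eps=1e-8):
    """Sketch of the two auxiliary terms (assumed formulation, not the paper's code).

    features: (batch, num_features) high-level activations, e.g. the post-ReLU
              penultimate layer; labels: (batch,) integer class indices.
    """
    features = features.clamp(min=0)                      # treat activations as non-negative mass
    one_hot = F.one_hot(labels, num_classes).float()      # (B, C)

    # Per-feature distribution over classes, estimated from activation mass in the batch.
    mass_per_class = features.t() @ one_hot               # (F, C)
    p = mass_per_class / (mass_per_class.sum(dim=1, keepdim=True) + eps)

    # Gini-impurity term: 1 - sum_c p_c^2, averaged over features.
    # It is minimal when each feature fires for a single class, i.e. is discriminative.
    gini_loss = (1.0 - (p ** 2).sum(dim=1)).mean()

    # KL term: match the marginal distribution of feature mass over classes
    # to the empirical class distribution of the batch.
    feat_class_dist = mass_per_class.sum(dim=0)
    feat_class_dist = feat_class_dist / (feat_class_dist.sum() + eps)
    class_dist = one_hot.mean(dim=0)
    kl_loss = F.kl_div((feat_class_dist + eps).log(), class_dist, reduction="sum")

    return gini_loss, kl_loss


# Usage sketch: lambda_gini and lambda_kl are hypothetical weights added to the
# usual cross-entropy objective.
# total_loss = ce_loss + lambda_gini * gini_loss + lambda_kl * kl_loss
```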
Related papers
- Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks [13.983863226803336]
We argue that "Feature Averaging" is one of the principal factors contributing to non-robustness of deep neural networks.
We provide a detailed theoretical analysis of the training dynamics of gradient descent in a two-layer ReLU network for a binary classification task.
We prove that, with the provision of more granular supervised information, a two-layer multi-class neural network is capable of learning individual features.
arXiv Detail & Related papers (2024-10-14T09:28:32Z)
- Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective [53.999128831324576]
Graph neural networks (GNNs) have pioneered advancements in graph representation learning.
This study investigates the role of graph convolution within the context of feature learning theory.
arXiv Detail & Related papers (2023-06-24T10:21:11Z)
- Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with a quadratic loss function, a fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels.
These results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z)
- Encoding Hierarchical Information in Neural Networks helps in Subpopulation Shift [8.01009207457926]
Deep neural networks have proven to be adept in image classification tasks, often surpassing humans in terms of accuracy.
In this work, we study the aforementioned problems through the lens of a novel conditional supervised training framework.
We show that learning in this structured hierarchical manner results in networks that are more robust against subpopulation shifts.
arXiv Detail & Related papers (2021-12-20T20:26:26Z)
- Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
- Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z)
- ReMarNet: Conjoint Relation and Margin Learning for Small-Sample Image Classification [49.87503122462432]
We introduce a novel neural network termed Relation-and-Margin learning Network (ReMarNet).
Our method assembles two networks of different backbones so as to learn the features that can perform excellently in both of the aforementioned two classification mechanisms.
Experiments on four image datasets demonstrate that our approach is effective in learning discriminative features from a small set of labeled samples.
arXiv Detail & Related papers (2020-06-27T13:50:20Z)
- When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern of generative adversarial networks (GANs).
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss which achieves better generalization and stability.
Experiments on benchmark datasets show that the proposed relation discriminator and new loss can provide significant improvements on various vision tasks.
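The relation-based formulation used in that paper is not reproduced in this summary; for orientation only, a generic margin-based triplet loss (an illustrative assumption, not the paper's specific variant) can be sketched as follows.

```python
import torch.nn.functional as F


def triplet_loss(anchor, positive, negative, margin=1.0):
    # Generic margin-based triplet loss over batches of embeddings:
    # pull the anchor toward the positive, push it away from the negative.
    d_pos = F.pairwise_distance(anchor, positive)  # (B,)
    d_neg = F.pairwise_distance(anchor, negative)  # (B,)
    return F.relu(d_pos - d_neg + margin).mean()
```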
arXiv Detail & Related papers (2020-02-24T11:35:28Z)
- Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss [0.0]
Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks.
We analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations.
arXiv Detail & Related papers (2020-02-11T15:42:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.