More Is More -- Narrowing the Generalization Gap by Adding
Classification Heads
- URL: http://arxiv.org/abs/2102.04924v2
- Date: Thu, 11 Feb 2021 12:16:26 GMT
- Title: More Is More -- Narrowing the Generalization Gap by Adding
Classification Heads
- Authors: Roee Cates, Daphna Weinshall
- Abstract summary: We introduce an architecture enhancement for existing neural network models based on input transformations, termed 'TransNet'.
Our model can be employed during training time only and then pruned for prediction, resulting in an equivalent architecture to the base model.
- Score: 8.883733362171032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overfit is a fundamental problem in machine learning in general, and in deep
learning in particular. In order to reduce overfit and improve generalization
in the classification of images, some employ invariance to a group of
transformations, such as rotations and reflections. However, since not all
objects exhibit necessarily the same invariance, it seems desirable to allow
the network to learn the useful level of invariance from the data. To this end,
motivated by self-supervision, we introduce an architecture enhancement for
existing neural network models based on input transformations, termed
'TransNet', together with a training algorithm suitable for it. Our model can
be employed during training time only and then pruned for prediction, resulting
in an equivalent architecture to the base model. Thus pruned, we show that our
model improves performance on various data-sets while exhibiting improved
generalization, which is achieved in turn by enforcing soft invariance on the
convolutional kernels of the last layer in the base model. Theoretical analysis
is provided to support the proposed method.
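The training-time-only design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a shared backbone with one classification head per input transformation, a transformation set of four 90-degree rotations, a summed cross-entropy loss over heads, and pruning that keeps only the head for the untransformed input. The backbone here is a single linear layer with ReLU rather than a convolutional network, and all names (`TRANSFORMS`, `multi_head_loss`, `prune`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed transformation set: rotations by 0, 90, 180 and 270 degrees.
# Index 0 is the identity, which is the head kept after pruning.
TRANSFORMS = [lambda x, k=k: np.rot90(x, k, axes=(1, 2)) for k in range(4)]


def init_params(side, hidden, n_classes, n_heads):
    """Create a shared backbone and one linear head per transformation."""
    d = side * side
    return {
        "backbone": rng.normal(0.0, 0.1, (d, hidden)),
        "heads": [rng.normal(0.0, 0.1, (hidden, n_classes)) for _ in range(n_heads)],
    }


def features(params, images):
    """Shared backbone: flatten, linear map, ReLU (a stand-in for a CNN)."""
    flat = images.reshape(len(images), -1)
    return np.maximum(flat @ params["backbone"], 0.0)


def head_logits(params, images, head):
    """Class scores from one head on top of the shared features."""
    return features(params, images) @ params["heads"][head]


def cross_entropy(logits, labels):
    """Mean cross-entropy with a numerically stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def multi_head_loss(params, images, labels):
    """Sum of per-head losses; head i sees the batch under transformation i."""
    return sum(
        cross_entropy(head_logits(params, t(images), i), labels)
        for i, t in enumerate(TRANSFORMS)
    )


def prune(params):
    """Keep the backbone and the identity head only, as in the base model."""
    return {"backbone": params["backbone"], "heads": params["heads"][:1]}


images = rng.normal(size=(8, 6, 6))      # toy batch of square "images"
labels = rng.integers(0, 3, size=8)      # toy labels for 3 classes
params = init_params(side=6, hidden=16, n_classes=3, n_heads=len(TRANSFORMS))
loss = multi_head_loss(params, images, labels)
pruned = prune(params)
```

After pruning, the model has the same shape as a single-head base model, so prediction costs no more than it would without the extra heads. The extra heads influence the result only through the shared backbone weights they helped train.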
Related papers
- Leveraging Angular Information Between Feature and Classifier for
Long-tailed Learning: A Prediction Reformulation Approach [90.77858044524544]
We reformulate the recognition probabilities through included angles without re-balancing the classifier weights.
Inspired by the performance improvement of the predictive form reformulation, we explore the different properties of this angular prediction.
Our method is able to obtain the best performance among peer methods without pretraining on CIFAR10/100-LT and ImageNet-LT.
arXiv Detail & Related papers (2022-12-03T07:52:48Z)
- Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefined heuristics.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
arXiv Detail & Related papers (2022-11-11T21:58:15Z)
- Towards Disentangling Information Paths with Coded ResNeXt [11.884259630414515]
We take a novel approach to enhance the transparency of the function of the whole network.
We propose a neural network architecture for classification, in which the information that is relevant to each class flows through specific paths.
arXiv Detail & Related papers (2022-02-10T21:45:49Z)
- Deep invariant networks with differentiable augmentation layers [87.22033101185201]
Methods for learning data augmentation policies require held-out data and are based on bilevel optimization problems.
We show that our approach is easier and faster to train than modern automatic data augmentation techniques.
arXiv Detail & Related papers (2022-02-04T14:12:31Z)
- Understanding the Generalization of Adam in Learning Neural Networks
with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We show that if the training objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, will converge to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z)
- Training or Architecture? How to Incorporate Invariance in Neural
Networks [14.162739081163444]
We propose a method for provably invariant network architectures with respect to group actions.
In a nutshell, we intend to 'undo' any possible transformation before feeding the data into the actual network.
We analyze properties of such approaches, extend them to equivariant networks, and demonstrate their advantages in terms of robustness as well as computational efficiency in several numerical examples.
arXiv Detail & Related papers (2021-06-18T10:31:00Z)
- Learning Invariances in Neural Networks [51.20867785006147]
We show how to parameterize a distribution over augmentations and optimize the training loss simultaneously with respect to the network parameters and augmentation parameters.
We can recover the correct set and extent of invariances on image classification, regression, segmentation, and molecular property prediction from a large space of augmentations.
arXiv Detail & Related papers (2020-10-22T17:18:48Z)
- Generalizing Convolutional Neural Networks for Equivariance to Lie
Groups on Arbitrary Continuous Data [52.78581260260455]
We propose a general method to construct a convolutional layer that is equivariant to transformations from any specified Lie group.
We apply the same model architecture to images, ball-and-stick molecular data, and Hamiltonian dynamical systems.
arXiv Detail & Related papers (2020-02-25T17:40:38Z)