Large Neural Networks Learning from Scratch with Very Few Data and
without Regularization
- URL: http://arxiv.org/abs/2205.08836v1
- Date: Wed, 18 May 2022 10:08:28 GMT
- Title: Large Neural Networks Learning from Scratch with Very Few Data and
without Regularization
- Authors: Christoph Linse, Thomas Martinetz
- Abstract summary: We show that very large Convolutional Neural Networks with millions of weights do learn with only a handful of training samples.
VGG19 with 140 million weights learns to distinguish airplanes and motorbikes up to 95% accuracy with only 20 samples per class.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent findings have shown that Neural Networks generalize also in
over-parametrized regimes with zero training error. This is surprising, since
it is completely against traditional machine learning wisdom. In our empirical
study we fortify these findings in the domain of fine-grained image
classification. We show that very large Convolutional Neural Networks with
millions of weights do learn with only a handful of training samples and
without image augmentation, explicit regularization or pretraining. We train
the architectures ResNet18, ResNet101 and VGG19 on subsets of the difficult
benchmark datasets Caltech101, CUB_200_2011, FGVCAircraft, Flowers102 and
StanfordCars with 100 classes and more, perform a comprehensive comparative
study and draw implications for the practical application of CNNs. Finally, we
show that VGG19 with 140 million weights learns to distinguish airplanes and
motorbikes up to 95% accuracy with only 20 samples per class.
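As a rough illustration of the setup described in the abstract, the following PyTorch sketch trains VGG19 from random initialization on a two-class subset with 20 samples per class, with no pretraining, no image augmentation and no explicit regularization. The dataset path, optimizer settings and epoch count are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (not the authors' exact setup): train VGG19 from scratch on a
# tiny two-class subset, with no pretraining, no augmentation and no explicit
# regularization added (no weight decay beyond the stock architecture).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, models, transforms

# Plain resizing and tensor conversion only -- deliberately no augmentation.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Hypothetical ImageFolder layout with two classes (e.g. airplanes vs. motorbikes).
full_set = datasets.ImageFolder("data/two_class_subset", transform=preprocess)

# Keep only 20 samples per class (indices chosen naively for illustration).
per_class, keep, counts = 20, [], {}
for idx, (_, label) in enumerate(full_set.samples):
    if counts.get(label, 0) < per_class:
        keep.append(idx)
        counts[label] = counts.get(label, 0) + 1
train_loader = DataLoader(Subset(full_set, keep), batch_size=10, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.vgg19(weights=None, num_classes=2).to(device)  # random init, no pretraining

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0)

model.train()
for epoch in range(200):  # train until (near) zero training error
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```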
Related papers
- CUDA: Convolution-based Unlearnable Datasets [77.70422525613084]
Large-scale training of modern deep learning models heavily relies on publicly available data on the web.
Recent works aim to make data unlearnable for deep learning models by adding small, specially designed noise.
These methods are vulnerable to adversarial training (AT) and/or are computationally heavy.
arXiv Detail & Related papers (2023-03-07T22:57:23Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
Only later during training do they exploit higher-order statistics.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
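To make the notion of "lower-order input statistics" concrete, the sketch below builds a classifier that uses only per-class means and covariances, the kind of decision rule the paper argues networks resemble early in training. The synthetic data and the Gaussian decision rule are illustrative assumptions, not the paper's experimental protocol.

```python
# Sketch of a classifier built only from lower-order input statistics
# (per-class means and covariances). Synthetic data; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 500
X0 = rng.normal(loc=0.0, scale=1.0, size=(n, d))  # class 0
X1 = rng.normal(loc=0.5, scale=1.5, size=(n, d))  # class 1
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n), np.ones(n)])

def gaussian_log_density(X, mean, cov):
    """Log density of a Gaussian fitted from first and second moments only."""
    diff = X - mean
    cov_inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum("ni,ij,nj->n", diff, cov_inv, diff)
    return -0.5 * (quad + logdet + X.shape[1] * np.log(2 * np.pi))

# Fit only means and covariances per class (nothing higher-order).
stats = [(Xc.mean(axis=0), np.cov(Xc, rowvar=False)) for Xc in (X0, X1)]
scores = np.stack([gaussian_log_density(X, m, c) for m, c in stats], axis=1)
pred = scores.argmax(axis=1)
print("accuracy of the lower-order (mean/covariance) classifier:", (pred == y).mean())
```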
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes [3.7189423451031356]
We propose a framework to improve generalization from small amounts of data.
We augment modern CNNs with fully-connected layers and show the massive impact this architectural change has in low-data regimes.
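A minimal sketch of the architectural change described here, assuming a ResNet-18 backbone and arbitrary hidden widths: the stock single-linear classifier is replaced by a stack of fully-connected layers.

```python
# Sketch: replace the single linear classifier of a modern CNN with a small
# stack of fully-connected layers. Backbone and layer widths are assumptions.
import torch.nn as nn
from torchvision import models

def cnn_with_fc_head(num_classes: int, hidden: int = 1024) -> nn.Module:
    backbone = models.resnet18(weights=None)  # train from scratch
    in_features = backbone.fc.in_features     # 512 for ResNet-18
    backbone.fc = nn.Sequential(              # fully-connected head
        nn.Linear(in_features, hidden),
        nn.ReLU(inplace=True),
        nn.Linear(hidden, hidden),
        nn.ReLU(inplace=True),
        nn.Linear(hidden, num_classes),
    )
    return backbone

model = cnn_with_fc_head(num_classes=102)  # e.g. a Flowers102-sized label space
```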
arXiv Detail & Related papers (2022-10-11T17:55:10Z)
- Capsule Network based Contrastive Learning of Unsupervised Visual Representations [13.592112044121683]
The Contrastive Capsule (CoCa) model is a Siamese-style Capsule Network that combines a contrastive loss with our novel architecture, training and testing algorithm.
We evaluate the model on unsupervised image classification on the CIFAR-10 dataset and achieve a top-1 test accuracy of 70.50% and a top-5 test accuracy of 98.10%.
Due to our efficient architecture, our model has 31 times fewer parameters and 71 times fewer FLOPs than the current SOTA in both supervised and unsupervised learning.
arXiv Detail & Related papers (2022-09-22T19:05:27Z)
- Learning Rate Curriculum [75.98230528486401]
We propose a novel curriculum learning approach termed Learning Rate Curriculum (LeRaC).
LeRaC uses a different learning rate for each layer of a neural network to create a data-agnostic curriculum during the initial training epochs.
We compare our approach with Curriculum by Smoothing (CBS), a state-of-the-art data-agnostic curriculum learning approach.
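A minimal sketch of a layer-wise learning-rate curriculum in this spirit, using PyTorch parameter groups: earlier layers start with larger rates than deeper ones, and all rates move toward a common target during the first epochs. The per-layer starting rates and the warm-up schedule are guesses for illustration, not the exact LeRaC formulation.

```python
# Sketch of a data-agnostic, layer-wise learning-rate curriculum. The concrete
# numbers and the geometric ramp are assumptions, not the paper's schedule.
import torch
from torchvision import models

model = models.resnet18(weights=None)
layers = [m for m in model.children() if list(m.parameters())]  # layers with weights
target_lr, warmup_epochs = 0.01, 5

# One parameter group per layer, each starting at a progressively smaller lr.
param_groups = [
    {"params": layer.parameters(), "lr": target_lr * (0.1 ** depth)}
    for depth, layer in enumerate(layers)
]
optimizer = torch.optim.SGD(param_groups, momentum=0.9)

def step_curriculum(epoch: int) -> None:
    """Move every layer's lr toward the shared target during the warm-up epochs."""
    t = min(epoch / warmup_epochs, 1.0)
    for depth, group in enumerate(optimizer.param_groups):
        start = target_lr * (0.1 ** depth)
        group["lr"] = start * (target_lr / start) ** t  # geometric interpolation

for epoch in range(10):
    step_curriculum(epoch)
    # ... usual training loop over the data goes here ...
```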
arXiv Detail & Related papers (2022-05-18T18:57:36Z)
- Training Vision Transformers with Only 2040 Images [35.86457465241119]
Vision Transformers (ViTs) are emerging as an alternative to convolutional neural networks (CNNs) for visual recognition.
We give theoretical analyses showing that our method is superior to other methods in that it can capture both feature alignment and instance similarities.
We achieve state-of-the-art results when training from scratch on 7 small datasets under various ViT backbones.
arXiv Detail & Related papers (2022-01-26T03:22:08Z)
- On the Effectiveness of Neural Ensembles for Image Classification with Small Datasets [2.3478438171452014]
We focus on image classification problems with a few labeled examples per class and improve data efficiency by using an ensemble of relatively small networks.
We show that ensembling relatively shallow networks is a simple yet effective technique that is generally better than current state-of-the-art approaches for learning from small datasets.
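A minimal sketch of such an ensemble, assuming ResNet-18 members and averaged class probabilities; the backbone choice and ensemble size are assumptions.

```python
# Sketch: average the softmax outputs of several independently initialized,
# relatively small networks. Backbone and ensemble size are assumptions.
import torch
import torch.nn.functional as F
from torchvision import models

def make_member(num_classes: int) -> torch.nn.Module:
    return models.resnet18(weights=None, num_classes=num_classes)

class Ensemble(torch.nn.Module):
    def __init__(self, num_members: int, num_classes: int):
        super().__init__()
        self.members = torch.nn.ModuleList(
            make_member(num_classes) for _ in range(num_members)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Average class probabilities across members.
        probs = [F.softmax(m(x), dim=1) for m in self.members]
        return torch.stack(probs, dim=0).mean(dim=0)

ensemble = Ensemble(num_members=5, num_classes=100)
# Each member would normally be trained separately on the same small dataset,
# and the averaged probabilities used for prediction.
```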
arXiv Detail & Related papers (2021-11-29T12:34:49Z)
- Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study [100.27567794045045]
Training deep graph neural networks (GNNs) is notoriously hard.
We present the first fair and reproducible benchmark dedicated to assessing the "tricks" of training deep GNNs.
arXiv Detail & Related papers (2021-08-24T05:00:37Z)
- Reasoning-Modulated Representations [85.08205744191078]
We study a common setting where the task is not purely opaque, i.e., some information about the underlying system that generates the data is available.
Our approach paves the way for a new class of data-efficient representation learning.
arXiv Detail & Related papers (2021-07-19T13:57:13Z)
- Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly [114.81028176850404]
Training generative adversarial networks (GANs) with limited data generally results in deteriorated performance and collapsed models.
We decompose the data-hungry GAN training into two sequential sub-problems.
Such a coordinated framework enables us to focus on lower-complexity and more data-efficient sub-problems.
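A caricature of the two-step decomposition, assuming simple one-shot magnitude pruning and a stand-in MLP generator rather than the paper's actual GAN architecture and ticket-finding procedure.

```python
# Sketch of the two-step idea: (1) derive a sparse "ticket" mask by magnitude
# pruning, (2) keep the mask fixed so subsequent training only updates the
# surviving weights. The tiny MLP generator and the 80% ratio are assumptions.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

generator = nn.Sequential(          # stand-in generator, not a real GAN
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),
)

# Step 1: global magnitude pruning at 80% sparsity across all linear layers.
linears = [(m, "weight") for m in generator if isinstance(m, nn.Linear)]
prune.global_unstructured(linears, pruning_method=prune.L1Unstructured, amount=0.8)

# Step 2: the masks stay fixed; pruned entries receive zero gradient, so the
# subsequent (adversarial) training effectively operates on the sparse subnetwork.
optimizer = torch.optim.Adam(generator.parameters(), lr=2e-4)
z = torch.randn(16, 64)
fake = generator(z)                 # forward pass uses the masked weights
```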
arXiv Detail & Related papers (2021-02-28T05:20:29Z)
- Testing for Normality with Neural Networks [0.0]
We construct a feedforward neural network that can successfully detect normal distributions by inspecting small samples from them.
The network's accuracy was higher than 96% on a set of larger samples with 250-1000 elements.
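A minimal sketch of the idea, assuming sorted, standardized samples as input features and a small MLP; the featurization, architecture and training distributions are assumptions, not the paper's design.

```python
# Sketch: a feedforward binary classifier over small samples. Each fixed-size
# sample is standardized and sorted before being fed to the network; labels are
# 1 for Gaussian and 0 for a non-Gaussian alternative. All choices here are
# illustrative assumptions.
import torch
import torch.nn as nn

n = 100  # sample size seen by the network

def featurize(sample: torch.Tensor) -> torch.Tensor:
    s = (sample - sample.mean()) / (sample.std() + 1e-8)
    return torch.sort(s).values

net = nn.Sequential(nn.Linear(n, 128), nn.ReLU(),
                    nn.Linear(128, 64), nn.ReLU(),
                    nn.Linear(64, 1))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    # Half the batch from a normal distribution, half from a skewed alternative.
    normal = torch.randn(32, n)
    non_normal = torch.randn(32, n).exp()  # log-normal as one alternative
    batch = torch.stack([featurize(x) for x in torch.cat([normal, non_normal])])
    labels = torch.cat([torch.ones(32, 1), torch.zeros(32, 1)])
    optimizer.zero_grad()
    loss = loss_fn(net(batch), labels)
    loss.backward()
    optimizer.step()
```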
arXiv Detail & Related papers (2020-09-29T07:35:40Z)