Knowledge Evolution in Neural Networks
- URL: http://arxiv.org/abs/2103.05152v1
- Date: Tue, 9 Mar 2021 00:25:34 GMT
- Title: Knowledge Evolution in Neural Networks
- Authors: Ahmed Taha, Abhinav Shrivastava, Larry Davis
- Abstract summary: We propose an evolution-inspired training approach to boost performance on relatively small datasets.
We iteratively evolve the knowledge inside the fit-hypothesis by perturbing the reset-hypothesis for multiple generations.
This approach not only boosts performance, but also learns a slim network with a smaller inference cost.
- Score: 39.52537143009937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning relies on the availability of a large corpus of data (labeled
or unlabeled). Thus, one challenging unsettled question is: how to train a deep
network on a relatively small dataset? To tackle this question, we propose an
evolution-inspired training approach to boost performance on relatively small
datasets. The knowledge evolution (KE) approach splits a deep network into two
hypotheses: the fit-hypothesis and the reset-hypothesis. We iteratively evolve
the knowledge inside the fit-hypothesis by perturbing the reset-hypothesis for
multiple generations. This approach not only boosts performance, but also
learns a slim network with a smaller inference cost. KE integrates seamlessly
with both vanilla and residual convolutional networks. KE reduces both
overfitting and the burden for data collection.
We evaluate KE on various network architectures and loss functions. We
evaluate KE using relatively small datasets (e.g., CUB-200) and randomly
initialized deep networks. KE achieves an absolute 21% improvement margin on a
state-of-the-art baseline. This performance improvement is accompanied by a
relative 73% reduction in inference cost. KE achieves state-of-the-art results
on classification and metric learning benchmarks. Code available at
http://bit.ly/3uLgwYb
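The abstract describes the training loop only at a high level. The snippet below is a minimal PyTorch-style sketch of one way such a fit-/reset-hypothesis split could be implemented: a fixed random binary mask designates the fit-hypothesis, and after every generation the remaining (reset-hypothesis) weights are re-initialized before the network is retrained. The split ratio, mask granularity, and helper names are illustrative assumptions, not the authors' implementation; the actual code is in the repository linked above.

```python
# Minimal illustrative sketch (not the authors' code) of the knowledge-evolution
# loop: a fixed binary mask splits the weights into a fit-hypothesis and a
# reset-hypothesis; after each generation the reset-hypothesis is re-initialized.
import copy
import torch
import torch.nn as nn

def make_split_masks(model: nn.Module, fit_ratio: float = 0.3, seed: int = 0):
    """Randomly mark a fixed fraction of every weight tensor as the fit-hypothesis."""
    g = torch.Generator().manual_seed(seed)
    return {name: torch.rand(p.shape, generator=g) < fit_ratio
            for name, p in model.named_parameters()}

def reinit_reset_hypothesis(model: nn.Module, masks: dict) -> None:
    """Keep fit-hypothesis weights; re-initialize (perturb) the reset-hypothesis."""
    fresh = copy.deepcopy(model)
    for m in fresh.modules():                      # draw fresh random weights
        if hasattr(m, "reset_parameters"):
            m.reset_parameters()
    with torch.no_grad():
        for (name, p), (_, p_fresh) in zip(model.named_parameters(),
                                           fresh.named_parameters()):
            p.copy_(torch.where(masks[name].to(p.device), p, p_fresh))

def knowledge_evolution(model: nn.Module, train_one_generation, generations: int = 3):
    """Alternate ordinary training with re-initialization of the reset-hypothesis."""
    masks = make_split_masks(model)
    for g in range(generations):
        train_one_generation(model)                # any standard training loop
        if g < generations - 1:
            reinit_reset_hypothesis(model, masks)
    return model
```

The slim network mentioned in the abstract corresponds to the fit-hypothesis alone, which is what yields the reduced inference cost; in the paper the split is structured so that the fit-hypothesis can be extracted as a standalone subnetwork.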
Related papers
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, the solutions we find are limited to those reachable by the training procedure, including the optimizer and regularizers, which constrains this flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z) - DRGCN: Dynamic Evolving Initial Residual for Deep Graph Convolutional Networks [19.483662490506646]
We propose a novel model called Dynamic evolving initial Residual Graph Convolutional Network (DRGCN).
Our experimental results show that our model effectively relieves the problem of over-smoothing in deep GCNs.
Our model reaches new SOTA results on the large-scale ogbn-arxiv dataset of the Open Graph Benchmark (OGB).
arXiv Detail & Related papers (2023-02-10T06:57:12Z) - Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with a quadratic loss function, a fully connected feedforward architecture, ReLU activations, Gaussian data instances, and adversarial labels.
These results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime.
arXiv Detail & Related papers (2022-12-05T14:47:52Z) - Transfer Learning via Test-Time Neural Networks Aggregation [11.42582922543676]
It has been demonstrated that deep neural networks outperform traditional machine learning methods.
Deep networks, however, lack generalisability; that is, they do not perform as well on a new (test) set drawn from a different distribution.
arXiv Detail & Related papers (2022-06-27T15:46:05Z) - a novel attention-based network for fast salient object detection [14.246237737452105]
In current salient object detection networks, the most popular approach is the U-shape structure.
We propose a new deep convolutional network architecture with three contributions.
Results demonstrate that the proposed method can compress the model to roughly one third of its original size with almost no loss in accuracy.
arXiv Detail & Related papers (2021-12-20T12:30:20Z) - An Experimental Study of the Impact of Pre-training on the Pruning of a Convolutional Neural Network [0.0]
In recent years, deep neural networks have achieved wide success in various application domains.
Deep neural networks usually involve a large number of parameters, which correspond to the weights of the network.
Pruning methods attempt to reduce the size of this parameter set by identifying and removing irrelevant weights.
arXiv Detail & Related papers (2021-12-15T16:02:15Z) - Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z) - Learning across label confidence distributions using Filtered Transfer Learning [0.44040106718326594]
We propose a transfer learning approach to improve predictive power in noisy data systems with large, variable-confidence datasets.
We propose a deep neural network method called Filtered Transfer Learning (FTL) that defines multiple tiers of data confidence as separate tasks.
We demonstrate that using FTL to learn stepwise, across the label confidence distribution, results in higher performance compared to deep neural network models trained on a single confidence range.
arXiv Detail & Related papers (2020-06-03T21:00:11Z) - OSLNet: Deep Small-Sample Classification with an Orthogonal Softmax Layer [77.90012156266324]
This paper aims to find a subspace of neural networks that can facilitate a large decision margin.
We propose the Orthogonal Softmax Layer (OSL), which keeps the weight vectors in the classification layer orthogonal during both the training and test processes (a minimal illustrative sketch appears after this list).
Experimental results demonstrate that the proposed OSL outperforms the methods used for comparison on four small-sample benchmark datasets.
arXiv Detail & Related papers (2020-04-20T02:41:01Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
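For the OSLNet entry above, the following is a minimal sketch of one way to keep classification weight vectors orthogonal by construction: the class weight vectors are fixed to non-trainable orthogonal directions and only the preceding feature extractor is trained. This is an illustrative reading of the abstract, not the OSLNet authors' implementation; the class name, the frozen buffer, and the backbone are assumptions.

```python
# Illustrative sketch (assumption, not the OSLNet authors' code): a softmax
# classification layer whose class weight vectors are fixed to an orthogonal
# basis, so they remain orthogonal throughout training and testing.
import torch
import torch.nn as nn

class OrthogonalClassifier(nn.Module):
    def __init__(self, feature_dim: int, num_classes: int):
        super().__init__()
        assert feature_dim >= num_classes, "need feature_dim >= num_classes for orthogonal rows"
        w = torch.empty(num_classes, feature_dim)
        nn.init.orthogonal_(w)                 # rows are mutually orthogonal
        self.register_buffer("weight", w)      # buffer => not updated by the optimizer

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Logits are projections of the features onto fixed orthogonal class directions.
        return features @ self.weight.t()

# Example: put the fixed-orthogonal classifier on top of any trainable backbone.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 256), nn.ReLU())
head = OrthogonalClassifier(feature_dim=256, num_classes=10)
logits = head(backbone(torch.randn(4, 3, 32, 32)))   # shape: (4, 10)
```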