Targeted Gradient Descent: A Novel Method for Convolutional Neural
Networks Fine-tuning and Online-learning
- URL: http://arxiv.org/abs/2109.14729v1
- Date: Wed, 29 Sep 2021 21:22:09 GMT
- Title: Targeted Gradient Descent: A Novel Method for Convolutional Neural
Networks Fine-tuning and Online-learning
- Authors: Junyu Chen, Evren Asma, and Chung Chan
- Abstract summary: A convolutional neural network (ConvNet) is usually trained and then tested using images drawn from the same distribution.
Generalizing a ConvNet to various tasks often requires a complete training dataset consisting of images drawn from all of those tasks.
We present Targeted Gradient Descent (TGD), a novel fine-tuning method that can extend a pre-trained network to a new task without revisiting data from the previous task.
- Score: 9.011106198253053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A convolutional neural network (ConvNet) is usually trained and
then tested using images drawn from the same distribution. Generalizing a
ConvNet to various tasks often requires a complete training dataset consisting
of images drawn from all of those tasks. In most scenarios, it is nearly
impossible to collect every representative dataset in advance; new data may
only become available after the ConvNet is deployed in clinical practice. A
ConvNet, however, may generate artifacts on out-of-distribution testing
samples. In this study, we present Targeted Gradient Descent (TGD), a novel
fine-tuning method that extends a pre-trained network to a new task without
revisiting data from the previous task while preserving the knowledge acquired
from previous training. Furthermore, the proposed method enables online
learning of patient-specific data. The method is built on the idea of reusing
a pre-trained ConvNet's redundant kernels to learn new knowledge. We compare
the performance of TGD to several commonly used training approaches on the
task of positron emission tomography (PET) image denoising. Results from
clinical images show that TGD generated results on par with training from
scratch while significantly reducing data-preparation and network-training
time. More importantly, it enables online learning on the testing study to
enhance the network's generalization capability in real-world applications.
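As a rough illustration of the kernel-reuse idea described in the abstract, the PyTorch sketch below masks gradients so that fine-tuning updates only convolution kernels flagged as redundant, leaving the remaining kernels frozen to preserve knowledge from the previous task. The toy network, the boolean masks, and the helper `apply_targeted_gradients` are illustrative assumptions; the paper's actual kernel-selection procedure is not shown.

```python
# Minimal, illustrative sketch (not the paper's implementation): freeze the
# gradients of "useful" kernels and let fine-tuning update only the kernels
# flagged as redundant, so knowledge from the previous task is preserved.
import torch
import torch.nn as nn
import torch.nn.functional as F

def apply_targeted_gradients(conv: nn.Conv2d, redundant: torch.Tensor) -> None:
    """Register hooks that zero the gradients of non-redundant kernels.

    redundant: bool tensor of shape (out_channels,); True means the kernel
    is free to be retrained on the new task.
    """
    keep = redundant.float().view(-1, 1, 1, 1)   # broadcast over (out, in, kH, kW)
    conv.weight.register_hook(lambda grad: grad * keep)
    if conv.bias is not None:
        conv.bias.register_hook(lambda grad: grad * redundant.float())

# Toy denoiser and placeholder masks; a real application would derive the
# masks from a kernel-importance measure computed on the pre-trained model.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
masks = {0: torch.rand(16) < 0.5, 2: torch.tensor([True])}  # hypothetical masks
for idx, layer in enumerate(model):
    if isinstance(layer, nn.Conv2d):
        apply_targeted_gradients(layer, masks[idx])

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
x = torch.randn(4, 1, 64, 64)       # stand-in batch from the new task
target = torch.randn(4, 1, 64, 64)  # stand-in "clean" images
loss = F.mse_loss(model(x), target)
loss.backward()                      # hooks zero gradients of frozen kernels
optimizer.step()
```

With no weight decay, the masked kernels receive exactly zero gradient and are left unchanged by the optimizer step, while the redundant kernels adapt to the new data.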
Related papers
- Diffused Redundancy in Pre-trained Representations [98.55546694886819] (2023-05-31)
We take a closer look at how features are encoded in pre-trained representations.
We find that learned representations in a given layer exhibit a degree of diffuse redundancy.
Our findings shed light on the nature of representations learned by pre-trained deep neural networks.
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388] (2022-11-21)
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics and exploit higher-order statistics only later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
- Reconstructing Training Data from Trained Neural Networks [42.60217236418818] (2022-06-15)
We show that, in some cases, a significant fraction of the training data can in fact be reconstructed from the parameters of a trained neural network classifier.
We propose a novel reconstruction scheme that stems from recent theoretical results about the implicit bias in training neural networks with gradient-based methods.
- Neural Maximum A Posteriori Estimation on Unpaired Data for Motion Deblurring [87.97330195531029] (2022-04-26)
We propose a Neural Maximum A Posteriori (NeurMAP) estimation framework for training neural networks to recover blind motion information and sharp content from unpaired data.
NeurMAP can be applied to existing deblurring neural networks and is the first framework that enables training image-deblurring networks on unpaired datasets.
- Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705] (2022-01-11)
Current practice requires expensive training runs to predict a model's final performance.
We propose a novel framework for neural network selection that analyzes the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
- Explaining Deep Learning Representations by Tracing the Training Process [10.774699463547439] (2021-09-13)
We propose a novel method for explaining the decisions of a deep neural network.
We investigate how the intermediate representations at each layer of the deep network are refined during the training process.
We show that our method identifies highly representative training instances that can be used as an explanation.
- Training Graph Neural Networks by Graphon Estimation [2.5997274006052544] (2021-09-04)
We propose to train a graph neural network via resampling from a graphon estimate obtained from the underlying network data.
We show that our approach is competitive with, and in many cases outperforms, other GNN training methods that reduce over-smoothing.
- Dataset Meta-Learning from Kernel Ridge-Regression [18.253682891579402] (2020-10-30)
Kernel Inducing Points (KIP) can compress datasets by one or two orders of magnitude.
KIP-learned datasets are transferable to the training of finite-width neural networks, even beyond the lazy-training regime.
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234] (2020-05-18)
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep training on the enlarged dataset tractable, we propose applying a dataset distillation strategy that compresses the created dataset into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
- Curriculum By Smoothing [52.08553521577014] (2020-03-03)
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embeddings of a CNN using anti-aliasing (low-pass) filters.
As the amount of information in the feature maps increases during training, the network progressively learns better representations of the data; a minimal code sketch of this smoothing curriculum follows the list.
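The smoothing-curriculum idea in the last entry above can be sketched as blurring a convolution block's feature maps with a Gaussian (low-pass) kernel whose width is annealed toward zero over training. The block structure, kernel size, and decay schedule below are illustrative assumptions rather than that paper's exact configuration.

```python
# Illustrative sketch of a smoothing curriculum: blur feature maps with a
# Gaussian kernel whose standard deviation is annealed over training.
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_kernel(sigma: float, size: int = 5) -> torch.Tensor:
    coords = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-coords ** 2 / (2.0 * sigma ** 2))
    g = g / g.sum()
    return torch.outer(g, g)                          # separable 2-D Gaussian

class SmoothedConvBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.sigma = 1.0                               # annealed externally per epoch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.relu(self.conv(x))
        if self.sigma > 1e-3:                          # skip the blur once sigma ~ 0
            k = gaussian_kernel(self.sigma).to(x.device)
            weight = k.expand(x.shape[1], 1, *k.shape).contiguous()  # per-channel blur
            x = F.conv2d(x, weight, padding=k.shape[-1] // 2, groups=x.shape[1])
        return x

# Hypothetical decay schedule: halve sigma each epoch so early epochs see
# heavily smoothed features and later epochs see the full-detail ones.
block = SmoothedConvBlock(3, 8)
for epoch in range(5):
    block.sigma = 1.0 * (0.5 ** epoch)
    features = block(torch.randn(2, 3, 32, 32))
```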
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.