Deep Neural Network Models Trained With A Fixed Random Classifier Transfer Better Across Domains
- URL: http://arxiv.org/abs/2402.18614v1
- Date: Wed, 28 Feb 2024 15:52:30 GMT
- Title: Deep Neural Network Models Trained With A Fixed Random Classifier Transfer Better Across Domains
- Authors: Hafiz Tiomoko Ali, Umberto Michieli, Ji Joong Moon, Daehyun Kim, Mete Ozay
- Abstract summary: The recently discovered Neural Collapse (NC) phenomenon states that the last-layer weights of Deep Neural Networks converge to the so-called Equiangular Tight Frame (ETF) simplex at the terminal phase of their training.
Inspired by NC properties, we explore in this paper the transferability of DNN models trained with their last-layer weights fixed according to an ETF.
- Score: 23.10912424714101
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recently discovered Neural Collapse (NC) phenomenon states that the
last-layer weights of Deep Neural Networks (DNNs) converge to the so-called
Equiangular Tight Frame (ETF) simplex at the terminal phase of their training.
This ETF geometry is equivalent to vanishing within-class variability of the
last-layer activations. Inspired by NC properties, we explore in this paper the
transferability of DNN models trained with their last-layer weights fixed
according to an ETF. This enforces class separation by eliminating class
covariance information, effectively providing implicit regularization. We show
that DNN models trained with such a fixed classifier significantly improve
transfer performance, particularly on out-of-domain datasets. On a broad range
of fine-grained image classification datasets, our approach outperforms i)
baseline methods that do not perform any covariance regularization (up to 22%),
as well as ii) methods that explicitly whiten the covariance of activations
throughout training (up to 19%). Our findings suggest that DNNs trained with
fixed ETF classifiers offer a powerful mechanism for improving transfer
learning across domains.
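For intuition, the sketch below shows one way a fixed simplex-ETF classifier of the kind described above can be constructed and attached to a backbone. It is a minimal illustration in PyTorch, not the authors' released code; the helper names (simplex_etf, FixedETFClassifier, feat_dim, num_classes) and the assumption that the feature dimension is at least the number of classes are choices made for this example.

```python
# Minimal sketch (not the authors' code) of training with a fixed
# simplex-ETF last layer, assuming feat_dim >= num_classes.
import torch
import torch.nn as nn

def simplex_etf(num_classes: int, feat_dim: int) -> torch.Tensor:
    """Return a (num_classes x feat_dim) simplex-ETF weight matrix,
    i.e. M^T with M = sqrt(K/(K-1)) * U (I_K - (1/K) 1 1^T),
    where the columns of U are orthonormal."""
    assert feat_dim >= num_classes
    u, _ = torch.linalg.qr(torch.randn(feat_dim, num_classes))  # orthonormal columns
    centering = torch.eye(num_classes) - torch.ones(num_classes, num_classes) / num_classes
    m = (num_classes / (num_classes - 1)) ** 0.5 * u @ centering
    return m.t()

class FixedETFClassifier(nn.Module):
    """Linear classifier whose weights are a fixed (non-trainable) ETF."""
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        # Buffer, not Parameter: excluded from gradient updates.
        self.register_buffer("weight", simplex_etf(num_classes, feat_dim))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return features @ self.weight.t()  # class logits

# Example: replace a ResNet-style backbone's last layer and train only the backbone.
# backbone.fc = FixedETFClassifier(feat_dim=2048, num_classes=100)
```

Registering the ETF weights as a buffer keeps them out of the optimizer, so only the backbone's features adapt during training.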
Related papers
- DCNN: Dual Cross-current Neural Networks Realized Using An Interactive Deep Learning Discriminator for Fine-grained Objects [48.65846477275723]
This study proposes novel dual-current neural networks (DCNN) to improve the accuracy of fine-grained image classification.
The main novel design features for constructing the weakly supervised learning backbone model, DCNN, include (a) extracting heterogeneous data, (b) keeping the feature-map resolution unchanged, (c) expanding the receptive field, and (d) fusing global representations and local features.
arXiv Detail & Related papers (2024-05-07T07:51:28Z)
- A Gradient Boosting Approach for Training Convolutional and Deep Neural Networks [0.0]
We introduce two procedures for training Convolutional Neural Networks (CNNs) and Deep Neural Networks based on Gradient Boosting (GB).
The presented models show superior performance in terms of classification accuracy with respect to standard CNNs and deep NNs with the same architectures.
arXiv Detail & Related papers (2023-02-22T12:17:32Z)
- On the effectiveness of partial variance reduction in federated learning with heterogeneous data [27.527995694042506]
We show that the diversity of the final classification layers across clients impedes the performance of the FedAvg algorithm.
Motivated by this, we propose to correct the model by applying variance reduction only to the final layers.
We demonstrate that this significantly outperforms existing benchmarks at a similar or lower communication cost.
arXiv Detail & Related papers (2022-12-05T11:56:35Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- Do We Really Need a Learnable Classifier at the End of Deep Neural Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly initialized as an ETF and fixed during training.
Our experimental results show that our method is able to achieve similar performance on image classification for balanced datasets.
arXiv Detail & Related papers (2022-03-17T04:34:28Z)
- KNN-BERT: Fine-Tuning Pre-Trained Models with KNN Classifier [61.063988689601416]
Pre-trained models are widely used in fine-tuning downstream tasks with linear classifiers optimized by the cross-entropy loss.
These problems can be mitigated by learning representations that emphasize similarities within the same class and contrasts across different classes when making predictions.
In this paper, we introduce the K-Nearest Neighbors classifier into pre-trained model fine-tuning tasks.
arXiv Detail & Related papers (2021-10-06T06:17:05Z)
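As a rough illustration of the idea summarized in the entry above (classifying with nearest neighbors over pre-trained representations), the snippet below fits a KNN classifier on features from an arbitrary encoder. It is a generic sketch rather than the KNN-BERT implementation; the encode function, the cosine metric, and k=5 are assumptions made for this example.

```python
# Generic sketch of KNN classification on pre-trained features
# (not the KNN-BERT implementation). `encode` is a placeholder for any
# pre-trained model's feature extractor returning a fixed-size vector.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_over_features(encode, train_x, train_y, test_x, k=5):
    """Predict test labels by majority vote among the k nearest
    training embeddings under cosine distance."""
    train_emb = np.stack([encode(x) for x in train_x])
    test_emb = np.stack([encode(x) for x in test_x])
    knn = KNeighborsClassifier(n_neighbors=k, metric="cosine")
    knn.fit(train_emb, train_y)
    return knn.predict(test_emb)
```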
- Boosting the Generalization Capability in Cross-Domain Few-shot Learning via Noise-enhanced Supervised Autoencoder [23.860842627883187]
We teach the model to capture broader variations of the feature distributions with a novel noise-enhanced supervised autoencoder (NSAE).
NSAE trains the model by jointly reconstructing inputs and predicting the labels of inputs as well as their reconstructed pairs.
We also take advantage of the NSAE structure and propose a two-step fine-tuning procedure that achieves better adaptation and improves classification performance in the target domain.
arXiv Detail & Related papers (2021-08-11T04:45:56Z)
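A schematic version of the joint objective described in the entry above (reconstruct the input and classify both the input and its reconstruction) could look as follows. This is an illustrative sketch under assumed module names (encoder, decoder, classifier) and an assumed MSE reconstruction term, not the NSAE authors' code.

```python
# Schematic NSAE-style joint loss (illustrative only, not the authors' code).
# Assumes `encoder`, `decoder`, and `classifier` are torch.nn.Module objects.
import torch.nn.functional as F

def nsae_style_loss(encoder, decoder, classifier, x, y, recon_weight=1.0):
    """Reconstruction loss plus classification losses on the input
    and on its reconstruction."""
    z = encoder(x)
    x_hat = decoder(z)
    loss = recon_weight * F.mse_loss(x_hat, x)                    # reconstruct the input
    loss = loss + F.cross_entropy(classifier(z), y)               # label the input
    loss = loss + F.cross_entropy(classifier(encoder(x_hat)), y)  # label its reconstruction
    return loss
```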
- A Transductive Multi-Head Model for Cross-Domain Few-Shot Learning [72.30054522048553]
We present a new method, Transductive Multi-Head Few-Shot learning (TMHFS), to address the Cross-Domain Few-Shot Learning challenge.
The proposed methods greatly outperform the strong baseline, fine-tuning, on four different target domains.
arXiv Detail & Related papers (2020-06-08T02:39:59Z)
- Neuroevolutionary Transfer Learning of Deep Recurrent Neural Networks through Network-Aware Adaptation [57.46377517266827]
This work introduces network-aware adaptive structure transfer learning (N-ASTL).
N-ASTL utilizes statistical information related to the source network's topology and weight distribution to inform how new input and output neurons are to be integrated into the existing structure.
Results show improvements over the prior state of the art, including the ability to transfer to challenging real-world datasets where transfer was not previously possible.
arXiv Detail & Related papers (2020-06-04T06:07:30Z)
- One Versus all for deep Neural Network Incertitude (OVNNI) quantification [12.734278426543332]
We propose a new technique to quantify the epistemic uncertainty of data easily.
This method consists of mixing the predictions of an ensemble of DNNs trained to classify one class vs. all the other classes (OVA) with the predictions of a standard DNN trained to perform all-vs-all (AVA) classification.
arXiv Detail & Related papers (2020-06-01T14:06:12Z)
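To make the mixing step above concrete, here is a hypothetical sketch in PyTorch that combines the per-class scores of one-vs-all (OVA) models with the softmax output of an all-vs-all (AVA) model by a simple element-wise product. The product rule, model names, and shapes are assumptions for illustration; the paper's exact combination may differ.

```python
# Hypothetical sketch of OVA/AVA score mixing for epistemic uncertainty
# (not the OVNNI authors' code). Assumes `ava_model(x)` returns K-way logits
# and each `ova_models[k](x)` returns one logit for "class k vs. the rest".
import torch

def mixed_class_scores(x, ava_model, ova_models):
    """Combine AVA softmax probabilities with OVA sigmoid scores per class."""
    ava_probs = torch.softmax(ava_model(x), dim=-1)                     # (B, K)
    ova_scores = torch.sigmoid(
        torch.stack([m(x).squeeze(-1) for m in ova_models], dim=-1))   # (B, K)
    # Low OVA confidence down-weights the AVA prediction, so inputs far from
    # every class tend to receive uniformly small mixed scores.
    return ava_probs * ova_scores
```

A small maximum mixed score across classes can then be read as high epistemic uncertainty for that input.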