Multi-stage feature decorrelation constraints for improving CNN classification performance
- URL: http://arxiv.org/abs/2308.12880v2
- Date: Fri, 29 Dec 2023 15:42:05 GMT
- Title: Multi-stage feature decorrelation constraints for improving CNN classification performance
- Authors: Qiuyu Zhu and Hao Wang and Xuewen Zu and Chengfei Liu
- Abstract summary: This article proposes a multi-stage feature decorrelation loss (MFD Loss) for CNN.
MFD Loss refines effective features and eliminates information redundancy by constraining the correlation of features at all stages.
Compared with supervised learning using Softmax Loss alone, experiments on several commonly used datasets and typical CNNs show that the classification performance of Softmax Loss + MFD Loss is significantly better.
- Score: 14.09469656684143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For convolutional neural networks (CNNs) used for pattern classification,
the training loss function is usually applied only to the final output of the
network, apart from some regularization constraints on the network parameters.
However, as the number of network layers increases, the influence of the loss
function on the front layers of the network gradually weakens, and the network
parameters tend to fall into local optima. At the same time, the trained network
exhibits significant information redundancy in the features at all stages, which
reduces the effectiveness of the feature mappings and hinders the subsequent
network parameters from moving toward the optimum. Therefore, a more optimal
solution of the network, and hence higher classification accuracy, can be obtained
by designing a loss function that constrains the front-stage features and
eliminates their information redundancy. For CNNs, this article proposes a
multi-stage feature decorrelation loss (MFD Loss), which refines effective
features and eliminates information redundancy by constraining the correlation of
features at all stages. Considering that a CNN has many layers, experimental
comparison and analysis show that MFD Loss should act on multiple front layers of
the CNN, constraining the output features of each layer and each channel, and it
is used for supervision jointly with the classification loss function during
network training. Compared with supervised learning using Softmax Loss alone,
experiments on several commonly used datasets and several typical CNNs show that
the classification performance of Softmax Loss + MFD Loss is significantly better.
Meanwhile, comparison experiments before and after combining MFD Loss with several
other typical loss functions verify its good universality.
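
The following is a minimal PyTorch sketch of the idea described above, not the paper's reference implementation: intermediate feature maps are collected from several front layers, an off-diagonal channel-correlation penalty is computed for each stage, and the sum is added to the Softmax (cross-entropy) loss during training. The hook-based MFDWrapper, the choice of monitored layers, and the weighting factor lam are illustrative assumptions; the exact correlation constraint used by MFD Loss may differ from this per-channel formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F


def decorrelation_loss(feat: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Penalize correlation between channels of one stage's feature map.
    # feat: (N, C, H, W) output of an intermediate CNN layer.
    # Returns the mean squared off-diagonal entry of the C x C correlation matrix.
    n, c, h, w = feat.shape
    x = feat.reshape(n, c, h * w).permute(1, 0, 2).reshape(c, -1)  # (C, N*H*W)
    x = x - x.mean(dim=1, keepdim=True)
    x = x / (x.norm(dim=1, keepdim=True) + eps)
    corr = x @ x.t()                                   # (C, C) correlation matrix
    off_diag = corr - torch.diag(torch.diag(corr))
    return off_diag.pow(2).sum() / (c * (c - 1))


class MFDWrapper(nn.Module):
    # Hypothetical wrapper: collects intermediate features via forward hooks.
    def __init__(self, backbone: nn.Module, layer_names):
        super().__init__()
        self.backbone = backbone
        self.features = {}
        for name, module in backbone.named_modules():
            if name in layer_names:
                module.register_forward_hook(self._hook(name))

    def _hook(self, name):
        def fn(module, inputs, output):
            self.features[name] = output
        return fn

    def forward(self, x):
        self.features.clear()
        logits = self.backbone(x)
        return logits, list(self.features.values())


def training_step(model, images, labels, lam=0.1):
    # Joint supervision: classification loss plus decorrelation terms on front stages.
    logits, stage_feats = model(images)
    loss = F.cross_entropy(logits, labels)                 # Softmax Loss
    loss = loss + lam * sum(decorrelation_loss(f) for f in stage_feats)
    return loss

A typical use of this sketch would be model = MFDWrapper(backbone, {"layer1", "layer2"}), with training_step called inside the usual optimization loop; the monitored layer names and lam would need to be tuned per architecture.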
Related papers
- Multi-channel Time Series Decomposition Network For Generalizable Sensor-Based Activity Recognition [2.024925013349319]
This paper proposes a new method, the Multi-channel Time Series Decomposition Network (MTSDNet).
It decomposes the original signal into a combination of multiple components and trigonometric functions by a trainable, parameterized temporal decomposition.
The method shows advantages in prediction accuracy and stability compared with other competing strategies.
arXiv Detail & Related papers (2024-03-28T12:54:06Z)
- A Systematic Performance Analysis of Deep Perceptual Loss Networks: Breaking Transfer Learning Conventions [5.470136744581653]
Deep perceptual loss is a type of loss function for images that computes the error between two images as the distance between deep features extracted from a neural network.
This work evaluates the effect of different pretrained loss networks on four different application areas.
arXiv Detail & Related papers (2023-02-08T13:08:51Z)
- Are All Losses Created Equal: A Neural Collapse Perspective [36.0354919583995]
Cross entropy (CE) is the most commonly used loss to train deep neural networks for classification tasks.
We show through global solution and landscape analyses that a broad family of loss functions including commonly used label smoothing (LS) and focal loss (FL) exhibits Neural Collapse.
arXiv Detail & Related papers (2022-10-04T00:36:45Z)
- Image Superresolution using Scale-Recurrent Dense Network [30.75380029218373]
Recent advances in the design of convolutional neural networks (CNNs) have yielded significant improvements in the performance of image super-resolution (SR).
We propose a scale-recurrent SR architecture built upon units containing a series of dense connections within a residual block (Residual Dense Blocks (RDBs)).
Our scale recurrent design delivers competitive performance for higher scale factors while being parametrically more efficient as compared to current state-of-the-art approaches.
arXiv Detail & Related papers (2022-01-28T09:18:43Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- Why Do Better Loss Functions Lead to Less Transferable Features? [93.47297944685114]
This paper studies how the choice of training objective affects the transferability of the hidden representations of convolutional neural networks trained on ImageNet.
We show that many objectives lead to statistically significant improvements in ImageNet accuracy over vanilla softmax cross-entropy, but the resulting fixed feature extractors transfer substantially worse to downstream tasks.
arXiv Detail & Related papers (2020-10-30T17:50:31Z)
- On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both the training set size and the model size significantly improves robustness to distribution shift.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)
- A Transductive Multi-Head Model for Cross-Domain Few-Shot Learning [72.30054522048553]
We present a new method, Transductive Multi-Head Few-Shot learning (TMHFS), to address the Cross-Domain Few-Shot Learning challenge.
The proposed methods greatly outperform the strong baseline, fine-tuning, on four different target domains.
arXiv Detail & Related papers (2020-06-08T02:39:59Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
The superiority of the proposed feature map distortion for producing deep neural networks with higher test performance is analyzed and demonstrated.
arXiv Detail & Related papers (2020-02-23T13:59:13Z)