Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks
- URL: http://arxiv.org/abs/2407.13986v2
- Date: Fri, 9 Aug 2024 23:34:14 GMT
- Title: Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks
- Authors: Cheng Gong, Yao Chen, Qiuyang Luo, Ye Lu, Tao Li, Yuzhi Zhang, Yufei Sun, Le Zhang
- Abstract summary: This paper introduces Deep Feature Surgery (DFS) to resolve gradient conflict issues during the training of multi-exit networks.
DFS also reduces training cost by lowering the complexity of backpropagation.
Budgeted batch classification evaluation shows that DFS uses about 2× fewer average FLOPs per image to achieve the same classification accuracy as baseline methods on Cifar100.
- Score: 13.492494712188922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-exit networks are a promising architecture for efficient model inference by sharing backbone networks and weights among multiple exits. However, the gradient conflict of the shared weights results in sub-optimal accuracy. This paper introduces Deep Feature Surgery (DFS), which consists of feature partitioning and feature referencing approaches to resolve gradient conflict issues during the training of multi-exit networks. The feature partitioning separates shared features along the depth axis among all exits to alleviate gradient conflict while simultaneously promoting joint optimization for each exit. Subsequently, feature referencing enhances multi-scale features for distinct exits across varying depths to improve the model accuracy. Furthermore, DFS reduces the training operations with the reduced complexity of backpropagation. Experimental results on Cifar100 and ImageNet datasets exhibit that DFS provides up to a 50.00% reduction in training time and attains up to a 6.94% enhancement in accuracy when contrasted with baseline methods across diverse models and tasks. Budgeted batch classification evaluation on MSDNet demonstrates that DFS uses about 2× fewer average FLOPs per image to achieve the same classification accuracy as baseline methods on Cifar100. The code is available at https://github.com/GongCheng1919/dfs.
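As a rough illustration of the two ideas named in the abstract, a minimal PyTorch sketch follows. It is not the authors' implementation (see the linked repository for that): the two-exit MLP backbone, the equal channel split, and the use of detached references are assumptions made here for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiExitMLP(nn.Module):
    """Toy two-exit backbone illustrating feature partitioning and referencing."""
    def __init__(self, in_dim=784, hidden=128, num_classes=10, num_exits=2):
        super().__init__()
        self.num_exits = num_exits
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim if i == 0 else hidden, hidden), nn.ReLU())
            for i in range(num_exits)
        )
        # one classifier head per exit, fed with the partitioned + referenced feature
        self.heads = nn.ModuleList(nn.Linear(hidden, num_classes) for _ in range(num_exits))

    def forward(self, x):
        logits = []
        feat = x
        for i, block in enumerate(self.blocks):
            feat = block(feat)
            # feature partitioning: split the shared feature along the depth
            # (channel) axis, one slice per exit, so each exit's loss mainly
            # updates its own slice and gradient conflict is reduced
            slices = torch.chunk(feat, self.num_exits, dim=1)
            # feature referencing: exit i keeps its own slice differentiable and
            # reads the other exits' slices through detached references, so it
            # still sees the full feature without pushing gradients into it
            referenced = [s if j == i else s.detach() for j, s in enumerate(slices)]
            logits.append(self.heads[i](torch.cat(referenced, dim=1)))
        return logits

model = MultiExitMLP()
x, y = torch.randn(8, 784), torch.randint(0, 10, (8,))
# joint training: sum the per-exit losses, as is standard for multi-exit networks
loss = sum(F.cross_entropy(out, y) for out in model(x))
loss.backward()
```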
Related papers
- Deep Neural Networks Fused with Textures for Image Classification [20.58839604333332]
Fine-grained image classification (FGIC) is a challenging task in computer vision.
We propose a fusion approach to address FGIC by combining global texture with local patch-based information.
Our method has attained better classification accuracy over existing methods with notable margins.
arXiv Detail & Related papers (2023-08-03T15:21:08Z)
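A minimal sketch of the fusion idea in the entry above, under assumed shapes and a simple concatenation head; the descriptor dimensions, mean-pooling of patch features, and linear classifier are illustrative choices, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TexturePatchFusion(nn.Module):
    """Toy fusion head: concatenate a global texture descriptor with pooled
    local patch features before classification (all dimensions are assumptions)."""
    def __init__(self, feat_dim=256, tex_dim=64, num_classes=200):
        super().__init__()
        self.classifier = nn.Linear(feat_dim + tex_dim, num_classes)

    def forward(self, patch_feats, texture_desc):
        # patch_feats: (B, num_patches, feat_dim) local patch embeddings
        # texture_desc: (B, tex_dim) global texture descriptor (e.g. a histogram)
        pooled = patch_feats.mean(dim=1)                   # aggregate local information
        fused = torch.cat([pooled, texture_desc], dim=1)   # global + local fusion
        return self.classifier(fused)

head = TexturePatchFusion()
logits = head(torch.randn(4, 16, 256), torch.randn(4, 64))  # shape (4, 200)
```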
- Training trajectories, mini-batch losses and the curious role of the learning rate [13.848916053916618]
Stochastic gradient descent plays a fundamental role in nearly all applications of deep learning.
We propose a simple model and a geometric interpretation that allow us to analyze the relationship between the gradients of mini-batches and the full batch.
In particular, a very low loss value can be reached in just one step of descent with a large enough learning rate.
arXiv Detail & Related papers (2023-01-05T21:58:46Z)
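A toy comparison in the spirit of the entry above: measuring how individual mini-batch gradients relate to the full-batch gradient on a small least-squares problem. The model, batch size, and cosine-similarity metric are assumptions chosen for brevity, not the paper's setup.

```python
import torch

# Toy setup: least-squares regression, so gradients are cheap and exact.
torch.manual_seed(0)
X, y = torch.randn(512, 10), torch.randn(512)
w = torch.zeros(10, requires_grad=True)

def grad_on(batch_x, batch_y):
    # gradient of the mean squared error w.r.t. w on a given batch
    loss = ((batch_x @ w - batch_y) ** 2).mean()
    return torch.autograd.grad(loss, w)[0]

full_grad = grad_on(X, y)
for start in range(0, 512, 128):
    mb_grad = grad_on(X[start:start + 128], y[start:start + 128])
    cos = torch.nn.functional.cosine_similarity(mb_grad, full_grad, dim=0)
    print(f"mini-batch {start // 128}: cosine to full-batch gradient = {cos.item():.3f}")
```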
- Network Pruning via Feature Shift Minimization [8.593369249204132]
We propose a novel Feature Shift Minimization (FSM) method to compress CNN models, which evaluates the feature shift by converging the information of both features and filters.
The proposed method yields state-of-the-art performance on various benchmark networks and datasets, verified by extensive experiments.
arXiv Detail & Related papers (2022-07-06T12:50:26Z)
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
- ResLT: Residual Learning for Long-tailed Recognition [64.19728932445523]
We propose a more fundamental perspective for long-tailed recognition, i.e., from the aspect of parameter space.
We design an effective residual fusion mechanism: one main branch is optimized to recognize images from all classes, while two residual branches are gradually fused and optimized to enhance images from the medium+tail classes and the tail classes, respectively.
We test our method on several benchmarks, i.e., long-tailed version of CIFAR-10, CIFAR-100, Places, ImageNet, and iNaturalist 2018.
arXiv Detail & Related papers (2021-01-26T08:43:50Z)
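A rough sketch of the residual-branch idea in the ResLT entry above; the linear branches, the additive logit fusion, and the omission of the class-subset losses are simplifications, not the authors' design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBranches(nn.Module):
    """Toy residual-fusion head: a main branch for all classes plus two
    residual branches intended for medium+tail and tail classes."""
    def __init__(self, feat_dim=64, num_classes=100):
        super().__init__()
        self.main = nn.Linear(feat_dim, num_classes)
        self.med_tail = nn.Linear(feat_dim, num_classes)
        self.tail = nn.Linear(feat_dim, num_classes)

    def forward(self, feat):
        # residual fusion: the specialized branches add corrections on top of
        # the main branch's logits; in ResLT-style training each residual branch
        # would also receive a loss restricted to its class subset (not shown)
        return self.main(feat) + self.med_tail(feat) + self.tail(feat)

head = ResidualBranches()
feats, labels = torch.randn(8, 64), torch.randint(0, 100, (8,))
loss = F.cross_entropy(head(feats), labels)
loss.backward()
```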
- Sparse Communication for Training Deep Networks [56.441077560085475]
Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models.
In this algorithm, each worker shares its local gradients with others and updates the parameters using the average gradients of all workers.
We study several compression schemes and identify how three key parameters affect the performance.
arXiv Detail & Related papers (2020-09-19T17:28:11Z)
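The compression schemes studied are not named in the snippet above; as one common example of sparse gradient communication, here is a sketch of top-k sparsification applied to a worker's local gradient before it is exchanged. The 1% keep ratio and the dense-to-sparse layout are illustrative assumptions.

```python
import torch

def topk_sparsify(grad: torch.Tensor, ratio: float = 0.01) -> torch.Tensor:
    """Keep only the largest-magnitude `ratio` fraction of gradient entries;
    everything else is zeroed out before the gradient is communicated."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    idx = flat.abs().topk(k).indices
    sparse = torch.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.view_as(grad)

# each worker would sparsify its local gradient, exchange only the kept values
# (and their indices), and the averaged sparse gradients update the parameters
local_grad = torch.randn(1024, 1024)
compressed = topk_sparsify(local_grad)
print(f"kept {int((compressed != 0).sum())} of {local_grad.numel()} entries")
```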
- Sequential Hierarchical Learning with Distribution Transformation for Image Super-Resolution [83.70890515772456]
We build a sequential hierarchical learning super-resolution network (SHSR) for effective image SR.
We consider the inter-scale correlations of features, and devise a sequential multi-scale block (SMB) to progressively explore the hierarchical information.
Experiment results show SHSR achieves superior quantitative performance and visual quality to state-of-the-art methods.
arXiv Detail & Related papers (2020-07-19T01:35:53Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
- Attentive CutMix: An Enhanced Data Augmentation Approach for Deep Learning Based Image Classification [58.20132466198622]
We propose Attentive CutMix, a naturally enhanced augmentation strategy based on CutMix.
In each training iteration, we choose the most descriptive regions based on the intermediate attention maps from a feature extractor.
Our proposed method is simple yet effective, easy to implement and can boost the baseline significantly.
arXiv Detail & Related papers (2020-03-29T15:01:05Z)
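A rough sketch of the augmentation described in the last entry, assuming a 7x7 grid of attention scores already produced by some feature extractor; the number of pasted cells and the area-proportional label mixing are assumptions rather than the paper's exact settings.

```python
import torch

def attentive_cutmix(src, dst, attn, num_patches=6):
    """Paste the most attended grid cells of `src` onto `dst`.
    src, dst: (C, H, W) images; attn: (G, G) attention map over a GxG grid."""
    C, H, W = src.shape
    G = attn.shape[0]
    ph, pw = H // G, W // G
    mixed = dst.clone()
    # pick the grid cells with the highest attention scores
    top = attn.flatten().topk(num_patches).indices
    for cell in top.tolist():
        r, c = divmod(cell, G)
        mixed[:, r * ph:(r + 1) * ph, c * pw:(c + 1) * pw] = \
            src[:, r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
    # mix the labels in proportion to the pasted area
    lam = num_patches / (G * G)
    return mixed, lam

src, dst = torch.rand(3, 224, 224), torch.rand(3, 224, 224)
attn = torch.rand(7, 7)  # would normally come from the feature extractor
mixed, lam = attentive_cutmix(src, dst, attn)
# the training target mixes the two labels: lam * src_label + (1 - lam) * dst_label
```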