AntiDote: Attention-based Dynamic Optimization for Neural Network Runtime Efficiency
- URL: http://arxiv.org/abs/2008.06543v1
- Date: Fri, 14 Aug 2020 18:48:13 GMT
- Title: AntiDote: Attention-based Dynamic Optimization for Neural Network Runtime Efficiency
- Authors: Fuxun Yu, Chenchen Liu, Di Wang, Yanzhi Wang, Xiang Chen
- Abstract summary: We propose a dynamic CNN optimization framework based on the neural network attention mechanism.
Our method brings a 37.4% to 54.5% FLOPs reduction with a negligible accuracy drop on various test networks.
- Score: 42.00372941618975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional Neural Networks (CNNs) achieve great cognitive performance at the expense of considerable computation load. To relieve this load, many optimization works reduce model redundancy by identifying and removing insignificant model components, through techniques such as weight sparsification and filter pruning. However, these works evaluate components' significance statically, using only internal parameter information and ignoring the components' dynamic interaction with external inputs. Because per-input feature activation can change a component's significance, such static methods achieve only sub-optimal results. We therefore propose a dynamic CNN optimization framework. Built on the neural network attention mechanism, it comprises (1) testing-phase channel and column feature-map pruning and (2) training-phase optimization by targeted dropout. This dynamic framework has several benefits: (1) by taking the model-input interaction into account, it accurately identifies and aggressively removes per-input feature redundancy; (2) thanks to its multi-dimension flexibility, it maximally removes feature-map redundancy across dimensions; (3) the training-testing co-optimization favors dynamic pruning and maintains model accuracy even at very high feature pruning ratios. Extensive experiments show that our method brings a 37.4% to 54.5% FLOPs reduction with a negligible accuracy drop on various test networks.
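To make the framework's two mechanisms concrete, below is a minimal PyTorch sketch under stated assumptions: an SE-style attention block stands in for the paper's attention mechanism (the abstract does not fix its architecture), and the names AttentionChannelGate, dynamic_channel_prune, and targeted_dropout, along with the keep/drop ratios, are illustrative. Column feature-map pruning would follow the same pattern along a spatial dimension.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionChannelGate(nn.Module):
    """Squeeze-and-excitation-style channel attention used as a per-input
    significance signal (an assumed stand-in for the paper's attention block)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = x.mean(dim=(2, 3))                       # global average pool: (N, C)
        return torch.sigmoid(self.fc2(F.relu(self.fc1(s))))

def dynamic_channel_prune(x, scores, keep_ratio=0.6):
    """Testing-phase pruning: per input, keep only the top-scoring channels."""
    k = max(1, int(scores.size(1) * keep_ratio))
    topk = scores.topk(k, dim=1).indices             # significant channels per input
    mask = torch.zeros_like(scores).scatter_(1, topk, 1.0)
    return x * mask[:, :, None, None]

def targeted_dropout(x, scores, drop_ratio=0.4, p=0.5):
    """Training-phase targeted dropout (sketch): randomly drop channels drawn
    from the low-attention candidate set, so the network learns to tolerate
    their removal at test time."""
    k = int(scores.size(1) * drop_ratio)
    candidates = (-scores).topk(k, dim=1).indices    # least significant channels
    keep = (torch.rand_like(scores[:, :k]) >= p).float()
    mask = torch.ones_like(scores).scatter_(1, candidates, keep)
    return x * mask[:, :, None, None]

# Usage: prune at inference, target the same weak channels during training.
gate = AttentionChannelGate(64)
x = torch.randn(2, 64, 16, 16)
scores = gate(x)                                     # (2, 64), values in (0, 1)
pruned = dynamic_channel_prune(x, scores)            # inference path
noisy = targeted_dropout(x, scores)                  # training path
```

The coupling the abstract emphasizes is that training-phase targeted dropout hits exactly the low-attention channels that testing-phase pruning will later discard, which is what keeps accuracy stable at high pruning ratios.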
Related papers
- Optimizing Deep Neural Networks using Safety-Guided Self Compression [0.0]
This study introduces a novel safety-driven quantization framework that prunes and quantizes neural network weights.
The proposed methodology is rigorously evaluated on both a convolutional neural network (CNN) and an attention-based language model.
Experimental results reveal that our framework achieves up to a 2.5% enhancement in test accuracy relative to the original unquantized models.
arXiv Detail & Related papers (2025-05-01T06:50:30Z)
- Bayesian Optimization of a Lightweight and Accurate Neural Network for Aerodynamic Performance Prediction [0.0]
We propose a new approach to build efficient and accurate predictive models for aerodynamic performance prediction.
To clearly describe the interplay between design variables, hierarchical and categorical kernels are used in the Bayesian optimization (BO) formulation.
For the drag coefficient prediction task, the Mean Absolute Percentage Error (MAPE) of our optimized model drops from 0.1433% to 0.0163%.
Our model achieves a MAPE of 0.82% on a benchmark aircraft self-noise prediction problem, significantly outperforming existing models.
arXiv Detail & Related papers (2025-03-25T09:14:36Z)
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the real-time demands of IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework that jointly optimizes the neural network architecture and its edge deployment.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Model-Based Control with Sparse Neural Dynamics [23.961218902837807]
We propose a new framework for integrated model learning and predictive control.
We show that our framework can deliver better closed-loop performance than existing state-of-the-art methods.
arXiv Detail & Related papers (2023-12-20T06:25:02Z)
- Dynamically configured physics-informed neural network in topology optimization applications [4.403140515138818]
The physics-informed neural network (PINN) can avoid generating enormous amounts of data when solving forward problems.
A dynamically configured PINN-based topology optimization (DCPINN-TO) method is proposed.
The accuracy of the displacement prediction and optimization results indicate that the DCPINN-TO method is effective and efficient.
arXiv Detail & Related papers (2023-12-12T05:35:30Z)
- Explicit Foundation Model Optimization with Self-Attentive Feed-Forward Neural Units [4.807347156077897]
Iterative approximation methods using backpropagation enable the optimization of neural networks, but they remain computationally expensive when used at scale.
This paper presents an efficient alternative for optimizing neural networks that reduces the costs of scaling neural networks and provides high-efficiency optimizations for low-resource applications.
arXiv Detail & Related papers (2023-11-13T17:55:07Z)
- Towards Hyperparameter-Agnostic DNN Training via Dynamical System Insights [4.513581513983453]
We present ECCO-DNN, a first-order optimization method specialized for deep neural networks (DNNs).
This method models the optimization variable trajectory as a dynamical system and develops a discretization algorithm that adaptively selects step sizes based on the trajectory's shape.
arXiv Detail & Related papers (2023-10-21T03:45:13Z)
- Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z)
- Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
We present Layer-wise Feedback Propagation (LFP), a novel training principle for neural network-like predictors.
LFP decomposes a reward to individual neurons based on their respective contributions to solving a given task.
Our method then implements a greedy approach reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z)
- Rate Distortion Characteristic Modeling for Neural Image Compression [59.25700168404325]
End-to-end optimization gives neural image compression (NIC) superior lossy compression performance.
However, distinct models must be trained to reach different points in the rate-distortion (R-D) space.
We formulate mathematical functions that describe the R-D behavior of NIC using deep networks and statistical modeling.
arXiv Detail & Related papers (2021-06-24T12:23:05Z)
- CNNPruner: Pruning Convolutional Neural Networks with Visual Analytics [13.38218193857018]
Convolutional neural networks (CNNs) have demonstrated extraordinarily good performance in many computer vision tasks.
CNNPruner allows users to interactively create pruning plans according to a desired goal on model size or accuracy.
arXiv Detail & Related papers (2020-09-08T02:08:20Z)
- Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stage multi-scale features.
We build an extremely lightweight model, namely CSNet, which achieves performance comparable to large models with only about 0.2% of their parameters (100k) on popular salient object detection benchmarks.
arXiv Detail & Related papers (2020-03-12T07:00:46Z)
- Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference [119.19779637025444]
Deep networks have recently been suggested to face a trade-off between accuracy (on clean natural images) and robustness (on adversarially perturbed images).
This paper studies multi-exit networks with input-adaptive inference, showing their strong promise in achieving a "sweet point" that co-optimizes model accuracy, robustness, and efficiency (a minimal multi-exit sketch follows this list).
arXiv Detail & Related papers (2020-02-24T00:40:22Z)
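Two entries above, LAUDNet's dynamic skipping and the Triple Wins multi-exit study, share the input-adaptive inference idea that also underlies AntiDote's per-input pruning. As the sketch promised in the Triple Wins entry, below is a minimal multi-exit network in PyTorch: confident (easy) inputs leave at an intermediate classifier and skip the remaining computation. The two-stage backbone, the 0.9 confidence threshold, and the batch-size-1 simplification are illustrative assumptions, not details from either paper.

```python
import torch
import torch.nn as nn

class MultiExitNet(nn.Module):
    """Minimal multi-exit network: an intermediate classifier lets easy inputs
    exit early, spending the deeper stage's FLOPs only on hard inputs."""

    def __init__(self, num_classes=10, threshold=0.9):
        super().__init__()
        self.threshold = threshold
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(8))
        self.exit1 = nn.Linear(32 * 8 * 8, num_classes)
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(4))
        self.exit2 = nn.Linear(64 * 4 * 4, num_classes)

    @torch.no_grad()
    def forward(self, x):                            # inference path, batch size 1
        h = self.stage1(x)
        logits = self.exit1(h.flatten(1))
        confidence = logits.softmax(dim=1).max().item()
        if confidence >= self.threshold:
            return logits                            # early exit: stage2 skipped
        return self.exit2(self.stage2(h).flatten(1))

# Usage: easy inputs return from exit1; hard ones pay for the full depth.
net = MultiExitNet().eval()
y = net(torch.randn(1, 3, 32, 32))
```

In training, both exits would be supervised jointly; the threshold then trades accuracy against average FLOPs at deployment time.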
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.