Efficient Sparsely Activated Transformers
- URL: http://arxiv.org/abs/2208.14580v1
- Date: Wed, 31 Aug 2022 00:44:27 GMT
- Title: Efficient Sparsely Activated Transformers
- Authors: Salar Latifi, Saurav Muralidharan, Michael Garland
- Abstract summary: Transformer-based neural networks have achieved state-of-the-art task performance in a number of machine learning domains.
Recent work has explored the integration of dynamic behavior into these networks in the form of mixture-of-expert layers.
We introduce a novel system named PLANER that takes an existing Transformer-based network and a user-defined latency target.
- Score: 0.34410212782758054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based neural networks have achieved state-of-the-art task
performance in a number of machine learning domains including natural language
processing and computer vision. To further improve their accuracy, recent work
has explored the integration of dynamic behavior into these networks in the
form of mixture-of-expert (MoE) layers. In this paper, we explore the
introduction of MoE layers to optimize a different metric: inference latency.
We introduce a novel system named PLANER that takes an existing
Transformer-based network and a user-defined latency target and produces an
optimized, sparsely-activated version of the original network that tries to
meet the latency target while maintaining baseline accuracy. We evaluate PLANER
on two real-world language modeling tasks using the Transformer-XL network and
achieve inference latency reductions of over 2x at iso-accuracy.
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Unifying Dimensions: A Linear Adaptive Approach to Lightweight Image Super-Resolution [6.857919231112562]
Window-based transformers have demonstrated outstanding performance in super-resolution tasks.
They exhibit higher computational complexity and inference latency than convolutional neural networks.
We construct a convolution-based Transformer framework named the linear adaptive mixer network (LAMNet)
arXiv Detail & Related papers (2024-09-26T07:24:09Z) - ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions.
Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks.
We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z) - Device Sampling and Resource Optimization for Federated Learning in Cooperative Edge Networks [17.637761046608]
Federated learning (FedL) distributes machine learning (ML) across worker devices by having them train local models that are periodically aggregated by a server.
FedL ignores two important characteristics of contemporary wireless networks: (i) the network may contain heterogeneous communication/computation resources, and (ii) there may be significant overlaps in devices' local data distributions.
We develop a novel optimization methodology that jointly accounts for these factors via intelligent device sampling complemented by device-to-device (D2D) offloading.
arXiv Detail & Related papers (2023-11-07T21:17:59Z) - Accelerating Deep Neural Networks via Semi-Structured Activation
Sparsity [0.0]
Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency.
We propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications.
Our approach yields a speed improvement of $1.25 times$ with a minimal accuracy drop of $1.1%$ for the ResNet18 model on the ImageNet dataset.
arXiv Detail & Related papers (2023-09-12T22:28:53Z) - Deformable Mixer Transformer with Gating for Multi-Task Learning of
Dense Prediction [126.34551436845133]
CNNs and Transformers have their own advantages and both have been widely used for dense prediction in multi-task learning (MTL)
We present a novel MTL model by combining both merits of deformable CNN and query-based Transformer with shared gating for multi-task learning of dense prediction.
arXiv Detail & Related papers (2023-08-10T17:37:49Z) - Exploring the Performance and Efficiency of Transformer Models for NLP
on Mobile Devices [3.809702129519641]
New deep neural network (DNN) architectures and approaches are emerging every few years, driving the field's advancement.
Transformers are a relatively new model family that has achieved new levels of accuracy across AI tasks, but poses significant computational challenges.
This work aims to make steps towards bridging this gap by examining the current state of Transformers' on-device execution.
arXiv Detail & Related papers (2023-06-20T10:15:01Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks.
specially trained CNNs that employ parametrised early exits along their depth to save during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z) - Dynamic Slimmable Network [105.74546828182834]
We develop a dynamic network slimming regime named Dynamic Slimmable Network (DS-Net)
Our DS-Net is empowered with the ability of dynamic inference by the proposed double-headed dynamic gate.
It consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods.
arXiv Detail & Related papers (2021-03-24T15:25:20Z) - Device Sampling for Heterogeneous Federated Learning: Theory,
Algorithms, and Implementation [24.084053136210027]
We develop a sampling methodology based on graph sequential convolutional networks (GCNs)
We find that our methodology while sampling less than 5% of all devices outperforms conventional federated learning (FedL) substantially both in terms of trained model accuracy and required resource utilization.
arXiv Detail & Related papers (2021-01-04T05:59:50Z) - An Image Enhancing Pattern-based Sparsity for Real-time Inference on
Mobile Devices [58.62801151916888]
We introduce a new sparsity dimension, namely pattern-based sparsity that comprises pattern and connectivity sparsity, and becoming both highly accurate and hardware friendly.
Our approach on the new pattern-based sparsity naturally fits into compiler optimization for highly efficient DNN execution on mobile platforms.
arXiv Detail & Related papers (2020-01-20T16:17:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.