EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
- URL: http://arxiv.org/abs/2202.07959v1
- Date: Wed, 16 Feb 2022 10:10:00 GMT
- Title: EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
- Authors: Tao Ge, Furu Wei
- Abstract summary: EdgeFormer is a parameter-efficient Transformer of the encoder-decoder architecture for on-device seq2seq generation.
We conduct experiments on two practical on-device seq2seq tasks: Machine Translation and Grammatical Error Correction.
- Score: 104.44478403427881
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose EdgeFormer -- a parameter-efficient Transformer of the
encoder-decoder architecture for on-device seq2seq generation, which is
customized under the strict computation and memory constraints. EdgeFormer
proposes two novel principles for cost-effective parameterization and further
enhances the model with efficient layer adaptation. We conduct extensive
experiments on two practical on-device seq2seq tasks: Machine Translation and
Grammatical Error Correction, and show that EdgeFormer can effectively
outperform previous parameter-efficient Transformer baselines and achieve very
competitive results with knowledge distillation under both the computation and
memory constraints.
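The abstract's two ideas, cost-effective parameterization and efficient layer adaptation, can be pictured with a minimal sketch: a single Transformer decoder layer reused at every depth, plus a tiny per-depth bottleneck adapter. The class names, sizes, and sharing scheme below are illustrative assumptions, not EdgeFormer's actual configuration.

```python
# Minimal sketch: one Transformer decoder layer reused at every depth, with a
# small per-depth bottleneck adapter restoring layer-specific capacity.
# Names and hyperparameters here are illustrative, not EdgeFormer's design.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: d_model -> r -> d_model with a residual connection."""

    def __init__(self, d_model: int, r: int = 32):
        super().__init__()
        self.down = nn.Linear(d_model, r)
        self.up = nn.Linear(r, d_model)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))


class SharedLayerDecoder(nn.Module):
    """Decoder stack whose depth-wise weights are shared; only adapters differ."""

    def __init__(self, d_model=512, nhead=8, num_layers=6, r=32):
        super().__init__()
        self.shared_layer = nn.TransformerDecoderLayer(
            d_model, nhead, dim_feedforward=1024, batch_first=True
        )
        self.adapters = nn.ModuleList(Adapter(d_model, r) for _ in range(num_layers))

    def forward(self, tgt, memory, tgt_mask=None):
        x = tgt
        for adapter in self.adapters:           # same weights, different adapter
            x = adapter(self.shared_layer(x, memory, tgt_mask=tgt_mask))
        return x


decoder = SharedLayerDecoder()
out = decoder(torch.randn(2, 10, 512), torch.randn(2, 20, 512))  # (2, 10, 512)
```

With one shared layer and six rank-32 adapters, the stack stores roughly one layer's worth of weights instead of six, which is the kind of trade-off a strict on-device memory budget forces.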
Related papers
- Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision [52.80792724919329]
We introduce a novel framework named Adapter-X to improve fine-tuning in 2D image and 3D point cloud modalities.
It is the first to outperform full fine-tuning in both 2D image and 3D point cloud modalities with significantly fewer parameters, i.e., only 0.20% and 1.88% of original trainable parameters for 2D and 3D classification tasks.
arXiv Detail & Related papers (2024-06-05T08:26:44Z) - Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z) - Prompt Guided Transformer for Multi-Task Dense Prediction [14.815576352301322]
We introduce a lightweight task-conditional model called Prompt Guided Transformer to optimize performance and model parameters.
Our approach achieves state-of-the-art results among task-conditional methods while using fewer parameters, and maintains a significant balance between performance and parameter size.
arXiv Detail & Related papers (2023-07-28T07:25:57Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z) - HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer
Compression [69.36555801766762]
We propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions.
We experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss.
arXiv Detail & Related papers (2022-11-30T05:31:45Z) - THG: Transformer with Hyperbolic Geometry [8.895324519034057]
"X-former" models make changes only around the quadratic time and memory complexity of self-attention.
We propose a novel Transformer with Hyperbolic Geometry (THG) model, which take the advantage of both Euclidean space and Hyperbolic space.
arXiv Detail & Related papers (2021-06-01T14:09:33Z) - Easy and Efficient Transformer : Scalable Inference Solution For large
NLP mode [14.321889138798072]
This paper introduces a series of ultra-large-scale pre-training model optimization methods.
An inference engine -- Easy and Efficient Transformer (EET) is proposed.
EET achieves a 1.5-15x speedup over the state of the art, varying with context length.
arXiv Detail & Related papers (2021-04-26T11:00:56Z) - Subformer: Exploring Weight Sharing for Parameter Efficiency in
Generative Transformers [16.88840622945725]
We develop the Subformer, a parameter efficient Transformer-based model.
Experiments on machine translation, abstractive summarization, and language modeling show that the Subformer can outperform the Transformer even when using significantly fewer parameters.
arXiv Detail & Related papers (2021-01-01T13:53:22Z) - Fusion-Catalyzed Pruning for Optimizing Deep Learning on Intelligent
Edge Devices [9.313154178072049]
We present a novel fusion-parametric pruning approach, called FuPruner, for accelerating neural networks.
We introduce an aggressive fusion method to equivalently transform a model, which extends the optimization space of pruning.
FuPruner provides optimization options for controlling fusion and pruning, allowing much more flexible performance-accuracy trade-offs to be made.
arXiv Detail & Related papers (2020-10-30T10:10:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences.