EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
- URL: http://arxiv.org/abs/2202.07959v1
- Date: Wed, 16 Feb 2022 10:10:00 GMT
- Title: EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
- Authors: Tao Ge, Furu Wei
- Abstract summary: EdgeFormer is a parameter-efficient Transformer of the encoder-decoder architecture for on-device seq2seq generation.
We conduct experiments on two practical on-device seq2seq tasks: Machine Translation and Grammatical Error Correction.
- Score: 104.44478403427881
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose EdgeFormer -- a parameter-efficient Transformer of the
encoder-decoder architecture for on-device seq2seq generation, which is
customized under the strict computation and memory constraints. EdgeFormer
proposes two novel principles for cost-effective parameterization and further
enhances the model with efficient layer adaptation. We conduct extensive
experiments on two practical on-device seq2seq tasks: Machine Translation and
Grammatical Error Correction, and show that EdgeFormer can effectively
outperform previous parameter-efficient Transformer baselines and achieve very
competitive results with knowledge distillation under both the computation and
memory constraints.
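The abstract's two ideas, cost-effective parameterization and efficient layer adaptation, can be pictured with a minimal sketch: a single Transformer decoder layer reused at every depth, plus a tiny per-depth bottleneck adapter. The class names, sizes, and sharing scheme below are illustrative assumptions, not EdgeFormer's actual configuration.

```python
# Minimal sketch: one Transformer decoder layer reused at every depth, with a
# small per-depth bottleneck adapter restoring layer-specific capacity.
# Names and hyperparameters here are illustrative, not EdgeFormer's design.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: d_model -> r -> d_model with a residual connection."""

    def __init__(self, d_model: int, r: int = 32):
        super().__init__()
        self.down = nn.Linear(d_model, r)
        self.up = nn.Linear(r, d_model)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))


class SharedLayerDecoder(nn.Module):
    """Decoder stack whose depth-wise weights are shared; only adapters differ."""

    def __init__(self, d_model=512, nhead=8, num_layers=6, r=32):
        super().__init__()
        self.shared_layer = nn.TransformerDecoderLayer(
            d_model, nhead, dim_feedforward=1024, batch_first=True
        )
        self.adapters = nn.ModuleList(Adapter(d_model, r) for _ in range(num_layers))

    def forward(self, tgt, memory, tgt_mask=None):
        x = tgt
        for adapter in self.adapters:           # same weights, different adapter
            x = adapter(self.shared_layer(x, memory, tgt_mask=tgt_mask))
        return x


decoder = SharedLayerDecoder()
out = decoder(torch.randn(2, 10, 512), torch.randn(2, 20, 512))  # (2, 10, 512)
```

With one shared layer and six rank-32 adapters, the stack stores roughly one layer's worth of weights instead of six, which is the kind of trade-off a strict on-device memory budget forces.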
Related papers
- Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision [52.80792724919329]
We introduce a novel framework named Adapter-X to improve fine-tuning in 2D image and 3D point cloud modalities.
It is the first to outperform full fine-tuning in both 2D image and 3D point cloud modalities with significantly fewer parameters, i.e., only 0.20% and 1.88% of original trainable parameters for 2D and 3D classification tasks.
arXiv Detail & Related papers (2024-06-05T08:26:44Z) - Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z) - Prompt Guided Transformer for Multi-Task Dense Prediction [14.815576352301322]
We introduce a lightweight task-conditional model called Prompt Guided Transformer to optimize performance and model parameters.
Our approach achieves state-of-the-art results among task-conditional methods while using fewer parameters, and maintains a significant balance between performance and parameter size.
arXiv Detail & Related papers (2023-07-28T07:25:57Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z) - HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer
Compression [69.36555801766762]
We propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions.
We experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss.
arXiv Detail & Related papers (2022-11-30T05:31:45Z) - THG: Transformer with Hyperbolic Geometry [8.895324519034057]
"X-former" models make changes only around the quadratic time and memory complexity of self-attention.
We propose a novel Transformer with Hyperbolic Geometry (THG) model, which take the advantage of both Euclidean space and Hyperbolic space.
arXiv Detail & Related papers (2021-06-01T14:09:33Z) - Easy and Efficient Transformer : Scalable Inference Solution For large
NLP mode [14.321889138798072]
This paper introduces a series of ultra-large-scale pre-training model optimization methods.
An inference engine -- Easy and Efficient Transformer (EET) is proposed.
EET achieves a 1.5-15x speedup over the state of the art, varying with context length.
arXiv Detail & Related papers (2021-04-26T11:00:56Z) - Subformer: Exploring Weight Sharing for Parameter Efficiency in
Generative Transformers [16.88840622945725]
We develop the Subformer, a parameter efficient Transformer-based model.
Experiments on machine translation, abstractive summarization, and language modeling show that the Subformer can outperform the Transformer even when using significantly fewer parameters.
arXiv Detail & Related papers (2021-01-01T13:53:22Z) - Fusion-Catalyzed Pruning for Optimizing Deep Learning on Intelligent
Edge Devices [9.313154178072049]
We present a novel fusion-parametric pruning approach, called FuPruner, for accelerating neural networks.
We introduce an aggressive fusion method to equivalently transform a model, which extends the optimization space of pruning.
FuPruner provides optimization options for controlling fusion and pruning, allowing much more flexible performance-accuracy trade-offs to be made.
arXiv Detail & Related papers (2020-10-30T10:10:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences.