DoT: An efficient Double Transformer for NLP tasks with tables
- URL: http://arxiv.org/abs/2106.00479v1
- Date: Tue, 1 Jun 2021 13:33:53 GMT
- Title: DoT: An efficient Double Transformer for NLP tasks with tables
- Authors: Syrine Krichene, Thomas Müller and Julian Martin Eisenschlos
- Abstract summary: DoT is a double transformer model that decomposes the problem into two sub-tasks.
We show that for a small drop in accuracy, DoT improves training and inference time by at least 50%.
- Score: 3.0079490585515343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based approaches have been successfully used to obtain
state-of-the-art accuracy on natural language processing (NLP) tasks with
semi-structured tables. These model architectures are typically deep, resulting
in slow training and inference, especially for long inputs. To improve
efficiency while maintaining a high accuracy, we propose a new architecture,
DoT, a double transformer model, that decomposes the problem into two
sub-tasks: A shallow pruning transformer that selects the top-K tokens,
followed by a deep task-specific transformer that takes as input those K
tokens. Additionally, we modify the task-specific attention to incorporate the
pruning scores. The two transformers are jointly trained by optimizing the
task-specific loss. We run experiments on three benchmarks, including
entailment and question-answering. We show that for a small drop in accuracy,
DoT improves training and inference time by at least 50%. We also show that the
pruning transformer effectively selects relevant tokens enabling the end-to-end
model to maintain similar accuracy as slower baseline models. Finally, we
analyse the pruning and give some insight into its impact on the task model.
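For intuition, here is a minimal, self-contained sketch of the two-stage idea from the abstract, written in PyTorch. It is not the authors' implementation: the layer counts, the sigmoid re-weighting of the kept tokens, and the classification head are illustrative assumptions, and DoT incorporates the pruning scores into the task-specific attention rather than re-scaling embeddings as done here.

```python
# Minimal sketch of the DoT idea (not the authors' code): a shallow pruning
# transformer scores every token, only the top-K tokens are kept, and a deep
# task-specific transformer runs on those K tokens alone. Layer counts, the
# sigmoid re-weighting, and the classification head are illustrative
# assumptions; DoT itself folds the pruning scores into the task attention.
import torch
import torch.nn as nn


class DoTSketch(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256, k=128):
        super().__init__()
        self.k = k
        self.embed = nn.Embedding(vocab_size, d_model)
        # Shallow pruning transformer (e.g. 2 layers) that scores tokens.
        self.pruner = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.score_head = nn.Linear(d_model, 1)  # per-token pruning score
        # Deep task-specific transformer (e.g. 12 layers) over the K kept tokens.
        self.task = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=12,
        )
        self.classifier = nn.Linear(d_model, 2)  # e.g. table entailment yes/no

    def forward(self, token_ids):
        x = self.embed(token_ids)                              # (B, L, D)
        scores = self.score_head(self.pruner(x)).squeeze(-1)   # (B, L)
        topk = scores.topk(self.k, dim=-1)                     # assumes L >= K
        kept = x.gather(
            1, topk.indices.unsqueeze(-1).expand(-1, -1, x.size(-1))
        )                                                      # (B, K, D)
        # Re-weighting by the scores lets the task loss back-propagate into the
        # pruning transformer; DoT instead injects the scores into the attention.
        kept = kept * torch.sigmoid(topk.values).unsqueeze(-1)
        h = self.task(kept)                                    # (B, K, D)
        return self.classifier(h[:, 0])                        # first kept position


# Joint training: a single task-specific loss updates both transformers end to end.
model = DoTSketch()
logits = model(torch.randint(0, 30522, (2, 512)))
loss = nn.functional.cross_entropy(logits, torch.tensor([0, 1]))
loss.backward()
```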
Related papers
- A Fast Post-Training Pruning Framework for Transformers [74.59556951906468]
Pruning is an effective way to reduce the huge inference cost of large Transformer models.
Prior work on model pruning requires retraining the model.
We propose a fast post-training pruning framework for Transformers that does not require any retraining.
arXiv Detail & Related papers (2022-03-29T07:41:11Z)
- Learned Token Pruning for Transformers [39.181816379061374]
The Learned Token Pruning (LTP) method reduces redundant tokens as the data passes through the different layers of a transformer.
We extensively test the performance of our approach on multiple GLUE tasks.
Preliminary results show up to 1.4x and 1.9x throughput improvements on Tesla T4 and Intel Haswell, respectively.
arXiv Detail & Related papers (2021-07-02T09:00:13Z)
- Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformers-based models.
We prove that eliminating the MASK token and considering the whole output during the loss computation are essential choices to improve performance.
arXiv Detail & Related papers (2021-04-20T00:09:37Z)
- Finetuning Pretrained Transformers into RNNs [81.72974646901136]
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation.
A linear-complexity recurrent variant has proven well suited for autoregressive generation.
This work aims to convert a pretrained transformer into its efficient recurrent counterpart.
arXiv Detail & Related papers (2021-03-24T10:50:43Z)
- Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search [84.94597821711808]
We extend PoWER-BERT (Goyal et al., 2020) and propose Length-Adaptive Transformer that can be used for various inference scenarios after one-shot training.
We conduct a multi-objective evolutionary search to find a length configuration that maximizes the accuracy and minimizes the efficiency metric under any given computational budget.
We empirically verify the utility of the proposed approach by demonstrating the superior accuracy-efficiency trade-off under various setups.
arXiv Detail & Related papers (2020-10-14T12:28:08Z)
- AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models [4.247712017691596]
AxFormer is a framework that applies accuracy-driven approximations to create optimized transformer models for a given downstream task.
Our experiments show that AxFormer models are up to 4.5% more accurate, while also being up to 2.5X faster and up to 3.2X smaller than conventional fine-tuned models.
arXiv Detail & Related papers (2020-10-07T23:29:34Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
- The Cascade Transformer: an Application for Efficient Answer Sentence Selection [116.09532365093659]
We introduce the Cascade Transformer, a technique to adapt transformer-based models into a cascade of rankers.
When compared to a state-of-the-art transformer model, our approach reduces computation by 37% with almost no impact on accuracy.
arXiv Detail & Related papers (2020-05-05T23:32:01Z)
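As a rough illustration of the cascade-of-rankers idea in the last entry, here is a hedged PyTorch sketch (not the authors' implementation): candidate answer sentences are scored from partial encodings at intermediate layers, and low-scoring candidates are dropped before the deeper, more expensive layers run. The exit layers, keep fraction, and scoring heads below are assumptions of the sketch.

```python
# Rough sketch of a cascade of rankers (inspired by the Cascade Transformer
# entry above, not the authors' code): candidates are scored from partial
# encodings at intermediate transformer layers, and the lowest-scoring ones
# are dropped before the deeper layers run. The exit layer indices, keep
# fraction, and scoring heads are illustrative assumptions.
import torch
import torch.nn as nn


class CascadeRankerSketch(nn.Module):
    def __init__(self, d_model=256, num_layers=12, exit_layers=(4, 8), keep_frac=0.5):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(num_layers)
        )
        self.exit_layers = set(exit_layers)
        self.keep_frac = keep_frac
        # One lightweight scoring head per early exit, plus a final ranker.
        self.heads = nn.ModuleDict(
            {str(i): nn.Linear(d_model, 1) for i in list(exit_layers) + [num_layers]}
        )

    @torch.no_grad()
    def rank(self, candidates):
        """candidates: (N, L, D) encodings of N candidate answer sentences."""
        h = candidates
        for i, layer in enumerate(self.layers, start=1):
            h = layer(h)
            if i in self.exit_layers and h.size(0) > 1:
                # Score each candidate from its first position; keep the top fraction.
                scores = self.heads[str(i)](h[:, 0]).squeeze(-1)   # (N,)
                keep = max(1, int(h.size(0) * self.keep_frac))
                h = h[scores.topk(keep).indices]
        # Final ranking scores for the surviving candidates only.
        return self.heads[str(len(self.layers))](h[:, 0]).squeeze(-1)


ranker = CascadeRankerSketch()
surviving_scores = ranker.rank(torch.randn(16, 64, 256))  # 16 candidates -> 8 -> 4
print(surviving_scores.shape)  # torch.Size([4])
```

Discarding a fixed fraction of candidates at each exit is where the compute saving comes from; in the actual system the cascade's rankers and drop ratios are trained and tuned for the task, which this sketch does not attempt.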