Effective Pre-Training Objectives for Transformer-based Autoencoders
- URL: http://arxiv.org/abs/2210.13536v1
- Date: Mon, 24 Oct 2022 18:39:44 GMT
- Title: Effective Pre-Training Objectives for Transformer-based Autoencoders
- Authors: Luca Di Liello, Matteo Gabburo, Alessandro Moschitti
- Abstract summary: We study trade-offs between efficiency, cost and accuracy of Transformer encoders.
We combine features of common objectives and create new effective pre-training approaches.
- Score: 97.99741848756302
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we study trade-offs between efficiency, cost and accuracy when
pre-training Transformer encoders with different pre-training objectives. For
this purpose, we analyze features of common objectives and combine them to
create new effective pre-training approaches. Specifically, we design light
token generators based on a straightforward statistical approach, which can
replace ELECTRA's computationally heavy generators, thus greatly reducing cost.
Our experiments also show that (i) there are more efficient alternatives to
BERT's MLM, and (ii) it is possible to efficiently pre-train Transformer-based
models using lighter generators without a significant drop in performance.
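A minimal sketch of the kind of lightweight, statistical replacement generator the abstract describes, used to produce corrupted inputs and replaced-token-detection labels for a discriminator. The unigram-frequency sampling and the 15% replacement rate below are illustrative assumptions, not the authors' exact method.
```python
# Hedged sketch of a "statistical" replacement generator for ELECTRA-style
# replaced-token-detection (RTD) pre-training: no trained generator network.
import random
from collections import Counter
from typing import List, Tuple

class UnigramReplacementGenerator:
    def __init__(self, corpus_token_ids: List[int]):
        counts = Counter(corpus_token_ids)
        self.vocab = list(counts.keys())
        total = sum(counts.values())
        self.weights = [counts[t] / total for t in self.vocab]

    def corrupt(self, token_ids: List[int], replace_prob: float = 0.15
                ) -> Tuple[List[int], List[int]]:
        """Return (corrupted_ids, rtd_labels); label 1 means the token was replaced."""
        corrupted, labels = [], []
        for tok in token_ids:
            if random.random() < replace_prob:
                new_tok = random.choices(self.vocab, weights=self.weights, k=1)[0]
                corrupted.append(new_tok)
                labels.append(int(new_tok != tok))  # identical draw counts as original
            else:
                corrupted.append(tok)
                labels.append(0)
        return corrupted, labels

# Usage: a Transformer discriminator is then trained to predict `labels`
# from `corrupted`, replacing ELECTRA's MLM-trained generator in the loop.
gen = UnigramReplacementGenerator(corpus_token_ids=[1, 2, 2, 3, 3, 3, 4])
corrupted, labels = gen.corrupt([3, 1, 4, 2, 3])
```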
Related papers
- TranDRL: A Transformer-Driven Deep Reinforcement Learning Enabled Prescriptive Maintenance Framework [58.474610046294856]
Industrial systems demand reliable predictive maintenance strategies to enhance operational efficiency and reduce downtime.
This paper introduces an integrated framework that leverages Transformer-based neural networks and deep reinforcement learning (DRL) algorithms to optimize system maintenance actions.
arXiv Detail & Related papers (2023-09-29T02:27:54Z)
- Efficient Training for Visual Tracking with Deformable Transformer [0.0]
We present DETRack, a streamlined end-to-end visual object tracking framework.
Our framework utilizes an efficient encoder-decoder structure in which the deformable transformer decoder acts as the target head.
For training, we introduce a novel one-to-many label assignment and an auxiliary denoising technique.
arXiv Detail & Related papers (2023-09-06T03:07:43Z)
- Efficient Bayesian Optimization with Deep Kernel Learning and Transformer Pre-trained on Multiple Heterogeneous Datasets [9.510327380529892]
We propose a simple approach to pre-train a surrogate, which is a Gaussian process (GP) with a kernel defined on deep features learned from a Transformer-based encoder.
Experiments on both synthetic and real benchmark problems demonstrate the effectiveness of our proposed pre-training and transfer BO strategy.
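A minimal sketch of the surrogate structure described above: a Gaussian process whose kernel is computed on features produced by a Transformer encoder. The encoder depth, RBF kernel with unit lengthscale, and noise level are illustrative assumptions and omit the paper's multi-dataset pre-training.
```python
# Hedged sketch of a deep-kernel GP surrogate in plain PyTorch.
import torch
import torch.nn as nn

class DeepKernelGP(nn.Module):
    def __init__(self, in_dim: int, feat_dim: int = 32, noise: float = 1e-3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=in_dim, nhead=1, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.proj = nn.Linear(in_dim, feat_dim)
        self.noise = noise

    def features(self, x: torch.Tensor) -> torch.Tensor:
        # Treat each input point as a length-1 "sequence" for the encoder.
        return self.proj(self.encoder(x.unsqueeze(1)).squeeze(1))

    def kernel(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return torch.exp(-0.5 * torch.cdist(a, b).pow(2))  # RBF, unit lengthscale

    def posterior(self, x_train, y_train, x_test):
        f_tr, f_te = self.features(x_train), self.features(x_test)
        K = self.kernel(f_tr, f_tr) + self.noise * torch.eye(len(x_train))
        K_star = self.kernel(f_te, f_tr)
        mean = K_star @ torch.linalg.solve(K, y_train)
        cov = self.kernel(f_te, f_te) - K_star @ torch.linalg.solve(K, K_star.T)
        return mean, cov
```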
arXiv Detail & Related papers (2023-08-09T01:56:10Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- Energy-Efficient Adaptive Machine Learning on IoT End-Nodes With Class-Dependent Confidence [22.225875583595027]
An effective way to achieve energy efficiency with small accuracy drops is to sequentially execute a set of increasingly complex models.
Current methods employ a single threshold on the output probabilities produced by each model to decide whether to stop or continue the cascade.
We show that class-dependent confidence thresholds can significantly reduce energy consumption compared to the single-threshold approach.
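A minimal sketch of that early-exit cascade: inference stops at the first model whose prediction clears its threshold. The function name, model list, and threshold values are illustrative assumptions; the single-threshold baseline is just the special case where all per-class thresholds are equal.
```python
# Hedged sketch of adaptive cascade inference with class-dependent thresholds.
from typing import Callable, List, Sequence

def cascade_predict(x,
                    models: List[Callable],             # ordered: cheapest first
                    class_thresholds: Sequence[float],  # one threshold per class
                    ) -> int:
    for model in models:
        probs = model(x)                                 # probability per class
        pred = max(range(len(probs)), key=probs.__getitem__)
        if probs[pred] >= class_thresholds[pred]:
            return pred                                  # confident enough: exit early
    return pred                                          # fall back to the largest model

# Single-threshold baseline: cascade_predict(x, models, [0.9] * num_classes)
```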
arXiv Detail & Related papers (2022-04-07T13:22:52Z)
- Learning to Sample Replacements for ELECTRA Pre-Training [40.17248997321726]
ELECTRA pretrains a discriminator to detect replaced tokens, where the replacements are sampled from a generator trained with masked language modeling.
Despite its compelling performance, ELECTRA suffers from two issues related to how replacements are sampled.
We propose two methods to improve replacement sampling for ELECTRA pre-training.
arXiv Detail & Related papers (2021-06-25T15:51:55Z)
- Training ELECTRA Augmented with Multi-word Selection [53.77046731238381]
We present a new text encoder pre-training method that improves ELECTRA based on multi-task learning.
Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets.
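A minimal sketch of such a two-headed discriminator: a shared encoder feeds a binary replaced-token-detection head and a selection head that scores the candidate original tokens at each position. The class name, dimensions, weight tying, and loss combination are illustrative assumptions.
```python
# Hedged sketch of a multi-task ELECTRA-style discriminator.
import torch
import torch.nn as nn

class MultiTaskDiscriminator(nn.Module):
    def __init__(self, vocab_size: int, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.rtd_head = nn.Linear(hidden, 1)   # replaced-token detection (binary)
        self.candidate_embed = self.embed      # tie weights for candidate scoring

    def forward(self, token_ids, candidate_ids):
        # token_ids: (B, T); candidate_ids: (B, T, K) candidate originals per position
        h = self.encoder(self.embed(token_ids))                   # (B, T, H)
        rtd_logits = self.rtd_head(h).squeeze(-1)                 # (B, T)
        cand = self.candidate_embed(candidate_ids)                # (B, T, K, H)
        select_logits = torch.einsum("bth,btkh->btk", h, cand)    # (B, T, K)
        return rtd_logits, select_logits

# Training (sketch): binary cross-entropy on rtd_logits against replaced/original
# labels, plus cross-entropy on select_logits against the index of the true
# original token within each candidate set; the two losses are summed.
```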
arXiv Detail & Related papers (2021-05-31T23:19:00Z)
- Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformer-based models.
We show that eliminating the MASK token and computing the loss over the whole output are essential choices for improving performance.
arXiv Detail & Related papers (2021-04-20T00:09:37Z)
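A minimal sketch of the no-MASK, whole-output idea from the entry above: instead of masking, a fraction of tokens is replaced with random ones and the model predicts the clean sequence at every position. The substitution scheme and rate are illustrative assumptions, not necessarily the exact objective studied in the paper.
```python
# Hedged sketch: MASK-free corruption with a loss over all output positions.
import torch
import torch.nn.functional as F

def full_output_substitution_loss(model, token_ids, vocab_size, sub_prob=0.15):
    """token_ids: (B, T) clean inputs; model maps (B, T) -> (B, T, vocab) logits."""
    corrupted = token_ids.clone()
    sub_mask = torch.rand_like(token_ids, dtype=torch.float) < sub_prob
    random_tokens = torch.randint_like(token_ids, vocab_size)
    corrupted[sub_mask] = random_tokens[sub_mask]   # no [MASK] token involved

    logits = model(corrupted)                       # (B, T, vocab)
    # Cross-entropy over *all* positions: the model must both restore the
    # substituted tokens and copy through the untouched ones.
    return F.cross_entropy(logits.reshape(-1, vocab_size), token_ids.reshape(-1))
```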
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.