Fast and Accurate Deep Bidirectional Language Representations for
Unsupervised Learning
- URL: http://arxiv.org/abs/2004.08097v1
- Date: Fri, 17 Apr 2020 07:43:38 GMT
- Title: Fast and Accurate Deep Bidirectional Language Representations for
Unsupervised Learning
- Authors: Joongbo Shin, Yoonhyung Lee, Seunghyun Yoon, Kyomin Jung
- Abstract summary: We propose a novel deep bidirectional language model called the Transformer-based Text Autoencoder (T-TA).
The T-TA computes contextual language representations without repetition and retains the benefits of a deep bidirectional architecture like BERT's.
In run-time experiments on CPU environments, the proposed T-TA performs over six times faster than the BERT-based model in the reranking task and twelve times faster in the semantic similarity task.
- Score: 31.897630023454067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Even though BERT achieves successful performance improvements in various
supervised learning tasks, applying BERT to unsupervised tasks is still limited
by the repetitive inference required to compute contextual language
representations. To resolve this limitation, we propose a novel deep
bidirectional language model called the Transformer-based Text Autoencoder
(T-TA). The T-TA computes contextual language representations without
repetition and retains the benefits of a deep bidirectional architecture like
BERT's. In run-time
experiments on CPU environments, the proposed T-TA performs over six times
faster than the BERT-based model in the reranking task and twelve times faster
in the semantic similarity task. Furthermore, the T-TA shows competitive or
even better accuracies than those of BERT on the above tasks.
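The efficiency claim above rests on a single architectural idea: a BERT-style masked language model needs one forward pass per target position, whereas the T-TA produces the representation of every position in one pass by keeping each token from attending to itself. Below is a minimal NumPy sketch of that idea; the diagonal mask, toy dimensions, and random weights are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_excluding_attention(Q, K, V):
    """Single attention pass in which position i attends to every other
    position but not to itself -- a sketch of a T-TA-style diagonal mask."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)        # (n, n) attention logits
    np.fill_diagonal(scores, -1e9)       # block the diagonal: no token sees itself
    return softmax(scores, axis=-1) @ V  # contextual vectors for all n positions at once

# Toy usage: all n contextual representations come from ONE forward pass,
# instead of the n separate masked passes a BERT-style model would need.
rng = np.random.default_rng(0)
n, d = 6, 8                              # sequence length and hidden size (assumed)
X = rng.normal(size=(n, d))              # stand-in token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
H = self_excluding_attention(X @ Wq, X @ Wk, X @ Wv)
print(H.shape)                           # (6, 8): one vector per position
```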
Related papers
- TransPrompt v2: A Transferable Prompting Framework for Cross-task Text
Classification [37.824031151922604]
We propose TransPrompt v2, a novel transferable prompting framework for few-shot learning across similar or distant text classification tasks.
For learning across similar tasks, we employ a multi-task meta-knowledge acquisition (MMA) procedure to train a meta-learner.
For learning across distant tasks, we inject the task type descriptions into the prompt, and capture the intra-type and inter-type prompt embeddings.
arXiv Detail & Related papers (2023-08-29T04:16:57Z) - Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study [68.75670223005716]
We find that pre-trained language models like BERT have the potential to learn sequentially, even without any sparse memory replay.
Our experiments reveal that BERT can generate high-quality representations for previously learned tasks over the long term, under extremely sparse replay or even no replay.
arXiv Detail & Related papers (2023-03-02T09:03:43Z) - XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems
to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by the success of cross-modal encoders in visual-language tasks, while we adapt the learning objective to the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z) - PromptBERT: Improving BERT Sentence Embeddings with Prompts [95.45347849834765]
We propose a prompt-based sentence embedding method that can reduce token embedding biases and make the original BERT layers more effective.
We also propose a novel unsupervised training objective based on template denoising, which substantially narrows the performance gap between the supervised and unsupervised settings.
Our fine-tuned method outperforms the state-of-the-art method SimCSE in both unsupervised and supervised settings.
arXiv Detail & Related papers (2022-01-12T06:54:21Z) - A text autoencoder from transformer for fast encoding language
representation [0.0]
We propose a deep bidirectional language model that uses a window masking mechanism in the attention layer.
This work computes contextual language representations without the random masking used in BERT.
Our method has O($n$) complexity, lower than the O($n^2$) of other transformer-based models.
arXiv Detail & Related papers (2021-11-04T13:09:10Z) - Weighted Training for Cross-Task Learning [71.94908559469475]
We introduce Target-Aware Weighted Training (TAWT), a weighted training algorithm for cross-task learning.
We show that TAWT is easy to implement, is computationally efficient, requires little hyperparameter tuning, and enjoys non-asymptotic learning-theoretic guarantees.
As a byproduct, the proposed representation-based task distance allows one to reason in a theoretically principled way about several critical aspects of cross-task learning.
arXiv Detail & Related papers (2021-05-28T20:27:02Z) - ConSERT: A Contrastive Framework for Self-Supervised Sentence
Representation Transfer [19.643512923368743]
We present ConSERT, a Contrastive Framework for Self-Supervised Sentence Representation Transfer.
By making use of unlabeled texts, ConSERT solves the collapse issue of BERT-derived sentence representations.
Experiments on STS datasets demonstrate that ConSERT achieves an 8% relative improvement over the previous state-of-the-art.
arXiv Detail & Related papers (2021-05-25T08:15:01Z) - TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference [54.791572981834435]
Existing pre-trained language models (PLMs) are often computationally expensive in inference.
We propose a dynamic token reduction approach to accelerate PLMs' inference, named TR-BERT.
TR-BERT formulates the token reduction process as a multi-step token selection problem and automatically learns the selection strategy via reinforcement learning.
arXiv Detail & Related papers (2021-05-25T02:28:51Z) - Using Prior Knowledge to Guide BERT's Attention in Semantic Textual
Matching Tasks [13.922700041632302]
We study the problem of incorporating prior knowledge into a deep Transformer-based model, i.e., Bidirectional Encoder Representations from Transformers (BERT).
We obtain a better understanding of what task-specific knowledge BERT needs most and where it is most needed.
Experiments demonstrate that the proposed knowledge-enhanced BERT is able to consistently improve semantic textual matching performance.
arXiv Detail & Related papers (2021-02-22T12:07:16Z) - Incorporating BERT into Parallel Sequence Decoding with Adapters [82.65608966202396]
We propose to take two different BERT models as the encoder and decoder respectively, and fine-tune them by introducing simple and lightweight adapter modules.
We obtain a flexible and efficient model which is able to jointly leverage the information contained in the source-side and target-side BERT models.
Our framework is based on a parallel sequence decoding algorithm named Mask-Predict, considering the bidirectional and conditionally independent nature of BERT.
arXiv Detail & Related papers (2020-10-13T03:25:15Z) - TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for
Efficient Retrieval [11.923682816611716]
We present the TwinBERT model for effective and efficient retrieval.
It has twin-structured BERT-like encoders that represent the query and the document, respectively.
It allows document embeddings to be pre-computed offline and cached in memory; a sketch of this precompute-and-cache pattern appears after this list.
arXiv Detail & Related papers (2020-02-14T22:44:36Z)
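Several of the papers above buy run-time speed by restructuring where the heavy encoding happens; TwinBERT is the clearest case, since its twin encoders let document vectors be computed once offline while only the query is encoded at request time. The snippet below is a minimal sketch of that precompute-and-cache retrieval pattern; the toy_encode function, the example texts, and the cosine scoring are illustrative assumptions, not TwinBERT's actual encoders or scoring layer.

```python
import hashlib
import numpy as np

def toy_encode(text: str, dim: int = 16) -> np.ndarray:
    """Hypothetical stand-in for a BERT-like sentence encoder; TwinBERT
    itself uses two trained transformer towers, not reproduced here."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)              # unit-normalize for cosine scoring

# Offline: encode every document once and cache the vectors in memory.
documents = ["cheap flights to tokyo", "bert inference on cpu", "semantic similarity of texts"]
doc_matrix = np.stack([toy_encode(d) for d in documents])

# Online: only the incoming query is encoded at request time.
query_vec = toy_encode("fast bert models for retrieval")
scores = doc_matrix @ query_vec               # dot product of unit vectors = cosine score
print(documents[int(np.argmax(scores))])      # best-scoring cached document
```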
This list is automatically generated from the titles and abstracts of the papers on this site.