BURT: BERT-inspired Universal Representation from Twin Structure
- URL: http://arxiv.org/abs/2004.13947v2
- Date: Mon, 3 Aug 2020 13:04:22 GMT
- Title: BURT: BERT-inspired Universal Representation from Twin Structure
- Authors: Yian Li and Hai Zhao
- Abstract summary: BURT (BERT inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
Our proposed BURT adopts a Siamese network, learning sentence-level representations from a natural language inference dataset and word/phrase-level representations from a paraphrasing dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
- Score: 89.82415322763475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained contextualized language models such as BERT have shown great
effectiveness in a wide range of downstream Natural Language Processing (NLP)
tasks. However, these models produce representations for each token within a
sequence rather than for the sequence as a whole, and fine-tuning takes both
sequences of a pair as input at once, which leads to unsatisfying
representations of sequences with different granularities. In particular,
because whole sentences serve as the full training context in these models,
performance suffers on lower-level linguistic units
(phrases and words). In this work, we present BURT (BERT inspired Universal
Representation from Twin Structure) that is capable of generating universal,
fixed-size representations for input sequences of any granularity, i.e., words,
phrases, and sentences, using a large scale of natural language inference and
paraphrase data with multiple training objectives. Our proposed BURT adopts a
Siamese network, learning sentence-level representations from a natural
language inference dataset and word/phrase-level representations from a
paraphrasing dataset, respectively. We evaluate BURT across different
granularities of text
similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly
used word similarity tasks, where BURT substantially outperforms other
representation models on sentence-level datasets and achieves significant
improvements in word/phrase-level representation.
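As a rough illustration of the twin (Siamese) setup described above, the sketch below mean-pools BERT token states into fixed-size embeddings from a shared encoder and attaches an SBERT-style classification head over [u, v, |u - v|] for the NLI objective. The pooling choice and the head design are assumptions for illustration, not details confirmed by the abstract.

```python
# Minimal sketch of a twin (Siamese) BERT encoder producing fixed-size
# embeddings, with an SBERT-style NLI classification head. The mean pooling
# and the [u, v, |u - v|] feature combination are illustrative assumptions,
# not specifics taken from the BURT paper itself.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class TwinEncoder(nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_labels=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)  # shared weights: the "twin" structure
        hidden = self.encoder.config.hidden_size
        self.classifier = nn.Linear(3 * hidden, num_labels)   # NLI head over [u, v, |u - v|]

    def embed(self, enc):
        """Mean-pool token states into one fixed-size vector per input sequence."""
        out = self.encoder(**enc).last_hidden_state            # (batch, seq, hidden)
        mask = enc["attention_mask"].unsqueeze(-1).float()
        return (out * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

    def forward(self, enc_a, enc_b):
        u, v = self.embed(enc_a), self.embed(enc_b)
        return self.classifier(torch.cat([u, v, (u - v).abs()], dim=-1))

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TwinEncoder()
enc_a = tokenizer(["A man is playing a guitar."], return_tensors="pt", padding=True)
enc_b = tokenizer(["Someone plays an instrument."], return_tensors="pt", padding=True)
logits = model(enc_a, enc_b)  # train with cross-entropy on NLI labels
```

Because the encoder is shared, the same embed() call can produce fixed-size vectors for single words or phrases when training on the paraphrase-level objective.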
Related papers
- Unified BERT for Few-shot Natural Language Understanding [7.352338840651369]
We propose UBERT, a unified bidirectional language understanding model based on the BERT framework.
UBERT encodes prior knowledge from various aspects, uniformly constructing learning representations across multiple NLU tasks.
Experiments show that UBERT achieves state-of-the-art performance on 7 NLU tasks and 14 datasets in few-shot and zero-shot settings.
arXiv Detail & Related papers (2022-06-24T06:10:53Z)
- Pre-training Universal Language Representation [46.51685959045527]
This work introduces universal language representation learning, i.e., embeddings of different levels of linguistic units or text with quite diverse lengths in a uniform vector space.
We empirically verify that a well-designed pre-training scheme may effectively yield universal language representations.
arXiv Detail & Related papers (2021-05-30T09:29:01Z)
- Accurate Word Representations with Universal Visual Guidance [55.71425503859685]
This paper proposes a visual representation method to explicitly enhance conventional word embeddings with multi-aspect senses from visual guidance.
We build a small-scale word-image dictionary from a multimodal seed dataset where each word corresponds to diverse related images.
Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and the generalization capability of the proposed approach.
arXiv Detail & Related papers (2020-12-30T09:11:50Z)
- BURT: BERT-inspired Universal Representation from Learning Meaningful Segment [46.51685959045527]
This work introduces and explores universal representation learning, i.e., embedding linguistic units of different levels in a uniform vector space.
We present a universal representation model, BURT, to encode different levels of linguistic units into the same vector space.
Specifically, we extract and mask meaningful segments based on point-wise mutual information (PMI) to incorporate objectives of different granularities into the pre-training stage; a PMI-scoring sketch follows this entry.
arXiv Detail & Related papers (2020-12-28T16:02:28Z)
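As a rough illustration of the PMI-based segment extraction referenced in the entry above, the sketch below scores adjacent word pairs with PMI(x, y) = log(p(x, y) / (p(x) p(y))) and keeps high-scoring pairs as candidate segments to mask; restricting to bigrams and using a fixed threshold are simplifications, not the paper's exact procedure.

```python
# Minimal sketch of PMI scoring over adjacent word pairs:
# PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ).
# Bigrams only and a fixed threshold are simplifications of the
# segment-extraction procedure described in the paper.
import math
from collections import Counter

def pmi_segments(corpus, threshold=3.0):
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    segments = []
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
        if math.log(p_xy / (p_x * p_y)) >= threshold:
            segments.append((x, y))  # candidate "meaningful segment" to mask jointly
    return segments

corpus = [["natural", "language", "inference"],
          ["natural", "language", "processing"],
          ["language", "model"]]
print(pmi_segments(corpus, threshold=0.0))
```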
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning; a generic graph-convolution sketch follows this entry.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
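As a hedged illustration of the graph encoding mentioned in the entry above, the sketch below applies one generic graph-convolution layer to token states, with an adjacency matrix assumed to come from a semantic dependency parse; it is an illustrative stand-in, not the authors' actual encoder.

```python
# Minimal sketch: one graph-convolution layer over token states, where the
# adjacency matrix encodes semantic-dependency edges. This generic layer is
# an illustrative stand-in, not the exact encoder used in the paper.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, hidden):
        super().__init__()
        self.linear = nn.Linear(hidden, hidden)

    def forward(self, h, adj):
        # h: (tokens, hidden); adj: (tokens, tokens) 0/1 semantic-dependency edges
        a = adj + torch.eye(adj.size(0))             # add self-loops
        deg = a.sum(-1, keepdim=True)
        return torch.relu(self.linear(a @ h / deg))  # average neighbours, then transform

tokens, hidden = 5, 16
h = torch.randn(tokens, hidden)              # e.g. BERT token states
adj = torch.zeros(tokens, tokens)
adj[0, 2] = adj[2, 0] = 1.0                  # one semantic-dependency edge (symmetrised)
h_sem = GraphConv(hidden)(h, adj)            # structure-infused token states
```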
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation; a data-construction sketch follows this entry.
We show that this objective improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
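The sentence-unshuffling idea behind the entry above can be sketched as a data-construction step: shuffle the sentences of a passage and supervise the model to recover their original order. The snippet shows only this input/target construction; the actual SLM objective and decoding mechanism are described in the paper.

```python
# Minimal sketch of building a sentence-unshuffling example: the input is a
# shuffled passage and the target is the original index of each shuffled
# sentence. The actual SLM training objective is defined in the paper.
import random

def make_unshuffling_example(sentences, seed=0):
    rng = random.Random(seed)
    order = list(range(len(sentences)))
    rng.shuffle(order)
    shuffled = [sentences[i] for i in order]
    # order[j] = original position of the j-th shuffled sentence
    return shuffled, order

passage = ["He woke up late.", "He missed the bus.", "He walked to work."]
shuffled, target = make_unshuffling_example(passage)
print(shuffled)  # model input: shuffled sentences
print(target)    # supervision: recover the original order
```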
- Learning Universal Representations from Word to Sentence [89.82415322763475]
This work introduces and explores universal representation learning, i.e., embedding linguistic units of different levels in a uniform vector space.
We present our approach to constructing analogy datasets in terms of words, phrases and sentences; an analogy-scoring sketch follows this entry.
We empirically verify that well pre-trained Transformer models, combined with appropriate training settings, may effectively yield universal representations.
arXiv Detail & Related papers (2020-09-10T03:53:18Z)
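Since the entry above evaluates with analogy datasets over words, phrases and sentences, the sketch below shows a standard vector-offset (3CosAdd) analogy scorer; the random embeddings and small vocabulary are placeholders, and the paper's own evaluation protocol may differ.

```python
# Minimal sketch of vector-offset (3CosAdd) analogy scoring:
# given a : b :: c : ?, pick the candidate closest to (b - a + c) by cosine.
# The random embeddings below are placeholders for real model embeddings.
import numpy as np

def solve_analogy(emb, a, b, c):
    query = emb[b] - emb[a] + emb[c]
    query /= np.linalg.norm(query)
    best, best_sim = None, -1.0
    for word, vec in emb.items():
        if word in (a, b, c):
            continue
        sim = float(vec @ query / np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

rng = np.random.default_rng(0)
vocab = ["king", "queen", "man", "woman", "apple"]
emb = {w: rng.normal(size=32) for w in vocab}
print(solve_analogy(emb, "man", "king", "woman"))  # ideally "queen" with real embeddings
```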