BURT: BERT-inspired Universal Representation from Learning Meaningful
Segment
- URL: http://arxiv.org/abs/2012.14320v2
- Date: Thu, 31 Dec 2020 09:56:21 GMT
- Title: BURT: BERT-inspired Universal Representation from Learning Meaningful
Segment
- Authors: Yian Li, Hai Zhao
- Abstract summary: This work introduces and explores universal representation learning, i.e., embeddings of different levels of linguistic units in a uniform vector space.
We present a universal representation model, BURT, to encode different levels of linguistic units into the same vector space.
Specifically, we extract and mask meaningful segments based on point-wise mutual information (PMI) to incorporate different granular objectives into the pre-training stage.
- Score: 46.51685959045527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although pre-trained contextualized language models such as BERT achieve
significant performance on various downstream tasks, current language
representation still focuses only on a linguistic objective at a specific
granularity, which may not be applicable when multiple levels of linguistic
units are involved at the same time. This work thus introduces and explores
universal representation learning, i.e., embeddings of different levels of
linguistic units in a uniform vector space. We present a universal
representation model, BURT (BERT-inspired Universal Representation from
learning meaningful segmenT), to encode different levels of linguistic units
into the same vector space. Specifically, we extract and mask meaningful
segments based on point-wise mutual information (PMI) to incorporate different
granular objectives into the pre-training stage. We conduct experiments on
datasets for English and Chinese including the GLUE and CLUE benchmarks, where
our model surpasses its baselines and alternatives on a wide range of
downstream tasks. We present our approach to constructing analogy datasets in
terms of words, phrases and sentences, and experiment with multiple
representation models to examine geometric properties of the learned vector
space through a task-independent evaluation. Finally, we verify the
effectiveness of our unified pre-training strategy in two real-world text
matching scenarios. As a result, our model significantly outperforms existing
information retrieval (IR) methods and yields universal representations that
can be directly applied to retrieval-based question-answering and natural
language generation tasks.
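The abstract's key pre-training idea is to extract and mask meaningful multi-token segments scored by point-wise mutual information (PMI), so that objectives at different granularities share one masked-language-modeling stage. The sketch below illustrates one way such a PMI-based extractor could look; the extended PMI formula over n-grams, the greedy non-overlapping selection and the `threshold` value are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of PMI-based segment extraction for span masking
# (illustrative assumptions, not the exact BURT procedure).
import math
from collections import Counter
from typing import List, Tuple

def build_counts(corpus: List[List[str]], max_n: int = 3):
    """Count unigrams and n-grams (2..max_n) over a tokenized corpus."""
    unigrams, ngrams = Counter(), Counter()
    total = 0
    for tokens in corpus:
        total += len(tokens)
        unigrams.update(tokens)
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                ngrams[tuple(tokens[i:i + n])] += 1
    return unigrams, ngrams, total

def pmi(ngram: Tuple[str, ...], unigrams, ngrams, total: int) -> float:
    """Extended PMI: log p(w1..wn) - sum_i log p(wi); higher means more cohesive."""
    if ngrams[ngram] == 0:
        return float("-inf")
    log_joint = math.log(ngrams[ngram] / total)
    log_indep = sum(math.log(unigrams[w] / total) for w in ngram)
    return log_joint - log_indep

def meaningful_segments(tokens: List[str], unigrams, ngrams, total: int,
                        max_n: int = 3, threshold: float = 2.0):
    """Greedily select non-overlapping high-PMI n-grams as masking candidates."""
    scored = []
    for n in range(max_n, 1, -1):  # prefer longer segments
        for i in range(len(tokens) - n + 1):
            s = pmi(tuple(tokens[i:i + n]), unigrams, ngrams, total)
            if s >= threshold:
                scored.append((s, i, i + n))
    chosen, used = [], set()
    for s, start, end in sorted(scored, reverse=True):
        if not used.intersection(range(start, end)):
            chosen.append((start, end))
            used.update(range(start, end))
    return sorted(chosen)

# Each returned (start, end) span would then be masked as a whole unit
# during masked language modeling, instead of masking single tokens.
corpus = [["new", "york", "city", "is", "large"],
          ["she", "lives", "in", "new", "york"]]
uni, ng, tot = build_counts(corpus)
print(meaningful_segments(corpus[0], uni, ng, tot))  # e.g. [(0, 3)]
```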
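The abstract also reports a task-independent evaluation of the geometric properties of the learned space via analogy datasets over words, phrases and sentences. A common form of such a test is the vector-offset analogy (a : b :: c : ?), answered by nearest-neighbor search on b - a + c. The snippet below sketches that evaluation; the `encode` function is a hypothetical placeholder for any model mapping a text unit to a fixed-size vector, and the toy data is not from the paper's datasets.

```python
# Sketch of a vector-offset analogy evaluation (a : b :: c : ?),
# assuming `encode` maps any word, phrase or sentence to a vector.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def analogy_accuracy(quads, encode, candidates):
    """quads: list of (a, b, c, d) text units; candidates: answer pool containing d."""
    cand_vecs = {t: encode(t) for t in candidates}
    correct = 0
    for a, b, c, d in quads:
        query = encode(b) - encode(a) + encode(c)
        # exclude the three query items, as is standard in word-analogy tests
        pred = max((t for t in candidates if t not in (a, b, c)),
                   key=lambda t: cosine(query, cand_vecs[t]))
        correct += int(pred == d)
    return correct / len(quads)

# Toy usage with a dummy random encoder standing in for a real model.
def encode(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    return rng.standard_normal(16)

quads = [("king", "queen", "man", "woman")]
print(analogy_accuracy(quads, encode, ["woman", "banana", "queen"]))
```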
Related papers
- Universal Segmentation at Arbitrary Granularity with Language
Instruction [59.76130089644841]
We present UniLSeg, a universal segmentation model that can perform segmentation at any semantic level with the guidance of language instructions.
For training UniLSeg, we reorganize a group of tasks from their original diverse distributions into a unified data format, where images paired with texts describing the segmentation targets are taken as input and the corresponding masks are the output.
arXiv Detail & Related papers (2023-12-04T04:47:48Z) - Investigating semantic subspaces of Transformer sentence embeddings
through linear structural probing [2.5002227227256864]
We present experiments with semantic structural probing, a method for studying sentence-level representations.
We apply our method to language models from different families (encoder-only, decoder-only, encoder-decoder) and of different sizes in the context of two tasks.
We find that model families differ substantially in their performance and layer dynamics, but that the results are largely model-size invariant.
arXiv Detail & Related papers (2023-10-18T12:32:07Z) - Pre-training Universal Language Representation [46.51685959045527]
This work introduces universal language representation learning, i.e., embeddings of different levels of linguistic units, or text of quite diverse lengths, in a uniform vector space.
We empirically verify that a well-designed pre-training scheme may effectively yield universal language representations.
arXiv Detail & Related papers (2021-05-30T09:29:01Z) - SLM: Learning a Discourse Language Representation with Sentence
Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this pre-training objective improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z) - Learning Universal Representations from Word to Sentence [89.82415322763475]
This work introduces and explores universal representation learning, i.e., embeddings of different levels of linguistic units in a uniform vector space.
We present our approach of constructing analogy datasets in terms of words, phrases and sentences.
We empirically verify that well pre-trained Transformer models, combined with appropriate training settings, may effectively yield universal representations.
arXiv Detail & Related papers (2020-09-10T03:53:18Z) - BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT-inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
The proposed BURT adopts a Siamese network, learning sentence-level representations from a natural language inference dataset and word/phrase-level representations from a paraphrasing dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
arXiv Detail & Related papers (2020-04-29T04:01:52Z)