SLM: Learning a Discourse Language Representation with Sentence
Unshuffling
- URL: http://arxiv.org/abs/2010.16249v1
- Date: Fri, 30 Oct 2020 13:33:41 GMT
- Title: SLM: Learning a Discourse Language Representation with Sentence
Unshuffling
- Authors: Haejun Lee, Drew A. Hudson, Kangwook Lee and Christopher D. Manning
- Abstract summary: We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this sentence-level objective improves the performance of the original BERT by large margins.
- Score: 53.42814722621715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Sentence-level Language Modeling, a new pre-training objective
for learning a discourse language representation in a fully self-supervised
manner. Recent pre-training methods in NLP focus on learning either bottom- or
top-level language representations: contextualized word representations derived
from language model objectives at one extreme and a whole sequence
representation learned by order classification of two given textual segments at
the other. However, these models are not directly encouraged to capture
representations of intermediate-size structures that exist in natural language,
such as sentences and the relationships among them. To that end, we propose a
new approach to encourage learning of a contextualized sentence-level
representation by shuffling the sequence of input sentences and training a
hierarchical transformer model to reconstruct the original ordering. Through
experiments on downstream tasks such as GLUE, SQuAD, and DiscoEval, we show
that this feature of our model improves the performance of the original BERT by
large margins.
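To make the unshuffling idea concrete, the sketch below shuffles precomputed sentence embeddings, contextualizes them with a small transformer encoder, and scores each shuffled sentence against each original position with a bilinear pointer-style head, training with cross-entropy against the true ordering. This is a minimal PyTorch sketch under simplifying assumptions (toy random embeddings, a single-shot scoring head); the class and function names are hypothetical and it is not the authors' hierarchical BERT-based implementation.

```python
# Minimal sketch of a sentence-unshuffling objective (illustrative; not the SLM code).
# Assumes sentence embeddings are already available; SLM itself derives them from a
# hierarchical transformer over BERT-style token representations.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceUnshuffler(nn.Module):
    def __init__(self, dim=128, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.query = nn.Linear(dim, dim)  # simplified pointer-style scoring head (assumption)
        self.key = nn.Linear(dim, dim)

    def forward(self, shuffled):
        # shuffled: (batch, n_sents, dim) embeddings of sentences in shuffled order
        ctx = self.encoder(shuffled)                              # contextualize sentences
        scores = self.query(ctx) @ self.key(ctx).transpose(1, 2)  # (batch, n_sents, n_sents)
        return scores  # scores[b, i, j]: how well shuffled sentence j fits original position i

def unshuffling_loss(model, sent_emb):
    """Shuffle each document's sentences and train the model to recover the order."""
    batch, n, dim = sent_emb.shape
    perm = torch.stack([torch.randperm(n) for _ in range(batch)])  # shuffled[k] = original[perm[k]]
    shuffled = torch.gather(sent_emb, 1, perm.unsqueeze(-1).expand(batch, n, dim))
    scores = model(shuffled)
    target = perm.argsort(dim=1)  # for each original position, which shuffled index holds it
    return F.cross_entropy(scores.reshape(batch * n, n), target.reshape(batch * n))

# Toy usage with random "sentence embeddings": 2 documents of 5 sentences each
model = SentenceUnshuffler()
loss = unshuffling_loss(model, torch.randn(2, 5, 128))
loss.backward()
```

The paper's model is a hierarchical transformer trained to reconstruct the original ordering; the abstract does not spell out the prediction head, so the bilinear scorer above is only a stand-in to keep the sketch short.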
Related papers
- Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness [3.2925222641796554]
"Pointer-guided segment ordering" (SO) is a novel pre-training technique aimed at enhancing the contextual understanding of paragraph-level text representations.
Our experiments show that pointer-guided pre-training significantly enhances the model's ability to understand complex document structures.
arXiv Detail & Related papers (2024-06-06T15:17:51Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Sentence Representation Learning with Generative Objective rather than Contrastive Objective [86.01683892956144]
We propose a novel generative self-supervised learning objective based on phrase reconstruction.
Our generative learning achieves a strong performance improvement and outperforms current state-of-the-art contrastive methods.
arXiv Detail & Related papers (2022-10-16T07:47:46Z)
- Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units [19.668440671541546]
In end-to-end automatic speech recognition, a model is expected to implicitly learn representations suitable for recognizing a word-level sequence.
We propose a hierarchical conditional model based on connectionist temporal classification (CTC).
Experimental results on LibriSpeech-100h, 960h, and TEDLIUM2 demonstrate that the proposed model improves over a standard CTC-based model.
arXiv Detail & Related papers (2021-10-08T13:15:58Z)
- Skill Induction and Planning with Latent Language [94.55783888325165]
We formulate a generative model of action sequences in which goals generate sequences of high-level subtask descriptions.
We describe how to train this model using primarily unannotated demonstrations by parsing demonstrations into sequences of named high-level subtasks.
In trained models, the space of natural language commands indexes a library of skills; agents can use these skills to plan by generating high-level instruction sequences tailored to novel goals.
arXiv Detail & Related papers (2021-10-04T15:36:32Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT-inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
BURT adopts a Siamese network, learning sentence-level representations from a natural language inference dataset and word/phrase-level representations from a paraphrase dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a), and some commonly used word similarity tasks.
arXiv Detail & Related papers (2020-04-29T04:01:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.