Augmenting BERT-style Models with Predictive Coding to Improve
Discourse-level Representations
- URL: http://arxiv.org/abs/2109.04602v1
- Date: Fri, 10 Sep 2021 00:45:28 GMT
- Title: Augmenting BERT-style Models with Predictive Coding to Improve
Discourse-level Representations
- Authors: Vladimir Araujo, Andrés Villa, Marcelo Mendoza, Marie-Francine
Moens, Alvaro Soto
- Abstract summary: We propose to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn discourse-level representations.
Our proposed approach is able to predict future sentences using explicit top-down connections that operate at the intermediate layers of the network.
- Score: 20.855686009404703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current language models are usually trained using a self-supervised scheme,
where the main focus is learning representations at the word or sentence level.
However, there has been limited progress in generating useful discourse-level
representations. In this work, we propose to use ideas from predictive coding
theory to augment BERT-style language models with a mechanism that allows them
to learn suitable discourse-level representations. As a result, our proposed
approach is able to predict future sentences using explicit top-down
connections that operate at the intermediate layers of the network. By
experimenting with benchmarks designed to evaluate discourse-related knowledge
using pre-trained sentence representations, we demonstrate that our approach
improves performance in 6 out of 11 tasks by excelling in discourse
relationship detection.
Related papers
- Integrating Self-supervised Speech Model with Pseudo Word-level Targets
from Visually-grounded Speech Model [57.78191634042409]
We propose Pseudo-Word HuBERT (PW-HuBERT), a framework that integrates pseudo word-level targets into the training process.
Our experimental results on four spoken language understanding (SLU) benchmarks suggest the superiority of our model in capturing semantic information.
arXiv Detail & Related papers (2024-02-08T16:55:21Z)
- Prompt-based Logical Semantics Enhancement for Implicit Discourse
Relation Recognition [4.7938839332508945]
We propose a Prompt-based Logical Semantics Enhancement (PLSE) method for Implicit Discourse Relation Recognition (IDRR).
Our method seamlessly injects knowledge relevant to discourse relation into pre-trained language models through prompt-based connective prediction.
Experimental results on PDTB 2.0 and CoNLL16 datasets demonstrate that our method achieves outstanding and consistent performance against the current state-of-the-art models.
arXiv Detail & Related papers (2023-11-01T08:38:08Z)
- Scalable Learning of Latent Language Structure With Logical Offline
Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- Sentence Representation Learning with Generative Objective rather than
Contrastive Objective [86.01683892956144]
We propose a novel generative self-supervised learning objective based on phrase reconstruction.
Our generative learning yields a strong performance improvement and outperforms the current state-of-the-art contrastive methods.
arXiv Detail & Related papers (2022-10-16T07:47:46Z)
- Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences.
In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
arXiv Detail & Related papers (2022-05-28T01:03:30Z)
- Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular
Subword Units [19.668440671541546]
In end-to-end automatic speech recognition, a model is expected to implicitly learn representations suitable for recognizing a word-level sequence.
We propose a hierarchical conditional model that is based on connectionist temporal classification (CTC).
Experimental results on LibriSpeech-100h, 960h and TEDLIUM2 demonstrate that the proposed model improves over a standard CTC-based model.
arXiv Detail & Related papers (2021-10-08T13:15:58Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- SLM: Learning a Discourse Language Representation with Sentence
Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
- Learning Spoken Language Representations with Neural Lattice Language
Modeling [39.50831917042577]
We propose a framework that trains neural lattice language models to provide contextualized representations for spoken language understanding tasks.
The proposed two-stage pre-training approach reduces the demands of speech data and has better efficiency.
arXiv Detail & Related papers (2020-07-06T10:38:03Z)
- Improved Speech Representations with Multi-Target Autoregressive
Predictive Coding [23.424410568555547]
We extend the hypothesis that hidden states that can accurately predict future frames are a useful representation for many downstream tasks.
We propose an auxiliary objective that serves as a regularization to improve generalization of the future frame prediction task.
arXiv Detail & Related papers (2020-04-11T01:09:36Z)