MeLT: Message-Level Transformer with Masked Document Representations as
Pre-Training for Stance Detection
- URL: http://arxiv.org/abs/2109.08113v1
- Date: Thu, 16 Sep 2021 17:07:45 GMT
- Title: MeLT: Message-Level Transformer with Masked Document Representations as
Pre-Training for Stance Detection
- Authors: Matthew Matero, Nikita Soni, Niranjan Balasubramanian, and H. Andrew
Schwartz
- Abstract summary: We introduce Message-Level Transformer (MeLT) -- a hierarchical message-encoder pre-trained over Twitter.
We focus on stance prediction as a task benefiting from knowing the context of the message.
We find that applying this pre-trained masked message-level transformer to the downstream task of stance detection achieves an F1 score of 67%.
- Score: 15.194603982886484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Much of natural language processing is focused on leveraging large capacity
language models, typically trained over single messages with a task of
predicting one or more tokens. However, modeling human language at
higher levels of context (i.e., sequences of messages) is under-explored. In
stance detection and other social media tasks where the goal is to predict an
attribute of a message, we have contextual data that is loosely semantically
connected by authorship. Here, we introduce Message-Level Transformer (MeLT) --
a hierarchical message-encoder pre-trained over Twitter and applied to the task
of stance prediction. We focus on stance prediction as a task benefiting from
knowing the context of the message (i.e., the sequence of previous messages).
The model is trained using a variant of masked-language modeling: instead
of predicting tokens, it seeks to reconstruct an entire masked (aggregated)
message vector via a reconstruction loss. We find that applying this
pre-trained masked message-level transformer to the downstream task of
stance detection achieves an F1 score of 67%.
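The pre-training objective amounts to masked-language modeling lifted from the token level to the message level: each message is aggregated into a single vector, some message vectors in a user's sequence are masked, and the message-level transformer is trained to reconstruct the original vectors. Below is a minimal PyTorch sketch of that idea; the mean-pooling aggregation, layer sizes, and masking rate are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MaskedMessageReconstruction(nn.Module):
    """Sketch of a message-level masked pre-training objective.

    A fraction of (aggregated) message vectors is replaced by a learned
    [MASK] vector; a transformer over the message sequence then tries to
    reconstruct the original vectors under an MSE reconstruction loss.
    """

    def __init__(self, d_model=768, n_layers=4, n_heads=8, mask_prob=0.15):
        super().__init__()
        self.mask_prob = mask_prob
        self.mask_vector = nn.Parameter(torch.zeros(d_model))  # learned [MASK] embedding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.reconstruct = nn.Linear(d_model, d_model)

    def forward(self, message_vectors):
        # message_vectors: (batch, n_messages, d_model), e.g. mean-pooled
        # token embeddings of each message in a user's timeline (assumption).
        mask = torch.rand(message_vectors.shape[:2],
                          device=message_vectors.device) < self.mask_prob
        corrupted = message_vectors.clone()
        corrupted[mask] = self.mask_vector  # mask whole messages, not tokens
        predicted = self.reconstruct(self.encoder(corrupted))
        # Reconstruction loss is computed only at the masked message positions.
        return nn.functional.mse_loss(predicted[mask], message_vectors[mask])
```

For the downstream stance task, the same message encoder would presumably be reused with a small classification head over the hidden state of the target message, so that the prediction can draw on the author's preceding messages.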
Related papers
- Future Token Prediction -- Causal Language Modelling with Per-Token Semantic State Vector for Multi-Token Prediction [0.0]
This research investigates a new pretraining method called Future Token Prediction (FTP).
FTP generates embedding vectors for each token position that are linearly and expansively projected to a pseudo-sequence.
On a toy, but complex, coding problem, FTP networks produce significantly better results than GPT networks.
arXiv Detail & Related papers (2024-10-23T14:50:15Z)
- MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer [9.100416536151869]
Masked Generative Codec Transformer (MaskGCT) is a fully non-autoregressive text-to-speech model.
MaskGCT eliminates the need for explicit alignment information between text and speech supervision, as well as phone-level duration prediction.
Experiments with 100K hours of in-the-wild speech demonstrate that MaskGCT outperforms the current state-of-the-art zero-shot TTS systems.
arXiv Detail & Related papers (2024-09-01T15:26:30Z)
- Typhoon: Towards an Effective Task-Specific Masking Strategy for Pre-trained Language Models [0.0]
In this paper, we explore a task-specific masking framework for pre-trained large language models.
We develop our own masking algorithm, Typhoon, based on token input gradients, and compare this with other standard baselines.
Our implementation can be found in a public GitHub repository.
arXiv Detail & Related papers (2023-03-27T22:27:23Z)
- Word Order Matters when you Increase Masking [70.29624135819884]
We study the effect of removing position encodings on the pre-training objective itself, to test whether models can reconstruct position information from co-occurrences alone.
We find that the necessity of position information increases with the amount of masking, and that masked language models without position encodings are not able to reconstruct this information on the task.
arXiv Detail & Related papers (2022-11-08T18:14:04Z)
- Retrieval Oriented Masking Pre-training Language Model for Dense Passage Retrieval [16.592276887533714]
Masked Language Modeling (MLM) is a major sub-task of the pre-training process.
The traditional random masking strategy tends to select a large number of tokens that have limited effect on the passage retrieval task.
We propose an alternative retrieval-oriented masking (ROM) strategy in which more important tokens have a higher probability of being masked out (see the sketch after this list).
arXiv Detail & Related papers (2022-10-27T02:43:48Z)
- Vision-Language Pre-Training for Boosting Scene Text Detectors [57.08046351495244]
We specifically adapt vision-language joint learning for scene text detection.
We propose to learn contextualized, joint representations through vision-language pre-training.
The pre-trained model is able to produce more informative representations with richer semantics.
arXiv Detail & Related papers (2022-04-29T03:53:54Z)
- DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting [91.56988987393483]
We present a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP.
Specifically, we convert the original image-text matching problem in CLIP to a pixel-text matching problem and use the pixel-text score maps to guide the learning of dense prediction models.
Our method is model-agnostic, which can be applied to arbitrary dense prediction systems and various pre-trained visual backbones.
arXiv Detail & Related papers (2021-12-02T18:59:32Z)
- How does a Pre-Trained Transformer Integrate Contextual Keywords? Application to Humanitarian Computing [0.0]
This paper describes how to improve a humanitarian classification task by adding the crisis event type to each tweet to be classified.
It shows that the proposed neural network approach partially overfits the particularities of the Crisis Benchmark.
arXiv Detail & Related papers (2021-11-07T11:24:08Z)
- Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model.
We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.
We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
arXiv Detail & Related papers (2021-08-31T19:39:55Z)
- PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation [92.7366819044397]
Self-supervised pre-training has emerged as a powerful technique for natural language understanding and generation.
This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus.
An extensive set of experiments show that PALM achieves new state-of-the-art results on a variety of language generation benchmarks.
arXiv Detail & Related papers (2020-04-14T06:25:36Z)
- UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training [152.63467944568094]
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks.
Our experiments show that the unified language models pre-trained using PMLM achieve new state-of-the-art results on a wide range of natural language understanding and generation tasks.
arXiv Detail & Related papers (2020-02-28T15:28:49Z)
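Several of the masking-strategy papers above (e.g., Typhoon and ROM) share the idea of replacing uniform random masking with importance-weighted masking. The snippet below is a generic sketch of that idea, not any specific paper's algorithm: per-token importance scores (token input gradients, term weights, etc.) are turned into per-token masking probabilities while keeping the expected masking rate fixed.

```python
import torch

def importance_weighted_mask(importance: torch.Tensor, mask_rate: float = 0.15) -> torch.Tensor:
    """Sample a boolean mask where higher-importance tokens are masked more often.

    importance: (seq_len,) non-negative scores, e.g. token input gradients or
    retrieval-oriented term weights (illustrative assumption).
    """
    probs = importance / importance.sum()            # normalize to a distribution
    probs = probs * mask_rate * importance.numel()   # scale so the mean probability == mask_rate
    probs = probs.clamp(max=1.0)                     # keep valid Bernoulli probabilities
    return torch.bernoulli(probs).bool()

# Tokens with larger scores are masked more often on average.
scores = torch.tensor([0.1, 0.1, 2.0, 0.1, 1.5, 0.1])
mask = importance_weighted_mask(scores, mask_rate=0.3)
```

Note that clamping can push the realized masking rate slightly below mask_rate when a few tokens dominate the scores; the papers listed above use more elaborate schemes than this sketch.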