MeLT: Message-Level Transformer with Masked Document Representations as
Pre-Training for Stance Detection
- URL: http://arxiv.org/abs/2109.08113v1
- Date: Thu, 16 Sep 2021 17:07:45 GMT
- Title: MeLT: Message-Level Transformer with Masked Document Representations as
Pre-Training for Stance Detection
- Authors: Matthew Matero, Nikita Soni, Niranjan Balasubramanian, and H. Andrew
Schwartz
- Abstract summary: We introduce Message-Level Transformer (MeLT) -- a hierarchical message-encoder pre-trained over Twitter.
We focus on stance prediction as a task benefiting from knowing the context of the message.
We find that applying this pre-trained masked message-level transformer to the downstream task of stance detection achieves an F1 score of 67%.
- Score: 15.194603982886484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Much of natural language processing is focused on leveraging large capacity
language models, typically trained over single messages with a task of
predicting one or more tokens. However, modeling human language at
higher levels of context (i.e., sequences of messages) is under-explored. In
stance detection and other social media tasks where the goal is to predict an
attribute of a message, we have contextual data that is loosely semantically
connected by authorship. Here, we introduce Message-Level Transformer (MeLT) --
a hierarchical message-encoder pre-trained over Twitter and applied to the task
of stance prediction. We focus on stance prediction as a task benefiting from
knowing the context of the message (i.e., the sequence of previous messages).
The model is trained using a variant of masked-language modeling: instead
of predicting tokens, it seeks to reconstruct an entire masked (aggregated)
message vector via a reconstruction loss. We find that applying this
pre-trained masked message-level transformer to the downstream task of
stance detection achieves an F1 score of 67%.
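The pre-training objective amounts to masked-language modeling lifted from the token level to the message level: each message is aggregated into a single vector, some message vectors in a user's sequence are masked, and the message-level transformer is trained to reconstruct the original vectors. Below is a minimal PyTorch sketch of that idea; the mean-pooling aggregation, layer sizes, and masking rate are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MaskedMessageReconstruction(nn.Module):
    """Sketch of a message-level masked pre-training objective.

    A fraction of (aggregated) message vectors is replaced by a learned
    [MASK] vector; a transformer over the message sequence then tries to
    reconstruct the original vectors under an MSE reconstruction loss.
    """

    def __init__(self, d_model=768, n_layers=4, n_heads=8, mask_prob=0.15):
        super().__init__()
        self.mask_prob = mask_prob
        self.mask_vector = nn.Parameter(torch.zeros(d_model))  # learned [MASK] embedding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.reconstruct = nn.Linear(d_model, d_model)

    def forward(self, message_vectors):
        # message_vectors: (batch, n_messages, d_model), e.g. mean-pooled
        # token embeddings of each message in a user's timeline (assumption).
        mask = torch.rand(message_vectors.shape[:2],
                          device=message_vectors.device) < self.mask_prob
        corrupted = message_vectors.clone()
        corrupted[mask] = self.mask_vector  # mask whole messages, not tokens
        predicted = self.reconstruct(self.encoder(corrupted))
        # Reconstruction loss is computed only at the masked message positions.
        return nn.functional.mse_loss(predicted[mask], message_vectors[mask])
```

For the downstream stance task, the same message encoder would presumably be reused with a small classification head over the hidden state of the target message, so that the prediction can draw on the author's preceding messages.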
Related papers
- Future Token Prediction -- Causal Language Modelling with Per-Token Semantic State Vector for Multi-Token Prediction [0.0]
This research investigates a new pretraining method called Future Token Prediction (FTP).
FTP generates embedding vectors for each token position that are linearly and expansively projected to a pseudo-sequence.
On a toy, but complex, coding problem, FTP networks produce significantly better results than GPT networks.
arXiv Detail & Related papers (2024-10-23T14:50:15Z)
- MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer [9.100416536151869]
Masked Generative Codec Transformer (MaskGCT) is a fully non-autoregressive text-to-speech model.
MaskGCT eliminates the need for explicit alignment information between text and speech supervision, as well as phone-level duration prediction.
Experiments with 100K hours of in-the-wild speech demonstrate that MaskGCT outperforms the current state-of-the-art zero-shot TTS systems.
arXiv Detail & Related papers (2024-09-01T15:26:30Z)
- Typhoon: Towards an Effective Task-Specific Masking Strategy for Pre-trained Language Models [0.0]
In this paper, we explore a task-specific masking framework for pre-trained large language models.
We develop our own masking algorithm, Typhoon, based on token input gradients, and compare this with other standard baselines.
Our implementation can be found in a public GitHub repository.
arXiv Detail & Related papers (2023-03-27T22:27:23Z)
- Word Order Matters when you Increase Masking [70.29624135819884]
We study the effect of removing position encodings on the pre-training objective itself, to test whether models can reconstruct position information from co-occurrences alone.
We find that the necessity of position information increases with the amount of masking, and that masked language models without position encodings are not able to reconstruct this information on the task.
arXiv Detail & Related papers (2022-11-08T18:14:04Z)
- Retrieval Oriented Masking Pre-training Language Model for Dense Passage Retrieval [16.592276887533714]
Masked Language Modeling (MLM) is a major sub-task of the pre-training process.
The traditional random masking strategy tends to select a large number of tokens that have limited effect on the passage retrieval task.
We propose an alternative retrieval-oriented masking (ROM) strategy in which more important tokens have a higher probability of being masked out (see the sketch after this list).
arXiv Detail & Related papers (2022-10-27T02:43:48Z)
- Vision-Language Pre-Training for Boosting Scene Text Detectors [57.08046351495244]
We specifically adapt vision-language joint learning for scene text detection.
We propose to learn contextualized, joint representations through vision-language pre-training.
The pre-trained model is able to produce more informative representations with richer semantics.
arXiv Detail & Related papers (2022-04-29T03:53:54Z)
- DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting [91.56988987393483]
We present a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP.
Specifically, we convert the original image-text matching problem in CLIP to a pixel-text matching problem and use the pixel-text score maps to guide the learning of dense prediction models.
Our method is model-agnostic, which can be applied to arbitrary dense prediction systems and various pre-trained visual backbones.
arXiv Detail & Related papers (2021-12-02T18:59:32Z)
- How does a Pre-Trained Transformer Integrate Contextual Keywords? Application to Humanitarian Computing [0.0]
This paper describes how to improve a humanitarian classification task by adding the crisis event type to each tweet to be classified.
It shows that the proposed neural network approach partially overfits the particularities of the Crisis Benchmark.
arXiv Detail & Related papers (2021-11-07T11:24:08Z)
- Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model.
We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.
We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
arXiv Detail & Related papers (2021-08-31T19:39:55Z)
- PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation [92.7366819044397]
Self-supervised pre-training has emerged as a powerful technique for natural language understanding and generation.
This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus.
An extensive set of experiments show that PALM achieves new state-of-the-art results on a variety of language generation benchmarks.
arXiv Detail & Related papers (2020-04-14T06:25:36Z)
- UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training [152.63467944568094]
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks.
Our experiments show that the unified language models pre-trained using PMLM achieve new state-of-the-art results on a wide range of natural language understanding and generation tasks.
arXiv Detail & Related papers (2020-02-28T15:28:49Z)
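Several of the masking-strategy papers above (e.g., Typhoon and ROM) share the idea of replacing uniform random masking with importance-weighted masking. The snippet below is a generic sketch of that idea, not any specific paper's algorithm: per-token importance scores (token input gradients, term weights, etc.) are turned into per-token masking probabilities while keeping the expected masking rate fixed.

```python
import torch

def importance_weighted_mask(importance: torch.Tensor, mask_rate: float = 0.15) -> torch.Tensor:
    """Sample a boolean mask where higher-importance tokens are masked more often.

    importance: (seq_len,) non-negative scores, e.g. token input gradients or
    retrieval-oriented term weights (illustrative assumption).
    """
    probs = importance / importance.sum()            # normalize to a distribution
    probs = probs * mask_rate * importance.numel()   # scale so the mean probability == mask_rate
    probs = probs.clamp(max=1.0)                     # keep valid Bernoulli probabilities
    return torch.bernoulli(probs).bool()

# Tokens with larger scores are masked more often on average.
scores = torch.tensor([0.1, 0.1, 2.0, 0.1, 1.5, 0.1])
mask = importance_weighted_mask(scores, mask_rate=0.3)
```

Note that clamping can push the realized masking rate slightly below mask_rate when a few tokens dominate the scores; the papers listed above use more elaborate schemes than this sketch.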