Typhoon: Towards an Effective Task-Specific Masking Strategy for
Pre-trained Language Models
- URL: http://arxiv.org/abs/2303.15619v1
- Date: Mon, 27 Mar 2023 22:27:23 GMT
- Authors: Muhammed Shahir Abdurrahman, Hashem Elezabi, Bruce Changlong Xu
- Abstract summary: In this paper, we explore a task-specific masking framework for pre-trained large language models.
We develop our own masking algorithm, Typhoon, based on token input gradients, and compare this with other standard baselines.
Our implementation can be found in a public GitHub repository.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: By exploiting the high degree of parallelism offered by graphics
processing units, transformer architectures have enabled tremendous strides in
natural language processing. In a traditional masked language model,
special MASK tokens are used to prompt our model to gather contextual
information from surrounding words to restore originally hidden information. In
this paper, we explore a task-specific masking framework for pre-trained large
language models that enables superior performance on particular downstream
tasks on the datasets in the GLUE benchmark. We develop our own masking
algorithm, Typhoon, based on token input gradients, and compare this with other
standard baselines. We find that Typhoon offers performance competitive with
whole-word masking on the MRPC dataset. Our implementation can be found in a
public GitHub repository.
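The abstract describes Typhoon only at a high level: masking decisions are driven by token input gradients. The sketch below is an illustrative reading of that idea, not the authors' exact algorithm; it assumes a BERT-style model from Hugging Face transformers, scores each token by the L2 norm of the task-loss gradient with respect to its input embedding, and masks the highest-scoring tokens. The function name and the scoring rule are assumptions for illustration.

```python
# Hedged sketch of gradient-based, task-specific masking (illustrative only;
# the actual Typhoon implementation is in the authors' GitHub repository).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # assumed backbone
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

def gradient_based_mask(sentence: str, label: int, mask_ratio: float = 0.15) -> str:
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"]

    # Embed the tokens ourselves so gradients w.r.t. each token embedding are exposed.
    embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)
    loss = model(inputs_embeds=embeds,
                 attention_mask=enc["attention_mask"],
                 labels=torch.tensor([label])).loss
    loss.backward()

    # Importance score per position: magnitude of the input gradient.
    scores = embeds.grad.norm(dim=-1).squeeze(0)

    # Never mask special tokens such as [CLS] and [SEP].
    special = torch.tensor(
        tokenizer.get_special_tokens_mask(input_ids[0].tolist(),
                                          already_has_special_tokens=True),
        dtype=torch.bool)
    scores[special] = float("-inf")

    # Replace the highest-gradient tokens with [MASK].
    k = max(1, int(mask_ratio * int((~special).sum())))
    top = scores.topk(k).indices
    masked_ids = input_ids.squeeze(0).clone()
    masked_ids[top] = tokenizer.mask_token_id
    return tokenizer.decode(masked_ids)

print(gradient_based_mask("The movie was surprisingly good.", label=1))
```

In a full pipeline this scoring would presumably be applied when constructing masked examples for continued pre-training or fine-tuning on a GLUE task such as MRPC, rather than as a one-off step at inference time.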
Related papers
- MaskInversion: Localized Embeddings via Optimization of Explainability Maps [49.50785637749757]
MaskInversion generates a context-aware embedding for a query image region specified by a mask at test time.
It can be used for a broad range of tasks, including open-vocabulary class retrieval, referring expression comprehension, as well as for localized captioning and image generation.
arXiv Detail & Related papers (2024-07-29T14:21:07Z)
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- Recovering from Privacy-Preserving Masking with Large Language Models [14.828717714653779]
We use large language models (LLMs) to suggest substitutes of masked tokens.
We show that models trained on the obfuscation corpora achieve performance comparable to models trained on the original data.
arXiv Detail & Related papers (2023-09-12T16:39:41Z)
- Investigating Masking-based Data Generation in Language Models [0.0]
A feature of BERT and models with similar architecture is the objective of masked language modeling.
Data augmentation is a data-driven technique widely used in machine learning.
Recent studies have utilized masked language models to generate artificially augmented data for downstream NLP tasks.
arXiv Detail & Related papers (2023-06-16T16:48:27Z)
- Retrieval Oriented Masking Pre-training Language Model for Dense Passage Retrieval [16.592276887533714]
Masked Language Modeling (MLM) is a major sub-task of the pre-training process.
The traditional random masking strategy tends to select many tokens that have limited effect on the passage retrieval task.
We propose an alternative retrieval-oriented masking (dubbed ROM) strategy in which more important tokens have a higher probability of being masked out (a generic sketch of importance-weighted masking appears after this list).
arXiv Detail & Related papers (2022-10-27T02:43:48Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation [63.195935452646815]
We propose a method to automatically generate domain- and task-adaptive maskings of the given text for self-supervised pre-training.
We present a novel reinforcement learning-based framework which learns the masking policy.
We validate our Neural Mask Generator (NMG) on several question answering and text classification datasets.
arXiv Detail & Related papers (2020-10-06T13:27:01Z)
- Masking as an Efficient Alternative to Finetuning for Pretrained Language Models [49.64561153284428]
We learn selective binary masks for pretrained weights in lieu of modifying them through finetuning.
In intrinsic evaluations, we show that representations computed by masked language models encode information necessary for solving downstream tasks.
arXiv Detail & Related papers (2020-04-26T15:03:47Z)
- UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training [152.63467944568094]
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks.
Our experiments show that the unified language models pre-trained using PMLM achieve new state-of-the-art results on a wide range of natural language understanding and generation tasks.
arXiv Detail & Related papers (2020-02-28T15:28:49Z)
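Several of the related papers above, most directly the retrieval-oriented masking (ROM) entry, replace uniform random masking with masking in which more important tokens are selected with higher probability. The snippet below is a generic sketch of that pattern only; the importance scores, helper name, and sampling rule are illustrative assumptions rather than the procedure of any particular paper.

```python
# Generic sketch of importance-weighted masking: positions are sampled in
# proportion to placeholder importance scores instead of uniformly at random.
import torch

def importance_weighted_mask(input_ids: torch.Tensor, importance: torch.Tensor,
                             mask_token_id: int, mask_ratio: float = 0.15) -> torch.Tensor:
    """Sample positions to mask with probability proportional to importance."""
    n_mask = max(1, int(mask_ratio * input_ids.numel()))
    probs = importance / importance.sum()            # normalize to a distribution
    picked = torch.multinomial(probs, n_mask, replacement=False)
    masked = input_ids.clone()
    masked[picked] = mask_token_id
    return masked

# Toy usage with made-up token ids and random scores (103 is BERT's [MASK] id).
ids = torch.arange(1000, 1010)
scores = torch.rand(10) + 1e-6
print(importance_weighted_mask(ids, scores, mask_token_id=103))
```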