Typhoon: Towards an Effective Task-Specific Masking Strategy for
Pre-trained Language Models
- URL: http://arxiv.org/abs/2303.15619v1
- Date: Mon, 27 Mar 2023 22:27:23 GMT
- Authors: Muhammed Shahir Abdurrahman, Hashem Elezabi, Bruce Changlong Xu
- Abstract summary: In this paper, we explore a task-specific masking framework for pre-trained large language models.
We develop our own masking algorithm, Typhoon, based on token input gradients, and compare this with other standard baselines.
Our implementation can be found in a public GitHub repository.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: By exploiting the high degree of parallelism offered by graphics
processing units, transformer architectures have enabled tremendous strides in
natural language processing. In a traditional masked language model,
special MASK tokens are used to prompt our model to gather contextual
information from surrounding words to restore originally hidden information. In
this paper, we explore a task-specific masking framework for pre-trained large
language models that enables superior performance on particular downstream
tasks on the datasets in the GLUE benchmark. We develop our own masking
algorithm, Typhoon, based on token input gradients, and compare this with other
standard baselines. We find that Typhoon offers performance competitive with
whole-word masking on the MRPC dataset. Our implementation can be found in a
public GitHub repository.
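The abstract describes Typhoon only at a high level: masking decisions are driven by token input gradients. The sketch below is an illustrative reading of that idea, not the authors' exact algorithm; it assumes a BERT-style model from Hugging Face transformers, scores each token by the L2 norm of the task-loss gradient with respect to its input embedding, and masks the highest-scoring tokens. The function name and the scoring rule are assumptions for illustration.

```python
# Hedged sketch of gradient-based, task-specific masking (illustrative only;
# the actual Typhoon implementation is in the authors' GitHub repository).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # assumed backbone
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

def gradient_based_mask(sentence: str, label: int, mask_ratio: float = 0.15) -> str:
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"]

    # Embed the tokens ourselves so gradients w.r.t. each token embedding are exposed.
    embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)
    loss = model(inputs_embeds=embeds,
                 attention_mask=enc["attention_mask"],
                 labels=torch.tensor([label])).loss
    loss.backward()

    # Importance score per position: magnitude of the input gradient.
    scores = embeds.grad.norm(dim=-1).squeeze(0)

    # Never mask special tokens such as [CLS] and [SEP].
    special = torch.tensor(
        tokenizer.get_special_tokens_mask(input_ids[0].tolist(),
                                          already_has_special_tokens=True),
        dtype=torch.bool)
    scores[special] = float("-inf")

    # Replace the highest-gradient tokens with [MASK].
    k = max(1, int(mask_ratio * int((~special).sum())))
    top = scores.topk(k).indices
    masked_ids = input_ids.squeeze(0).clone()
    masked_ids[top] = tokenizer.mask_token_id
    return tokenizer.decode(masked_ids)

print(gradient_based_mask("The movie was surprisingly good.", label=1))
```

In a full pipeline this scoring would presumably be applied when constructing masked examples for continued pre-training or fine-tuning on a GLUE task such as MRPC, rather than as a one-off step at inference time.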
Related papers
- MaskInversion: Localized Embeddings via Optimization of Explainability Maps [49.50785637749757]
MaskInversion generates a context-aware embedding for a query image region specified by a mask at test time.
It can be used for a broad range of tasks, including open-vocabulary class retrieval, referring expression comprehension, as well as for localized captioning and image generation.
arXiv Detail & Related papers (2024-07-29T14:21:07Z)
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- Recovering from Privacy-Preserving Masking with Large Language Models [14.828717714653779]
We use large language models (LLMs) to suggest substitutes of masked tokens.
We show that models trained on the obfuscation corpora achieve performance comparable to models trained on the original data.
arXiv Detail & Related papers (2023-09-12T16:39:41Z)
- Investigating Masking-based Data Generation in Language Models [0.0]
A feature of BERT and models with similar architecture is the objective of masked language modeling.
Data augmentation is a data-driven technique widely used in machine learning.
Recent studies have utilized masked language models to generate artificially augmented data for downstream NLP tasks.
arXiv Detail & Related papers (2023-06-16T16:48:27Z)
- Retrieval Oriented Masking Pre-training Language Model for Dense Passage Retrieval [16.592276887533714]
Masked Language Modeling (MLM) is a major sub-task of the pre-training process.
The traditional random masking strategy tends to select many tokens that have limited effect on the passage retrieval task.
We propose an alternative retrieval-oriented masking (dubbed ROM) strategy in which more important tokens have a higher probability of being masked out (a generic sketch of importance-weighted masking appears after this list).
arXiv Detail & Related papers (2022-10-27T02:43:48Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation [63.195935452646815]
We propose a method to automatically generate domain- and task-adaptive maskings of the given text for self-supervised pre-training.
We present a novel reinforcement learning-based framework which learns the masking policy.
We validate our Neural Mask Generator (NMG) on several question answering and text classification datasets.
arXiv Detail & Related papers (2020-10-06T13:27:01Z)
- Masking as an Efficient Alternative to Finetuning for Pretrained Language Models [49.64561153284428]
We learn selective binary masks for pretrained weights in lieu of modifying them through finetuning.
In intrinsic evaluations, we show that representations computed by masked language models encode information necessary for solving downstream tasks.
arXiv Detail & Related papers (2020-04-26T15:03:47Z)
- UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training [152.63467944568094]
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks.
Our experiments show that the unified language models pre-trained using PMLM achieve new state-of-the-art results on a wide range of natural language understanding and generation tasks.
arXiv Detail & Related papers (2020-02-28T15:28:49Z)
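Several of the related papers above, most directly the retrieval-oriented masking (ROM) entry, replace uniform random masking with masking in which more important tokens are selected with higher probability. The snippet below is a generic sketch of that pattern only; the importance scores, helper name, and sampling rule are illustrative assumptions rather than the procedure of any particular paper.

```python
# Generic sketch of importance-weighted masking: positions are sampled in
# proportion to placeholder importance scores instead of uniformly at random.
import torch

def importance_weighted_mask(input_ids: torch.Tensor, importance: torch.Tensor,
                             mask_token_id: int, mask_ratio: float = 0.15) -> torch.Tensor:
    """Sample positions to mask with probability proportional to importance."""
    n_mask = max(1, int(mask_ratio * input_ids.numel()))
    probs = importance / importance.sum()            # normalize to a distribution
    picked = torch.multinomial(probs, n_mask, replacement=False)
    masked = input_ids.clone()
    masked[picked] = mask_token_id
    return masked

# Toy usage with made-up token ids and random scores (103 is BERT's [MASK] id).
ids = torch.arange(1000, 1010)
scores = torch.rand(10) + 1e-6
print(importance_weighted_mask(ids, scores, mask_token_id=103))
```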