MASKER: Masked Keyword Regularization for Reliable Text Classification
- URL: http://arxiv.org/abs/2012.09392v1
- Date: Thu, 17 Dec 2020 04:54:16 GMT
- Title: MASKER: Masked Keyword Regularization for Reliable Text Classification
- Authors: Seung Jun Moon, Sangwoo Mo, Kimin Lee, Jaeho Lee, Jinwoo Shin
- Abstract summary: We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct the keywords from the remaining words and to make low-confidence predictions when the context is insufficient.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
- Score: 73.90326322794803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained language models have achieved state-of-the-art accuracies on
various text classification tasks, e.g., sentiment analysis, natural language
inference, and semantic textual similarity. However, the reliability of
fine-tuned text classifiers is an often overlooked performance criterion. For
instance, one may desire a model that can detect out-of-distribution (OOD)
samples (drawn far from the training distribution) or be robust against domain
shifts. We claim that one central obstacle to reliability is the model's
over-reliance on a limited number of keywords rather than the whole context.
In particular, we find that (a) OOD samples often contain
in-distribution keywords, while (b) cross-domain samples may not always contain
keywords; over-relying on the keywords can be problematic for both cases. In
light of this observation, we propose a simple yet effective fine-tuning
method, coined masked keyword regularization (MASKER), that facilitates
context-based prediction. MASKER regularizes the model to reconstruct the
keywords from the remaining words and to make low-confidence predictions when
the context is insufficient. When applied to various pre-trained language
models (e.g.,
BERT, RoBERTa, and ALBERT), we demonstrate that MASKER improves OOD detection
and cross-domain generalization without degrading classification accuracy. Code
is available at https://github.com/alinlab/MASKER.
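To make the abstract's two regularizers concrete, here is a minimal PyTorch-style sketch of a MASKER-like training objective. The `classify`/`reconstruct` interfaces, the loss weights, and the KL-to-uniform formulation are illustrative assumptions rather than the authors' exact implementation; consult the linked repository for the official code.

```python
import torch
import torch.nn.functional as F

def masker_loss(model, input_ids, labels, keyword_mask, mask_token_id,
                lambda_rec=0.1, lambda_ent=0.1):
    """Sketch of a MASKER-style objective (hypothetical model interface).

    keyword_mask: bool tensor (batch, seq_len), True at keyword positions.
    """
    # (1) Standard classification loss on the unmodified input.
    loss_cls = F.cross_entropy(model.classify(input_ids), labels)

    # (2) Keyword reconstruction: mask out the keywords and train the model
    # to recover them from the surrounding context, so context is not ignored.
    kw_masked = input_ids.masked_fill(keyword_mask, mask_token_id)
    token_logits = model.reconstruct(kw_masked)       # (batch, seq, vocab)
    loss_rec = F.cross_entropy(token_logits[keyword_mask],
                               input_ids[keyword_mask])

    # (3) Entropy regularization: keep only the keywords and push the class
    # distribution toward uniform, so keywords alone yield low confidence.
    ctx_masked = input_ids.masked_fill(~keyword_mask, mask_token_id)
    log_probs = F.log_softmax(model.classify(ctx_masked), dim=-1)
    uniform = torch.full_like(log_probs, 1.0 / log_probs.size(-1))
    loss_ent = F.kl_div(log_probs, uniform, reduction="batchmean")

    return loss_cls + lambda_rec * loss_rec + lambda_ent * loss_ent
```

At test time, OOD inputs can then be flagged by thresholding the maximum softmax probability, which the entropy term keeps low whenever keywords appear without supporting context.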
Related papers
- Prototype-based Aleatoric Uncertainty Quantification for Cross-modal
Retrieval [139.21955930418815]
Cross-modal retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to aleatoric uncertainty induced by low-quality data, e.g., corrupted images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework that provides trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
- Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks [39.51297217854375]
We propose Text-CRS, a certified robustness framework for natural language processing (NLP) based on randomized smoothing.
We show that Text-CRS can address four different word-level adversarial operations and achieves a significant accuracy improvement.
We also provide the first benchmark on the certified accuracy and radius of the four word-level operations, and outperform the state-of-the-art certification against synonym-substitution attacks.
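As a rough illustration of the randomized-smoothing idea in the text setting, the sketch below builds a smoothed classifier by majority vote over randomly perturbed copies of the input. The perturbation model, sample count, and voting rule are generic assumptions, not Text-CRS's specific noise distributions or certificates.

```python
import random
from collections import Counter

def smoothed_predict(classify, tokens, synonyms, n_samples=100, p=0.3):
    """Majority vote over randomly perturbed inputs (generic sketch).

    classify: function mapping a token list to a class label.
    synonyms: dict mapping a token to a list of interchangeable tokens.
    """
    votes = Counter()
    for _ in range(n_samples):
        perturbed = [random.choice(synonyms[t])
                     if t in synonyms and random.random() < p else t
                     for t in tokens]
        votes[classify(perturbed)] += 1
    # Certification schemes turn the margin between the top two vote
    # counts into a provable robustness radius.
    return votes.most_common(1)[0][0]
```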
arXiv Detail & Related papers (2023-07-31T13:08:16Z)
- Towards preserving word order importance through Forced Invalidation [80.33036864442182]
We show that pre-trained language models are insensitive to word order.
We propose Forced Invalidation to help preserve the importance of word order.
Our experiments demonstrate that Forced Invalidation significantly improves the sensitivity of the models to word order.
arXiv Detail & Related papers (2023-04-11T13:42:10Z)
- Unsupervised Domain Adaptation for Sparse Retrieval by Filling Vocabulary and Word Frequency Gaps [12.573927420408365]
IR models using a pretrained language model significantly outperform lexical approaches like BM25.
This paper proposes an unsupervised domain adaptation method by filling vocabulary and word-frequency gaps.
We show that our method outperforms the current state-of-the-art domain adaptation method.
arXiv Detail & Related papers (2022-11-08T03:58:26Z)
- Estimating Confidence of Predictions of Individual Classifiers and Their Ensembles for the Genre Classification Task [0.0]
Genre identification is a subclass of non-topical text classification.
Neural models based on pre-trained transformers, such as BERT or XLM-RoBERTa, demonstrate SOTA results in many NLP tasks.
arXiv Detail & Related papers (2022-06-15T09:59:05Z)
- Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely the hard-label attack, in which the attacker can only access the prediction label.
We propose a novel hard-label attack, called the Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks in both attack performance and adversarial example quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
- Experiments with adversarial attacks on text genres [0.0]
Neural models based on pre-trained transformers, such as BERT or XLM-RoBERTa, demonstrate SOTA results in many NLP tasks.
We show that embedding-based algorithms, which replace some of the "most significant" words with similar words, can influence model predictions in a significant proportion of cases.
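A toy sketch of such an embedding-based substitution is given below, assuming a hand-made embedding table; real attacks rank words by their importance to the target model and constrain replacements far more carefully.

```python
import numpy as np

# Toy embedding table; a real attack would use pre-trained word vectors.
EMB = {"good": np.array([0.90, 0.10]), "great": np.array([0.85, 0.15]),
       "film": np.array([0.10, 0.90]), "movie": np.array([0.12, 0.88])}

def nearest_neighbor(word):
    """Return the most cosine-similar other word in the embedding table."""
    if word not in EMB:
        return word
    v = EMB[word]
    sims = {w: float(v @ u) / (np.linalg.norm(v) * np.linalg.norm(u))
            for w, u in EMB.items() if w != word}
    return max(sims, key=sims.get)

def substitute(tokens, significant):
    """Replace the designated 'most significant' words with neighbors."""
    return [nearest_neighbor(t) if t in significant else t for t in tokens]

print(substitute(["a", "good", "film"], significant={"good"}))
# -> ['a', 'great', 'film']
```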
arXiv Detail & Related papers (2021-07-05T19:37:59Z)
- UCPhrase: Unsupervised Context-aware Quality Phrase Tagging [63.86606855524567]
UCPhrase is a novel unsupervised context-aware quality phrase tagger.
We induce high-quality phrase spans as silver labels from consistently co-occurring word sequences.
We show that our design is superior to state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
arXiv Detail & Related papers (2021-05-28T19:44:24Z)
- ShufText: A Simple Black Box Approach to Evaluate the Fragility of Text Classification Models [0.0]
Deep learning models based on CNNs, LSTMs, and Transformers have become the de facto approach for text classification.
We show that these systems over-rely on the important, classification-relevant words present in the text.
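The probe implied by the name can be sketched as follows: shuffle the word order and measure how often the prediction survives. A survival rate near 1.0 suggests keyword reliance rather than genuine reading of the sentence. The interface below is an assumption, not taken from the paper.

```python
import random

def shuffle_sensitivity(classify, sentences, n_shuffles=10):
    """Fraction of predictions unchanged under random word-order shuffles.

    classify: function mapping a list of words to a class label.
    """
    unchanged, total = 0, 0
    for sent in sentences:
        words = sent.split()
        original = classify(words)
        for _ in range(n_shuffles):
            shuffled = words[:]
            random.shuffle(shuffled)
            unchanged += classify(shuffled) == original
            total += 1
    return unchanged / total
```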
arXiv Detail & Related papers (2021-01-30T15:18:35Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)