Perturbation-based Self-supervised Attention for Attention Bias in Text Classification
- URL: http://arxiv.org/abs/2305.15684v1
- Date: Thu, 25 May 2023 03:18:18 GMT
- Title: Perturbation-based Self-supervised Attention for Attention Bias in Text Classification
- Authors: Huawen Feng, Zhenxi Lin, Qianli Ma
- Abstract summary: We propose a perturbation-based self-supervised attention approach to guide attention learning.
We add as much noise as possible to all the words in the sentence without changing their semantics or the model's predictions.
Experimental results on three text classification tasks show that our approach can significantly improve the performance of current attention-based models.
- Score: 31.144857032681905
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In text classification, traditional attention mechanisms usually focus
too much on frequent words and need extensive labeled data in order to learn.
This paper proposes a perturbation-based self-supervised attention approach to
guide attention learning without any annotation overhead. Specifically, we add
as much noise as possible to all the words in the sentence without changing
their semantics or the model's predictions. We hypothesize that words that tolerate more
noise are less significant, and we can use this information to refine the
attention distribution. Experimental results on three text classification tasks
show that our approach can significantly improve the performance of current
attention-based models, and is more effective than existing self-supervised
methods. We also provide a visualization analysis to verify the effectiveness
of our approach.
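The abstract outlines the mechanism at a high level: perturb every word's embedding with as much noise as it can tolerate without changing the model's prediction, and read off the tolerated noise magnitude as an inverse importance signal that supervises the attention distribution. The snippet below is a minimal, hypothetical sketch of that idea; the callable `model`, the per-token noise parameterization, and the `steps`, `lr`, and `noise_reward` settings are illustrative assumptions rather than the authors' exact training procedure.

```python
import torch
import torch.nn.functional as F

def perturbation_importance(model, embeddings, labels, steps=10, lr=0.1, noise_reward=0.01):
    """Estimate per-word importance by how much noise each word tolerates.

    model:      hypothetical classifier mapping word embeddings to logits
    embeddings: (batch, seq_len, dim) input word embeddings
    labels:     (batch,) gold labels used to keep predictions unchanged
    Returns soft attention targets of shape (batch, seq_len); words that
    tolerate less noise receive higher weight.
    """
    # One learnable (log) noise scale per token.
    log_sigma = torch.zeros(embeddings.shape[:2], device=embeddings.device, requires_grad=True)
    optimizer = torch.optim.Adam([log_sigma], lr=lr)

    for _ in range(steps):
        sigma = log_sigma.exp().unsqueeze(-1)                # (B, L, 1)
        noise = torch.randn_like(embeddings) * sigma         # reparameterized Gaussian noise
        logits = model(embeddings + noise)                   # forward pass on perturbed inputs
        # Keep the prediction intact while pushing the noise scales up;
        # the trade-off weight is a guess, not a value from the paper.
        loss = F.cross_entropy(logits, labels) - noise_reward * log_sigma.mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Words that tolerate little noise are treated as important.
    with torch.no_grad():
        importance = torch.softmax(-log_sigma, dim=-1)       # (B, L) soft attention target
    return importance
```

In a full training loop, such soft targets could serve as auxiliary supervision on the model's attention weights, for example through a KL term between the attention distribution and `importance`.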
Related papers
- Is Attention Interpretation? A Quantitative Assessment On Sets [0.0]
We study the interpretability of attention in the context of set machine learning.
We find that attention distributions are indeed often reflective of the relative importance of individual instances.
We propose to use ensembling to minimize the risk of misleading attention-based explanations.
arXiv Detail & Related papers (2022-07-26T16:25:38Z)
- On Guiding Visual Attention with Language Specification [76.08326100891571]
We use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors.
We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
arXiv Detail & Related papers (2022-02-17T22:40:19Z)
- Alignment Attention by Matching Key and Query Distributions [48.93793773929006]
This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head.
Any model with self-attention, including pre-trained ones, can easily be converted to the proposed alignment attention.
On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
arXiv Detail & Related papers (2021-10-25T00:54:57Z)
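The alignment-attention entry above hinges on making the key and query distributions within each head agree. The exact matching objective is defined in that paper; purely as a generic illustration of a key/query distribution-matching regularizer, one could penalize a symmetric KL divergence between simple per-head summaries of the two, as in this hypothetical sketch (all names and the norm-based summary are assumptions, not the published formulation).

```python
import torch

def kq_alignment_penalty(queries, keys, eps=1e-8):
    """Toy regularizer nudging per-head query and key distributions together.

    queries, keys: (batch, heads, seq_len, head_dim) projections from one layer.
    Each head's queries/keys are summarized as a distribution over tokens via
    their norms, and a symmetric KL divergence between the two is penalized.
    """
    q_dist = torch.softmax(queries.norm(dim=-1), dim=-1)   # (B, H, L)
    k_dist = torch.softmax(keys.norm(dim=-1), dim=-1)      # (B, H, L)
    log_q = (q_dist + eps).log()
    log_k = (k_dist + eps).log()
    kl_qk = (q_dist * (log_q - log_k)).sum(dim=-1)
    kl_kq = (k_dist * (log_k - log_q)).sum(dim=-1)
    return 0.5 * (kl_qk + kl_kq).mean()                    # averaged over batch and heads
```

Such a penalty would simply be added to the task loss with a small weight; the published method's distance measure and parameterization differ from this toy version.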
- Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification [101.49122450005869]
We present a counterfactual attention learning method to learn more effective attention based on causal inference.
Specifically, we analyze the effect of the learned visual attention on network prediction.
We evaluate our method on a wide range of fine-grained recognition tasks.
arXiv Detail & Related papers (2021-08-19T14:53:40Z)
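The counterfactual attention entry above evaluates learned attention by its effect on the prediction relative to a counterfactual attention. As a rough, hypothetical illustration only (the paper's actual formulation may differ), a training objective could reward the learned attention for producing a larger correct-class margin than a random counterfactual; `classify` and the other names below are placeholders.

```python
import torch
import torch.nn.functional as F

def counterfactual_attention_loss(classify, features, attention, labels):
    """Toy counterfactual objective for attention learning.

    classify:  placeholder prediction head mapping pooled features (B, D) to logits
    features:  (B, L, D) token or region features
    attention: (B, L) learned attention weights (already normalized)
    labels:    (B,) gold labels
    """
    # Factual prediction: pool features with the learned attention.
    pooled = torch.einsum('bl,bld->bd', attention, features)
    logits = classify(pooled)

    # Counterfactual prediction: pool with random attention instead.
    random_attn = torch.softmax(torch.randn_like(attention), dim=-1)
    cf_logits = classify(torch.einsum('bl,bld->bd', random_attn, features))

    # The "effect" of attention is the gap between factual and counterfactual
    # predictions; encourage that effect to point toward the correct class.
    effect = logits - cf_logits
    return F.cross_entropy(logits, labels) + F.cross_entropy(effect, labels)
```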
- Is Sparse Attention more Interpretable? [52.85910570651047]
We investigate how sparsity affects our ability to use attention as an explainability tool.
We find that only a weak relationship exists between inputs and co-indexed intermediate representations under sparse attention.
We observe in this setting that inducing sparsity may make it less plausible that attention can be used as a tool for understanding model behavior.
arXiv Detail & Related papers (2021-06-02T11:42:56Z)
- SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
Transformer-based models are popular for natural language processing (NLP) tasks due to their powerful capacity.
Attention map visualization of a pre-trained model is one direct method for understanding the self-attention mechanism.
We propose a Differentiable Attention Mask (DAM) algorithm, which can also be applied to guide the design of SparseBERT.
arXiv Detail & Related papers (2021-02-25T14:13:44Z)
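The DAM entry above refers to learning a differentiable mask over attention. The algorithm itself is specified in that paper; the following is only a hedged, generic sketch of a relaxed (soft-binary) attention mask with a sparsity penalty, with every name and the sigmoid relaxation being assumptions for illustration.

```python
import torch

class DifferentiableAttentionMask(torch.nn.Module):
    """Sketch of a learnable soft-binary mask over an L x L attention-score map."""

    def __init__(self, seq_len, temperature=1.0):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(seq_len, seq_len))
        self.temperature = temperature

    def forward(self, attention_scores):
        # Relax the binary mask with a sigmoid so training stays differentiable.
        mask = torch.sigmoid(self.logits / self.temperature)        # entries in (0, 1)
        masked_scores = attention_scores + (1.0 - mask) * -1e9      # suppress masked positions
        return masked_scores.softmax(dim=-1)

    def sparsity_penalty(self):
        # Added to the task loss to push most mask entries toward zero.
        return torch.sigmoid(self.logits / self.temperature).mean()
```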
- Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models [5.866941279460248]
We propose a visual analytics approach to understanding fine-tuning in attention-based language models.
Our visualization, Attention Flows, is designed to support users in querying, tracing, and comparing attention within layers, across layers, and amongst attention heads in Transformer-based language models.
arXiv Detail & Related papers (2020-09-03T19:56:30Z)
- Salience Estimation with Multi-Attention Learning for Abstractive Text Summarization [86.45110800123216]
In the task of text summarization, salience estimation for words, phrases or sentences is a critical component.
We propose a Multi-Attention Learning framework which contains two new attention learning components for salience estimation.
arXiv Detail & Related papers (2020-04-07T02:38:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.