How Does Selective Mechanism Improve Self-Attention Networks?
- URL: http://arxiv.org/abs/2005.00979v1
- Date: Sun, 3 May 2020 04:18:44 GMT
- Title: How Does Selective Mechanism Improve Self-Attention Networks?
- Authors: Xinwei Geng, Longyue Wang, Xing Wang, Bing Qin, Ting Liu, Zhaopeng Tu
- Abstract summary: Self-attention networks (SANs) with a selective mechanism have produced substantial improvements in various NLP tasks.
In this paper, we assess the strengths of selective SANs, which are implemented with a flexible and universal Gumbel-Softmax.
We empirically validate that the improvement of SSANs can be attributed in part to mitigating two commonly-cited weaknesses of SANs: word order encoding and structure modeling.
- Score: 57.75314746470783
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-attention networks (SANs) with a selective mechanism have produced
substantial improvements in various NLP tasks by concentrating on a subset of
input words. However, the underlying reasons for their strong performance have
not been well explained. In this paper, we bridge the gap by assessing the
strengths of selective SANs (SSANs), which are implemented with a flexible and
universal Gumbel-Softmax. Experimental results on several representative NLP
tasks, including natural language inference, semantic role labelling, and
machine translation, show that SSANs consistently outperform the standard SANs.
Through well-designed probing experiments, we empirically validate that the
improvement of SSANs can be attributed in part to mitigating two commonly-cited
weaknesses of SANs: word order encoding and structure modeling. Specifically,
the selective mechanism improves SANs by paying more attention to content words
that contribute to the meaning of the sentence. The code and data are released
at https://github.com/xwgeng/SSAN.
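The abstract says the selective mechanism is implemented with Gumbel-Softmax. The sketch below is a rough, minimal illustration of that idea, not the released implementation: the two-class SELECT/DISCARD selector and the way the selection mask is folded into the attention scores are assumptions made here for illustration only (see the repository above for the authors' code).

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_select(logits, tau=1.0, hard=True):
    # logits: (batch, seq_len, 2) unnormalized scores for {DISCARD, SELECT}.
    # F.gumbel_softmax draws a differentiable (optionally straight-through)
    # sample from the relaxed categorical distribution.
    sample = F.gumbel_softmax(logits, tau=tau, hard=hard, dim=-1)
    return sample[..., 1]  # mass on the SELECT class, shape (batch, seq_len)

def selective_self_attention(q, k, v, select_mask, eps=1e-9):
    # q, k, v: (batch, heads, seq_len, d_head); select_mask: (batch, seq_len).
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5            # (batch, heads, L, L)
    # Down-weight keys/values of discarded words so attention concentrates
    # on the selected subset; log(mask) acts as an additive attention bias.
    scores = scores + torch.log(select_mask + eps)[:, None, None, :]
    attn = scores.softmax(dim=-1)
    return attn @ v
```

In this sketch the selector logits would be produced by a small learned layer over the encoder states and trained jointly with the attention; with `hard=True` the straight-through estimator keeps the discrete selection differentiable.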
Related papers
- BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network [16.986061375767488]
Generative adversarial network (GAN)-based vocoders have been intensively studied because they can synthesize high-fidelity audio waveforms faster than real-time.
Most GANs fail to obtain the optimal projection for discriminating between real and fake data in the feature space.
We propose a scheme to modify least-squares GAN, which most GAN-based vocoders adopt, so that their loss functions satisfy the requirements of SAN.
arXiv Detail & Related papers (2023-09-06T08:48:03Z) - SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in neural-network-based approaches (called SUN).
Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-09-14T06:27:51Z) - More Than Words: Towards Better Quality Interpretations of Text
Classifiers [16.66535643383862]
We show that token-based interpretability, while being a convenient first choice given the input interfaces of the ML models, is not the most effective one in all situations.
We show that higher-level feature attributions offer several advantages: 1) they are more robust as measured by the randomization tests, 2) they lead to lower variability when using approximation-based methods like SHAP, and 3) they are more intelligible to humans in situations where the linguistic coherence resides at a higher level.
arXiv Detail & Related papers (2021-12-23T10:18:50Z) - Sometimes We Want Translationese [48.45003475966808]
In some applications, faithfulness to the original (input) text is important to preserve.
We propose a simple, novel way to quantify whether an NMT system exhibits robustness and faithfulness.
arXiv Detail & Related papers (2021-04-15T17:39:47Z) - SG-Net: Syntax Guided Transformer for Language Representation [58.35672033887343]
We propose using syntax to guide the text modeling by incorporating explicit syntactic constraints into attention mechanisms for better linguistically motivated word representations.
In detail, for the self-attention network (SAN)-based Transformer encoder, we introduce a syntactic dependency of interest (SDOI) design into the SAN to form an SDOI-SAN with syntax-guided self-attention (an illustrative sketch of syntax-guided attention masking follows this list).
Experiments on popular benchmark tasks, including machine reading comprehension, natural language inference, and neural machine translation show the effectiveness of the proposed SG-Net design.
arXiv Detail & Related papers (2020-12-27T11:09:35Z) - Discriminatively-Tuned Generative Classifiers for Robust Natural
Language Inference [59.62779187457773]
We propose GenNLI, a generative classifier for natural language inference (NLI).
We compare it to five baselines, including discriminative models and large-scale pretrained language representation models like BERT.
Experiments show that GenNLI outperforms both discriminative and pretrained baselines across several challenging NLI experimental settings.
arXiv Detail & Related papers (2020-10-08T04:44:00Z) - Self-Attention Networks for Intent Detection [0.9023847175654603]
We present a novel intent detection system based on a self-attention network and a Bi-LSTM.
Our approach shows improvements from using a transformer model and a deep averaging network-based universal sentence encoder.
We evaluate the system on Snips, Smart Speaker, Smart Lights, and ATIS datasets by different evaluation metrics.
arXiv Detail & Related papers (2020-06-28T12:19:15Z) - Self-Attention with Cross-Lingual Position Representation [112.05807284056337]
Position encoding (PE) is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences.
Due to word order divergences in different languages, modeling the cross-lingual positional relationships might help SANs tackle this problem.
We augment SANs with cross-lingual position representations to model the bilingually aware latent structure for the input sentence.
arXiv Detail & Related papers (2020-04-28T05:23:43Z)
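As referenced in the SG-Net entry above, restricting self-attention with explicit syntactic constraints can be illustrated with a hard dependency-based mask. The sketch below is a simplification made here for illustration: the `heads` array and the self-plus-head attention neighborhood are assumptions, not the paper's actual SDOI construction.

```python
import torch

def dependency_mask(heads):
    # heads[i] is the dependency head of word i (the root points to itself).
    # Each word may attend to itself and to its head, a crude stand-in for
    # a "syntactic dependency of interest" neighborhood.
    n = len(heads)
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i, h in enumerate(heads):
        mask[i, i] = True
        mask[i, h] = True
    return mask

def syntax_guided_attention(q, k, v, mask):
    # q, k, v: (heads, seq_len, d_head); mask: (seq_len, seq_len) boolean.
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))  # forbid unrelated words
    return scores.softmax(dim=-1) @ v
```

The mask guarantees every word can attend at least to itself, so the row-wise softmax stays well defined; richer neighborhoods (children, ancestors) would simply add more `True` entries per row.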