Picking the Underused Heads: A Network Pruning Perspective of Attention Head Selection for Fusing Dialogue Coreference Information
- URL: http://arxiv.org/abs/2312.09541v1
- Date: Fri, 15 Dec 2023 05:27:24 GMT
- Title: Picking the Underused Heads: A Network Pruning Perspective of Attention Head Selection for Fusing Dialogue Coreference Information
- Authors: Zhengyuan Liu, Nancy F. Chen
- Abstract summary: Transformer-based models with the multi-head self-attention mechanism are widely used in natural language processing.
We investigate the attention head selection and manipulation strategy for feature injection from a network pruning perspective.
- Score: 50.41829484199252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based models with the multi-head self-attention mechanism are widely used in natural language processing and provide state-of-the-art results. While pre-trained language backbones have been shown to implicitly capture certain linguistic knowledge, explicitly incorporating structure-aware features can bring further improvement on downstream tasks. However, such enhancement often requires additional neural components and increases the number of trainable parameters. In this work, we investigate an attention head selection and manipulation strategy for feature injection from a network pruning perspective, and conduct a case study on dialogue summarization. We first rank the attention heads of a Transformer-based summarizer by layer-wise importance. We then select the underused heads through extensive analysis, and inject structure-aware features by manipulating the selected heads. Experimental results show that importance-based head selection is effective for feature injection, and that dialogue summarization can be improved by incorporating coreference information via head manipulation.
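The abstract outlines a three-step pipeline: score each attention head's layer-wise importance, select the lowest-scoring ("underused") heads, and inject coreference structure by manipulating those heads. Below is a minimal PyTorch sketch of one plausible reading, assuming a gradient-based importance proxy in the spirit of Michel et al.'s head-pruning analysis; the module, all names, and the toy uniform coreference matrix are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ManipulableMultiHeadAttention(nn.Module):
    """Self-attention whose heads can be gated (for importance scoring)
    and selectively overwritten with an external attention distribution."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Per-head gates: |d(loss)/d(gate)| serves as a pruning-style
        # importance score (an assumed proxy, after Michel et al., 2019).
        self.head_gate = nn.Parameter(torch.ones(n_heads))

    def forward(self, x, coref_attn=None, injected_heads=()):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.reshape(B, T, self.n_heads, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        if coref_attn is not None and injected_heads:
            attn = attn.clone()
            for h in injected_heads:        # manipulate only the selected heads
                attn[:, h] = coref_attn     # (B, T, T) row-normalized coref links
        ctx = (attn @ v) * self.head_gate.view(1, -1, 1, 1)
        return self.out(ctx.transpose(1, 2).reshape(B, T, -1))

# Steps 1-2: rank heads by gradient magnitude on a calibration batch and
# take the lowest-scoring ("underused") ones.
layer = ManipulableMultiHeadAttention(d_model=64, n_heads=4)
x = torch.randn(2, 10, 64)
loss = layer(x).pow(2).mean()                  # stand-in for the summarization loss
(imp,) = torch.autograd.grad(loss, [layer.head_gate])
underused = imp.abs().argsort()[:1].tolist()   # ascending: least important first

# Step 3: overwrite those heads with a (toy, uniform) coreference matrix.
coref = torch.full((2, 10, 10), 1.0 / 10)
y = layer(x, coref_attn=coref, injected_heads=underused)
```

In the paper, the injected matrix would encode resolved coreference links between dialogue mentions; the uniform matrix above only keeps the sketch self-contained.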
Related papers
- Towards Zero-shot Human-Object Interaction Detection via Vision-Language Integration [14.678931157058363]
We propose a novel framework, termed Knowledge Integration to HOI (KI2HOI), that effectively integrates the knowledge of a vision-language model to improve zero-shot HOI detection.
We develop an effective additive self-attention mechanism to generate more comprehensive visual representations.
Our model outperforms previous methods in various zero-shot and fully-supervised settings.
arXiv Detail & Related papers (2024-03-12T02:07:23Z)
- Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining [0.5735035463793008]
This paper studies the behaviour of cutting-edge Transformer-based language models on opinion mining.
Our comparative study offers guidance for production engineers on which approach to focus on.
arXiv Detail & Related papers (2023-08-07T01:10:50Z)
- Domain-specific Language Pre-training for Dialogue Comprehension on Clinical Inquiry-Answering Conversations [28.567701055153385]
Recent developments in natural language processing suggest that large-scale pre-trained language backbones could be leveraged for machine comprehension and information extraction tasks.
Yet, due to the gap between pre-training and downstream clinical domains, it remains challenging to exploit the generic backbones for domain-specific applications.
We propose domain-specific language pre-training to improve performance on downstream tasks such as dialogue comprehension.
arXiv Detail & Related papers (2022-06-06T08:45:03Z)
- Initial Study into Application of Feature Density and Linguistically-backed Embedding to Improve Machine Learning-based Cyberbullying Detection [54.83707803301847]
The research was conducted on a Formspring dataset provided in a Kaggle competition on automatic cyberbullying detection.
The study confirmed the effectiveness of Neural Networks in cyberbullying detection and the correlation between classifier performance and Feature Density.
arXiv Detail & Related papers (2022-06-04T03:17:15Z)
- Deep Learning For Prominence Detection In Children's Read Speech [13.041607703862724]
We present a system that operates on segmented speech waveforms to learn features relevant to prominent word detection for children's oral fluency assessment.
The chosen CRNN (convolutional recurrent neural network) framework, incorporating both word-level features and sequence information, is found to benefit from the perceptually motivated SincNet filters.
arXiv Detail & Related papers (2021-10-27T08:51:42Z)
- Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance [55.10864476206503]
We investigate the use of quantized vectors to model the latent linguistic embedding.
By enforcing different policies over the latent spaces during training, we are able to obtain a latent linguistic embedding.
Our experiments show that a voice cloning system built with vector quantization suffers only a small degradation in perceptual evaluations (a generic sketch of the quantization step follows).
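The cited abstract does not spell out its quantization details; as a generic illustration only, the sketch below shows the standard VQ-VAE-style step such a system might build on, with a nearest-neighbour codebook lookup and a straight-through estimator. Codebook size, dimensions, and names are assumptions.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through estimator."""

    def __init__(self, n_codes: int = 256, dim: int = 64):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, dim)   # sizes are illustrative

    def forward(self, z):                            # z: (B, T, dim)
        flat = z.reshape(-1, z.shape[-1])
        dists = torch.cdist(flat, self.codebook.weight)  # pairwise L2 distances
        codes = dists.argmin(dim=-1)                 # index of nearest code
        q = self.codebook(codes).view_as(z)          # quantized embedding
        # Straight-through: forward pass uses q, gradients flow back to z.
        return z + (q - z).detach(), codes.view(z.shape[:-1])
```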
arXiv Detail & Related papers (2021-06-25T07:51:35Z)
- Multi-Head Attention: Collaborate Instead of Concatenate [85.71058762269374]
We propose a collaborative multi-head attention layer that enables heads to learn shared projections.
Experiments confirm that sharing key/query dimensions can be exploited in language understanding, machine translation, and vision (a sketch of the shared-projection idea follows).
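As a rough illustration of that sharing scheme, the sketch below gives all heads one shared key/query projection and lets each head re-weight the shared dimensions with a learned mixing vector; dimensions and names are assumptions, not the cited paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CollaborativeAttention(nn.Module):
    """All heads share one key/query projection; each head re-weights the
    shared dimensions with a learned mixing vector."""

    def __init__(self, d_model: int, n_heads: int, d_shared: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_shared)        # shared across heads
        self.k = nn.Linear(d_model, d_shared)        # shared across heads
        self.v = nn.Linear(d_model, d_model)
        self.mix = nn.Parameter(torch.rand(n_heads, d_shared))  # per-head mixing
        self.out = nn.Linear(d_model, d_model)
        self.n_heads, self.d_head = n_heads, d_model // n_heads

    def forward(self, x):
        B, T, _ = x.shape
        q, k = self.q(x), self.k(x)                  # (B, T, d_shared)
        # Per-head attention scores through the shared projections.
        scores = torch.einsum('btd,hd,bsd->bhts', q, self.mix, k)
        attn = F.softmax(scores / self.mix.shape[1] ** 0.5, dim=-1)
        v = self.v(x).reshape(B, T, self.n_heads, self.d_head).transpose(1, 2)
        ctx = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out(ctx)
```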
arXiv Detail & Related papers (2020-06-29T20:28:52Z)
- Salience Estimation with Multi-Attention Learning for Abstractive Text Summarization [86.45110800123216]
In the task of text summarization, salience estimation for words, phrases or sentences is a critical component.
We propose a Multi-Attention Learning framework which contains two new attention learning components for salience estimation.
arXiv Detail & Related papers (2020-04-07T02:38:56Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.