Beyond Self-learned Attention: Mitigating Attention Bias in
Transformer-based Models Using Attention Guidance
- URL: http://arxiv.org/abs/2402.16790v1
- Date: Mon, 26 Feb 2024 18:03:50 GMT
- Title: Beyond Self-learned Attention: Mitigating Attention Bias in
Transformer-based Models Using Attention Guidance
- Authors: Jiri Gesi and Iftekhar Ahmed
- Abstract summary: We introduce SyntaGuid, a novel approach to guide Transformer-based models towards critical source code tokens.
We show that SyntaGuid can improve overall performance by up to 3.25% and fix up to 28.3% of wrong predictions.
- Score: 9.486558126032639
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based models have demonstrated considerable potential for source
code modeling tasks in software engineering. However, they are limited by their
sole reliance on automatically learned self-attention weights.
Previous studies have shown that these models overemphasize delimiters added by
tokenizers (e.g., [CLS], [SEP]), which may lead to overlooking essential
information in the original input source code. To address this challenge, we
introduce SyntaGuid, a novel approach that utilizes the observation that
attention weights tend to be biased towards specific source code syntax tokens
and abstract syntax tree (AST) elements in fine-tuned language models when they
make correct predictions. SyntaGuid facilitates the guidance of
attention-weight learning, leading to improved model performance on various
software engineering tasks. We evaluate the effectiveness of SyntaGuid on
multiple tasks and demonstrate that it outperforms existing state-of-the-art
models in overall performance without requiring additional data. Experimental
results show that SyntaGuid can improve overall performance by up to 3.25% and
fix up to 28.3% of wrong predictions. Our work represents the first attempt to guide
the attention of Transformer-based models towards critical source code tokens
during fine-tuning, highlighting the potential for enhancing Transformer-based
models in software engineering.
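The core idea, steering attention-weight learning toward flagged syntax and AST tokens during fine-tuning, can be sketched as an auxiliary loss added to the task loss. The function name, the scalar formulation, and the `alpha` weight below are illustrative assumptions for one attention distribution, not the paper's actual implementation:

```python
import numpy as np

def attention_guidance_loss(attn_weights, critical_mask, alpha=0.1):
    """Auxiliary penalty nudging attention mass toward flagged tokens.

    attn_weights:  (seq_len,) attention distribution for one head
                   (non-negative, sums to 1).
    critical_mask: (seq_len,) 1 where the token is a flagged syntax/AST
                   token, 0 elsewhere.
    alpha:         weight of the guidance term relative to the task loss
                   (a hypothetical hyperparameter).

    The loss is alpha * (1 - attention mass on flagged tokens), so it
    shrinks as more attention lands on the critical positions.
    """
    attn_weights = np.asarray(attn_weights, dtype=float)
    critical_mask = np.asarray(critical_mask, dtype=float)
    critical_mass = float(np.sum(attn_weights * critical_mask))
    return alpha * (1.0 - critical_mass)

# Example: most attention sits on a delimiter-like position 0,
# so the penalty is high; shifting mass to flagged tokens lowers it.
biased   = attention_guidance_loss([0.7, 0.2, 0.1], [0, 1, 1])
guided   = attention_guidance_loss([0.1, 0.4, 0.5], [0, 1, 1])
```

In a real setup this term would be summed over heads and layers and added to the cross-entropy loss during fine-tuning, which matches the paper's claim of requiring no additional data: only the loss changes, not the training set.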
Related papers
- Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms, such as low-rank computation, achieve impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z) - Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity
Tracking [53.66999416757543]
We study how fine-tuning affects the internal mechanisms implemented in language models.
Fine-tuning enhances, rather than alters, the mechanistic operation of the model.
arXiv Detail & Related papers (2024-02-22T18:59:24Z) - Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z) - Fantastic Gains and Where to Find Them: On the Existence and Prospect of
General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z) - Learning Active Subspaces and Discovering Important Features with Gaussian Radial Basis Functions Neural Networks [0.0]
We show that precious information is contained in the spectrum of the precision matrix that can be extracted once the training of the model is completed.
We conducted numerical experiments for regression, classification, and feature selection tasks.
Our results demonstrate that the proposed model yields attractive prediction performance compared to competing methods.
arXiv Detail & Related papers (2023-07-11T09:54:30Z) - Automatic Rule Induction for Efficient Semi-Supervised Learning [56.91428251227253]
Semi-supervised learning has shown promise in allowing NLP models to generalize from small amounts of labeled data.
Pretrained transformer models act as black-box correlation engines that are difficult to explain and sometimes behave unreliably.
We propose tackling both of these challenges via Automatic Rule Induction (ARI), a simple and general-purpose framework.
arXiv Detail & Related papers (2022-05-18T16:50:20Z) - Generative Modeling Helps Weak Supervision (and Vice Versa) [87.62271390571837]
We propose a model fusing weak supervision and generative adversarial networks.
It captures discrete variables in the data alongside the weak supervision derived label estimate.
It is the first approach to enable data augmentation through weakly supervised synthetic images and pseudolabels.
arXiv Detail & Related papers (2022-03-22T20:24:21Z) - Assemble Foundation Models for Automatic Code Summarization [9.53949558569201]
We propose a flexible and robust approach for automatic code summarization based on neural networks.
We assemble available foundation models, such as CodeBERT and GPT-2, into a single model named AdaMo.
We introduce two adaptive schemes from the perspective of knowledge transfer, namely continuous pretraining and intermediate finetuning.
arXiv Detail & Related papers (2022-01-13T21:38:33Z) - End-to-End Weak Supervision [15.125993628007972]
We propose an end-to-end approach for directly learning the downstream model.
We show improved performance over prior work in terms of end model performance on downstream test sets.
arXiv Detail & Related papers (2021-07-05T19:10:11Z) - S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement
and Data Generation [31.38329747789168]
We propose a sequential variational autoencoder to learn disentangled representations of sequential data under self-supervision.
We exploit the benefits of some readily accessible supervisory signals from input data itself or some off-the-shelf functional models.
Our model can easily disentangle the representation of an input sequence into static factors and dynamic factors.
arXiv Detail & Related papers (2020-05-23T00:44:38Z)
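The static/dynamic factorization described for S3VAE can be illustrated with a toy split of a per-timestep latent sequence. The `split_sequence_latent` helper, the mean-pooling choice, and the fixed dimension boundary below are purely hypothetical and only show the shape of the idea; S3VAE itself learns the split with a variational objective and self-supervised signals:

```python
import numpy as np

def split_sequence_latent(z_seq, static_dim):
    """Toy static/dynamic factorization of a latent sequence.

    z_seq:      (T, D) array, one D-dimensional latent per timestep.
    static_dim: number of leading dimensions treated as the static factor.

    Pools the first static_dim dimensions over time into a single static
    code for the whole sequence, and keeps the remaining dimensions as
    per-timestep dynamic codes.
    """
    z_seq = np.asarray(z_seq, dtype=float)
    static = z_seq[:, :static_dim].mean(axis=0)   # shape (static_dim,)
    dynamic = z_seq[:, static_dim:]               # shape (T, D - static_dim)
    return static, dynamic
```

The design point the summary makes is that one part of the representation is shared across the sequence (e.g., identity or appearance) while the other varies per timestep (e.g., motion); this sketch only mimics that partition mechanically.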
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.