Beyond Self-learned Attention: Mitigating Attention Bias in
Transformer-based Models Using Attention Guidance
- URL: http://arxiv.org/abs/2402.16790v1
- Date: Mon, 26 Feb 2024 18:03:50 GMT
- Title: Beyond Self-learned Attention: Mitigating Attention Bias in
Transformer-based Models Using Attention Guidance
- Authors: Jiri Gesi and Iftekhar Ahmed
- Abstract summary: We introduce SyntaGuid, a novel approach to guide Transformer-based models towards critical source code tokens.
We show that SyntaGuid can improve overall performance up to 3.25% and fix up to 28.3% wrong predictions.
- Score: 9.486558126032639
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based models have demonstrated considerable potential for source
code modeling tasks in software engineering. However, they are limited by their
dependence solely on automatic self-attention weight learning mechanisms.
Previous studies have shown that these models overemphasize delimiters added by
tokenizers (e.g., [CLS], [SEP]), which may lead to overlooking essential
information in the original input source code. To address this challenge, we
introduce SyntaGuid, a novel approach that utilizes the observation that
attention weights tend to be biased towards specific source code syntax tokens
and abstract syntax tree (AST) elements in fine-tuned language models when they
make correct predictions. SyntaGuid facilitates the guidance of
attention-weight learning, leading to improved model performance on various
software engineering tasks. We evaluate the effectiveness of SyntaGuid on
multiple tasks and demonstrate that it outperforms existing state-of-the-art
models in overall performance without requiring additional data. Experimental
result shows that SyntaGuid can improve overall performance up to 3.25% and fix
up to 28.3% wrong predictions. Our work represents the first attempt to guide
the attention of Transformer-based models towards critical source code tokens
during fine-tuning, highlighting the potential for enhancing Transformer-based
models in software engineering.
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z) - Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment [57.0121616203175]
We propose FiSAO, a novel self-alignment method that utilizes the model's own visual encoder as a fine-grained verifier to improve vision-language alignment.
By leveraging token-level feedback from the vision encoder, FiSAO significantly improves vision-language alignment, even surpassing traditional preference tuning methods that require additional data.
arXiv Detail & Related papers (2024-10-18T03:34:32Z) - Making the Most of your Model: Methods for Finetuning and Applying Pretrained Transformers [0.21756081703276003]
This thesis provides methods and analysis of models which make progress on this goal.
We introduce two new finetuning methods which add new capabilities to the models they are used on.
We provide theoretical and empirical insights on the divergence of model-likelihood and output quality.
arXiv Detail & Related papers (2024-08-29T03:50:24Z) - Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms as low-rank computation have impressive performance for learning Transformer-based adaption.
We analyze how magnitude-based models affect generalization while improving adaption.
We conclude that proper magnitude-based has a slight on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z) - Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity
Tracking [53.66999416757543]
We study how fine-tuning affects the internal mechanisms implemented in language models.
Fine-tuning enhances, rather than alters, the mechanistic operation of the model.
arXiv Detail & Related papers (2024-02-22T18:59:24Z) - Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z) - Automatic Rule Induction for Efficient Semi-Supervised Learning [56.91428251227253]
Semi-supervised learning has shown promise in allowing NLP models to generalize from small amounts of labeled data.
Pretrained transformer models act as black-box correlation engines that are difficult to explain and sometimes behave unreliably.
We propose tackling both of these challenges via Automatic Rule Induction (ARI), a simple and general-purpose framework.
arXiv Detail & Related papers (2022-05-18T16:50:20Z) - Assemble Foundation Models for Automatic Code Summarization [9.53949558569201]
We propose a flexible and robust approach for automatic code summarization based on neural networks.
We assemble available foundation models, such as CodeBERT and GPT-2, into a single model named AdaMo.
We introduce two adaptive schemes from the perspective of knowledge transfer, namely continuous pretraining and intermediate finetuning.
arXiv Detail & Related papers (2022-01-13T21:38:33Z) - End-to-End Weak Supervision [15.125993628007972]
We propose an end-to-end approach for directly learning the downstream model.
We show improved performance over prior work in terms of end model performance on downstream test sets.
arXiv Detail & Related papers (2021-07-05T19:10:11Z) - S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement
and Data Generation [31.38329747789168]
We propose a sequential variational autoencoder to learn disentangled representations of sequential data under self-supervision.
We exploit the benefits of some readily accessible supervisory signals from input data itself or some off-the-shelf functional models.
Our model can easily disentangle the representation of an input sequence into static factors and dynamic factors.
arXiv Detail & Related papers (2020-05-23T00:44:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.