EALink: An Efficient and Accurate Pre-trained Framework for Issue-Commit
Link Recovery
- URL: http://arxiv.org/abs/2308.10759v1
- Date: Mon, 21 Aug 2023 14:46:43 GMT
- Title: EALink: An Efficient and Accurate Pre-trained Framework for Issue-Commit
Link Recovery
- Authors: Chenyuan Zhang, Yanlin Wang, Zhao Wei, Yong Xu, Juhong Wang, Hui Li
and Rongrong Ji
- Abstract summary: We propose an efficient and accurate pre-trained framework called EALink for issue-commit link recovery.
We construct a large-scale dataset and conduct extensive experiments to demonstrate the power of EALink.
Results show that EALink outperforms the state-of-the-art methods by a large margin (15.23%-408.65%) on various evaluation metrics.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Issue-commit links, as a type of software traceability links, play a vital
role in various software development and maintenance tasks. However, they are
typically deficient, as developers often forget or fail to create tags when
making commits. Existing studies have deployed deep learning techniques,
including pre-trained models, to improve automatic issue-commit link
recovery. Despite their promising performance, we argue that previous approaches
have four main problems, hindering them from recovering links in large software
projects. To overcome these problems, we propose an efficient and accurate
pre-trained framework called EALink for issue-commit link recovery. EALink
requires much fewer model parameters than existing pre-trained methods,
bringing efficient training and recovery. Moreover, we design various
techniques to improve the recovery accuracy of EALink. We construct a
large-scale dataset and conduct extensive experiments to demonstrate the power
of EALink. Results show that EALink outperforms the state-of-the-art methods by
a large margin (15.23%-408.65%) on various evaluation metrics. Meanwhile, its
training and inference overhead is orders of magnitude lower than existing
methods.
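At its core, issue-commit link recovery is a matching problem: score each candidate commit against an issue and surface the highest-scoring pairs. A minimal sketch of that framing, using simple token-overlap similarity rather than EALink's pre-trained model (all names and data here are illustrative):

```python
# Toy sketch of issue-commit link recovery as a ranking problem:
# score every candidate commit message against an issue, then rank.
# EALink uses a pre-trained model; this stand-in uses Jaccard overlap.

def tokenize(text: str) -> set[str]:
    return set(text.lower().split())

def score(issue: str, commit: str) -> float:
    """Jaccard similarity between issue and commit token sets."""
    a, b = tokenize(issue), tokenize(commit)
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_commits(issue: str, commits: list[str]) -> list[tuple[str, float]]:
    """Return commits sorted by descending similarity to the issue."""
    return sorted(((c, score(issue, c)) for c in commits),
                  key=lambda pair: pair[1], reverse=True)

issue = "Fix null pointer exception when parsing empty config file"
commits = [
    "Update README with build instructions",
    "Handle empty config file to avoid null pointer exception",
    "Refactor logging module",
]
print(rank_commits(issue, commits)[0][0])  # best-scoring commit
```

A real recovery system replaces the overlap score with a learned relevance model, but the rank-and-link loop stays the same.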
Related papers
- Back to the Basics: Rethinking Issue-Commit Linking with LLM-Assisted Retrieval [12.213080309713574]
Issue-commit linking, which connects issues with commits that fix them, is crucial for software maintenance. We propose EasyLink, which utilizes a vector database as a modern Information Retrieval technique. Under our evaluation, EasyLink achieves an average Precision@1 of 75.91%, improving over the state-of-the-art by over four times.
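The retrieval style EasyLink builds on embeds texts as vectors and returns the nearest neighbor by similarity. A toy sketch under stated assumptions: EasyLink uses learned embeddings and a vector database, whereas the hashed bag-of-words vectors below are only a stdlib stand-in:

```python
# Illustrative vector-similarity retrieval: embed commits, embed the
# issue query, return the most similar commit. The hash-based "embedding"
# is a toy stand-in for a learned model plus a vector database.
import hashlib
import math

DIM = 256

def embed(text: str) -> list[float]:
    """Hashed bag-of-words vector (toy stand-in for a learned embedding)."""
    vec = [0.0] * DIM
    for tok in text.lower().split():
        idx = int(hashlib.md5(tok.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# "Index" the commits, then retrieve the nearest one for an issue.
commits = ["fix crash on empty input", "add dark mode to settings page"]
index = [(c, embed(c)) for c in commits]
query = embed("app crashes when input is empty")
best = max(index, key=lambda item: cosine(query, item[1]))
print(best[0])
```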
arXiv Detail & Related papers (2025-07-12T08:42:10Z)
- Towards Efficient and Effective Alignment of Large Language Models [7.853945494882636]
Large language models (LLMs) exhibit remarkable capabilities across diverse tasks, yet aligning them efficiently and effectively with human expectations remains a critical challenge. This thesis advances LLM alignment by introducing novel methodologies in data collection, training, and evaluation.
arXiv Detail & Related papers (2025-06-11T02:08:52Z)
- UniErase: Towards Balanced and Precise Unlearning in Language Models [69.04923022755547]
Large language models (LLMs) require iterative updates to address the outdated information problem. UniErase is a novel unlearning framework that demonstrates precise and balanced performance between knowledge unlearning and ability retention.
arXiv Detail & Related papers (2025-05-21T15:53:28Z)
- Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval [49.669503570350166]
Generative information retrieval (GenIR) is a promising neural retrieval paradigm that formulates document retrieval as a document identifier (docid) generation task.
Existing GenIR models suffer from token-level misalignment, where models trained to predict the next token often fail to capture document-level relevance effectively.
We propose direct document relevance optimization (DDRO), which aligns token-level docid generation with document-level relevance estimation through direct optimization via pairwise ranking.
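The pairwise-ranking idea behind DDRO can be illustrated with a logistic pairwise loss: training pushes the model to score a relevant document's identifier above an irrelevant one, and the loss shrinks as that margin grows. A toy sketch, not DDRO's actual objective over docid token sequences:

```python
# Pairwise logistic ranking loss: -log sigmoid(s_pos - s_neg).
# Small when the relevant document outscores the irrelevant one,
# large when the ordering is wrong.
import math

def pairwise_logistic_loss(score_pos: float, score_neg: float) -> float:
    margin = score_pos - score_neg
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(pairwise_logistic_loss(2.0, 0.0))  # correct ordering: modest loss
print(pairwise_logistic_loss(5.0, 0.0))  # large margin: near-zero loss
print(pairwise_logistic_loss(0.0, 2.0))  # wrong ordering: large loss
```

Optimizing this pairwise objective aligns the generator's scores with document-level relevance instead of only next-token likelihood.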
arXiv Detail & Related papers (2025-04-07T15:27:37Z)
- MPLinker: Multi-template Prompt-tuning with Adversarial Training for Issue-commit Link Recovery [9.005932745392395]
Issue-commit Link Recovery (ILR) in Software Traceability (ST) plays an important role in improving the reliability, quality, and security of software systems.
Current ILR methods convert the ILR into a classification task using pre-trained language models (PLMs) and dedicated neural networks.
MPLinker redefines the ILR task as a cloze task via template-based prompt-tuning and incorporates adversarial training to enhance model generalization and reduce overfitting.
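The cloze reformulation MPLinker uses can be illustrated by wrapping an issue-commit pair in a template with a masked slot for the model to fill ("relevant" vs. "irrelevant"). The template wording below is hypothetical, not MPLinker's actual prompts:

```python
# Toy cloze-style prompt construction: instead of attaching a classifier
# head, the decision becomes predicting the word at [MASK].
TEMPLATE = "Issue: {issue} Commit: {commit} The commit is [MASK] to the issue."

def build_prompt(issue: str, commit: str) -> str:
    """Fill the cloze template with an issue-commit pair."""
    return TEMPLATE.format(issue=issue, commit=commit)

prompt = build_prompt("Crash on startup", "Fix startup crash in loader")
print(prompt)
```

A pre-trained masked language model then scores candidate fillers for `[MASK]`, which is the sense in which prompt-tuning turns classification into a cloze task.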
arXiv Detail & Related papers (2025-01-31T10:51:14Z)
- Model Merging and Safety Alignment: One Bad Model Spoils the Bunch [70.614652904151]
Merging Large Language Models (LLMs) is a cost-effective technique for combining multiple expert LLMs into a single versatile model.
Current approaches often overlook the importance of safety alignment during merging, leading to highly misaligned models.
We evaluate several popular model merging techniques, demonstrating that existing methods not only transfer domain expertise but also propagate misalignment.
arXiv Detail & Related papers (2024-06-20T17:59:58Z)
- Efficient Degradation-aware Any Image Restoration [83.92870105933679]
We propose DaAIR, an efficient All-in-One image restorer employing a Degradation-aware Learner (DaLe) in the low-rank regime.
By dynamically allocating model capacity to input degradations, we realize an efficient restorer integrating holistic and specific learning.
arXiv Detail & Related papers (2024-05-24T11:53:27Z)
- FREE: Faster and Better Data-Free Meta-Learning [77.90126669914324]
Data-Free Meta-Learning (DFML) aims to extract knowledge from a collection of pre-trained models without requiring the original data.
We introduce the Faster and Better Data-Free Meta-Learning framework, which contains: (i) a meta-generator for rapidly recovering training tasks from pre-trained models; and (ii) a meta-learner for generalizing to new unseen tasks.
arXiv Detail & Related papers (2024-05-02T03:43:19Z)
- MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers [140.0479479231558]
In this work, we aim to unify a variety of pre-training tasks into a multi-task pre-trained model, namely MASTER.
MASTER utilizes a shared-encoder multi-decoder architecture that can construct a representation bottleneck to compress the abundant semantic information across tasks into dense vectors.
arXiv Detail & Related papers (2022-12-15T13:57:07Z)
- An Empirical Study on Data Leakage and Generalizability of Link Prediction Models for Issues and Commits [7.061740334417124]
LinkFormer preserves and improves the accuracy of existing predictions.
Our findings support that to simulate real-world scenarios effectively, researchers must maintain the temporal flow of data.
arXiv Detail & Related papers (2022-11-01T10:54:26Z)
- FastRE: Towards Fast Relation Extraction with Convolutional Encoder and Improved Cascade Binary Tagging Framework [13.4666880421568]
We propose a fast relation extraction model (FastRE) based on convolutional encoder and improved cascade binary tagging framework.
FastRE achieves 3-10x faster training, 7-15x faster inference, and uses 1/100 of the parameters compared to state-of-the-art models.
arXiv Detail & Related papers (2022-05-05T07:59:51Z)
- Automated Recovery of Issue-Commit Links Leveraging Both Textual and Non-textual Data [2.578242050187029]
Current state-of-the-art approaches for automated commit-issue linking suffer from low precision, leading to unreliable results.
We propose Hybrid-Linker to overcome such limitations by exploiting two information channels.
We evaluate Hybrid-Linker against competing approaches, namely FRLink and DeepLink on a dataset of 12 projects.
arXiv Detail & Related papers (2021-07-05T09:38:44Z)
- Learning to Perturb Word Embeddings for Out-of-distribution QA [55.103586220757464]
We propose a simple yet effective DA method based on a noise generator, which learns to perturb the word embedding of the input questions and context without changing their semantics.
We validate the performance of the QA models trained with our word embedding on a single source dataset, on five different target domains.
Notably, the model trained with ours outperforms the model trained with more than 240K artificially generated QA pairs.
arXiv Detail & Related papers (2021-05-06T14:12:26Z)
- FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers [22.59875034596411]
We present FedAT, a novel Federated learning method with Asynchronous Tiers under Non-i.i.d. data.
FedAT minimizes the straggler effect with improved convergence speed and test accuracy.
Results show that FedAT improves the prediction performance by up to 21.09%, and reduces the communication cost by up to 8.5x, compared to state-of-the-art FL methods.
arXiv Detail & Related papers (2020-10-12T18:38:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.