Defect Category Prediction Based on Multi-Source Domain Adaptation
- URL: http://arxiv.org/abs/2405.10511v1
- Date: Fri, 17 May 2024 03:30:31 GMT
- Title: Defect Category Prediction Based on Multi-Source Domain Adaptation
- Authors: Ying Xing, Mengci Zhao, Bin Yang, Yuwei Zhang, Wenjin Li, Jiawei Gu, Jun Yuan
- Abstract summary: This paper proposes a multi-source domain adaptation framework that integrates adversarial training and attention mechanisms.
Experiments on 8 real-world open-source projects show that the proposed approach achieves significant performance improvements.
- Score: 8.712655828391016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, defect prediction techniques based on deep learning have become a prominent research topic in the field of software engineering. These techniques can identify potential defects without executing the code. However, existing approaches mostly concentrate on determining the presence of defects at the method level, lacking the ability to precisely classify specific defect categories. Consequently, this undermines the efficiency of developers in locating and rectifying defects. Furthermore, in practical software development, new projects often lack sufficient defect data to train high-accuracy deep learning models, and models trained on historical data from existing projects frequently struggle to achieve satisfactory generalization performance on new projects. Hence, this paper first reformulates the traditional binary defect prediction task as a multi-label classification problem, employing the defect categories described in the Common Weakness Enumeration (CWE) as fine-grained predictive labels. To enhance model performance in cross-project scenarios, this paper proposes a multi-source domain adaptation framework that integrates adversarial training and attention mechanisms. Specifically, the proposed framework employs adversarial training to mitigate domain (i.e., software project) discrepancies, and further utilizes domain-invariant features to capture feature correlations between each source domain and the target domain. Simultaneously, the proposed framework employs a weighted maximum mean discrepancy as an attention mechanism to minimize the representation distance between source and target domain features, facilitating the model in learning more domain-independent features. Experiments on 8 real-world open-source projects show that the proposed approach achieves significant performance improvements compared to state-of-the-art baselines.
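The framework described above combines two standard building blocks that can be made concrete: adversarial training against a domain discriminator (commonly realized with a gradient-reversal layer) and a weighted maximum mean discrepancy (MMD) between each source domain and the target domain. The PyTorch sketch below illustrates these generic mechanisms under that reading; it is not the authors' implementation. The helper names (`grad_reverse`, `weighted_mmd`, `training_step`), the choice of one discriminator per source domain, and the externally supplied `mmd_weights` are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates and scales gradients in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)


def rbf_kernel(x, y, sigmas=(1.0, 2.0, 4.0)):
    """Multi-bandwidth RBF kernel matrix between two batches of feature vectors."""
    sq_dists = torch.cdist(x, y) ** 2
    return sum(torch.exp(-sq_dists / (2.0 * s ** 2)) for s in sigmas)


def weighted_mmd(source_feats, target_feats, weight=1.0):
    """Weighted (biased) estimate of the squared MMD between source and target features."""
    k_ss = rbf_kernel(source_feats, source_feats).mean()
    k_tt = rbf_kernel(target_feats, target_feats).mean()
    k_st = rbf_kernel(source_feats, target_feats).mean()
    return weight * (k_ss + k_tt - 2.0 * k_st)


def training_step(encoder, classifier, discriminators, source_batches, target_batch,
                  mmd_weights, lambd=1.0):
    """One combined loss over K source domains and one unlabeled target project (a sketch)."""
    bce = nn.BCEWithLogitsLoss()      # multi-label CWE-category loss
    dom_ce = nn.CrossEntropyLoss()    # source-vs-target domain loss
    z_t = encoder(target_batch)       # target features (no defect labels required)

    loss = torch.zeros((), device=z_t.device)
    for k, (x_s, y_s) in enumerate(source_batches):      # y_s: multi-hot CWE labels
        z_s = encoder(x_s)
        loss = loss + bce(classifier(z_s), y_s.float())  # defect-category prediction
        # Adversarial alignment: the discriminator sees gradient-reversed features,
        # so the encoder is pushed toward domain-indistinguishable representations.
        dom_logits = discriminators[k](grad_reverse(torch.cat([z_s, z_t]), lambd))
        dom_labels = torch.cat([torch.zeros(len(z_s)), torch.ones(len(z_t))])
        loss = loss + dom_ce(dom_logits, dom_labels.long().to(dom_logits.device))
        # Weighted MMD pulls source-domain-k and target representations together.
        loss = loss + weighted_mmd(z_s, z_t, mmd_weights[k])
    return loss
```

In this sketch, `mmd_weights[k]` stands in for the attention weight assigned to source domain k; the abstract does not specify how such weights are computed, so they are left as inputs.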
Related papers
- Open-Set Domain Adaptation with Visual-Language Foundation Models [51.49854335102149]
Unsupervised domain adaptation (UDA) has proven to be very effective in transferring knowledge from a source domain to a target domain with unlabeled data.
Open-set domain adaptation (ODA) has emerged as a potential solution to identify target-domain classes that are not present in the source domain during the training phase.
arXiv Detail & Related papers (2023-07-30T11:38:46Z) - Robustness, Evaluation and Adaptation of Machine Learning Models in the Wild [4.304803366354879]
We study causes of impaired robustness to domain shifts and present algorithms for training domain robust models.
A key source of model brittleness is domain overfitting, which our new training algorithms suppress while encouraging domain-general hypotheses.
arXiv Detail & Related papers (2023-03-05T21:41:16Z) - Deep face recognition with clustering based domain adaptation [57.29464116557734]
We propose a new clustering-based domain adaptation method designed for the face recognition task, in which the source and target domains do not share any classes.
Our method effectively learns discriminative target features by aligning the feature domains globally while, at the same time, distinguishing the target clusters locally.
arXiv Detail & Related papers (2022-05-27T12:29:11Z) - On Balancing Bias and Variance in Unsupervised Multi-Source-Free Domain Adaptation [6.2200089460762085]
Methods for multi-source-free domain adaptation (MSFDA) typically train a target model using pseudo-labeled data produced by the source models.
We develop an information-theoretic bound on the generalization error of the resulting target model.
We then provide insights on how to balance the resulting bias-variance trade-off from three perspectives: domain aggregation, selective pseudo-labeling, and joint feature alignment.
arXiv Detail & Related papers (2022-02-01T22:34:18Z) - Boosting Unsupervised Domain Adaptation with Soft Pseudo-label and Curriculum Learning [19.903568227077763]
Unsupervised domain adaptation (UDA) improves classification performance on an unlabeled target domain by leveraging data from a fully labeled source domain.
We propose a model-agnostic two-stage learning framework, which greatly reduces flawed model predictions using a soft pseudo-label strategy.
In the second stage, we propose a curriculum learning strategy to adaptively control the weighting between losses from the two domains.
arXiv Detail & Related papers (2021-12-03T14:47:32Z) - f-Domain-Adversarial Learning: Theory and Algorithms [82.97698406515667]
Unsupervised domain adaptation is used in many machine learning applications where, during training, a model has access to unlabeled data in the target domain.
We derive a novel generalization bound for domain adaptation that exploits a new measure of discrepancy between distributions based on a variational characterization of f-divergences (the standard variational form is sketched after this list).
arXiv Detail & Related papers (2021-06-21T18:21:09Z) - Towards Fair Knowledge Transfer for Imbalanced Domain Adaptation [61.317911756566126]
We propose a Towards Fair Knowledge Transfer framework to handle the fairness challenge in imbalanced cross-domain learning.
Specifically, a novel cross-domain mixup generation is exploited to augment the minority source set with target information to enhance fairness.
Our model improves overall accuracy by more than 20% on two benchmarks.
arXiv Detail & Related papers (2020-10-23T06:29:09Z) - Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches in terms of both fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z) - Progressive Graph Learning for Open-Set Domain Adaptation [48.758366879597965]
Domain shift is a fundamental problem in visual recognition which typically arises when the source and target data follow different distributions.
In this paper, we tackle a more realistic problem of open-set domain shift where the target data contains additional classes that are not present in the source data.
We introduce an end-to-end Progressive Graph Learning framework where a graph neural network with episodic training is integrated to suppress underlying conditional shift.
arXiv Detail & Related papers (2020-06-22T09:10:34Z)
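For reference, the f-Domain-Adversarial Learning entry above relies on a variational characterization of f-divergences. A minimal sketch of that standard characterization (not the paper's specific bound) is:

```latex
% Variational (Fenchel-conjugate) characterization of an f-divergence,
% with f^* the convex conjugate of f and T ranging over measurable functions:
D_f(P \,\|\, Q) \;=\; \sup_{T}\; \mathbb{E}_{x \sim P}\!\left[T(x)\right]
                 \;-\; \mathbb{E}_{x \sim Q}\!\left[f^{*}\!\left(T(x)\right)\right]
```

Domain-adversarial methods replace the supremum with a trainable critic network, which turns the divergence estimate into an adversarial training objective.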