MBL-CPDP: A Multi-objective Bilevel Method for Cross-Project Defect Prediction via Automated Machine Learning
- URL: http://arxiv.org/abs/2411.06491v1
- Date: Sun, 10 Nov 2024 15:17:15 GMT
- Title: MBL-CPDP: A Multi-objective Bilevel Method for Cross-Project Defect Prediction via Automated Machine Learning
- Authors: Jiaxin Chen, Jinliang Ding, Kay Chen Tan, Jiancheng Qian, Ke Li,
- Abstract summary: Cross-project defect prediction (CPDP) leverages machine learning (ML) techniques to proactively identify software defects, especially where project-specific data is scarce.
This paper formulates CPDP as a multi-objective bilevel optimization (MBLO) method, dubbed MBL-CPDP.
It comprises two nested problems: the upper-level, a multi-objective optimization problem, and the lower-level problem, an expensive optimization problem.
- Score: 34.89241736003651
- License:
- Abstract: Cross-project defect prediction (CPDP) leverages machine learning (ML) techniques to proactively identify software defects, especially where project-specific data is scarce. However, developing a robust ML pipeline with optimal hyperparameters that effectively use cross-project information and yield satisfactory performance remains challenging. In this paper, we resolve this bottleneck by formulating CPDP as a multi-objective bilevel optimization (MBLO) method, dubbed MBL-CPDP. It comprises two nested problems: the upper-level, a multi-objective combinatorial optimization problem, enhances robustness and efficiency in optimizing ML pipelines, while the lower-level problem is an expensive optimization problem that focuses on tuning their optimal hyperparameters. Due to the high-dimensional search space characterized by feature redundancy and inconsistent data distributions, the upper-level problem combines feature selection, transfer learning, and classification to leverage limited and heterogeneous historical data. Meanwhile, an ensemble learning method is proposed to capture differences in cross-project distribution and generalize across diverse datasets. Finally, a MBLO algorithm is presented to solve this problem while achieving high adaptability effectively. To evaluate the performance of MBL-CPDP, we compare it with five automated ML tools and $50$ CPDP techniques across $20$ projects. Extensive empirical results show that MBL-CPDPoutperforms the comparison methods, demonstrating its superior adaptability and comprehensive performance evaluation capability.
Related papers
- Bayesian Optimization for Simultaneous Selection of Machine Learning Algorithms and Hyperparameters on Shared Latent Space [16.257223975129513]
A machine learning (ML) algorithm and its hyper-parameters is crucial for the development of high-performance ML systems.
Many existing studies use Bayesian optimization (BO) for accelerating the search.
Our proposed method embeds different hyper- parameter spaces into a shared latent space, in which a surrogate multi-task model for BO is estimated.
This approach can share information of observations from different ML algorithms by which efficient optimization is expected with a smaller number of total observations.
arXiv Detail & Related papers (2025-02-13T13:43:52Z) - A Hybrid Sampling and Multi-Objective Optimization Approach for Enhanced Software Defect Prediction [3.407555189785573]
This paper introduces a novel SDP framework that integrates hybrid sampling techniques, with a suite of multi-objective optimization algorithms.
The proposed model applies feature fusion through multi-objective optimization, enhancing both the generalization capability and stability of the predictions.
Experiments conducted on datasets from NASA and PROMISE repositories demonstrate that the proposed hybrid sampling and multi-objective optimization approach improves data balance, eliminates redundant features, and enhances prediction accuracy.
arXiv Detail & Related papers (2024-10-13T23:39:04Z) - Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System [75.25394449773052]
Large Language Model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving.
Yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods.
We present Optima, a novel framework that addresses these issues by significantly enhancing both communication efficiency and task effectiveness.
arXiv Detail & Related papers (2024-10-10T17:00:06Z) - Expensive Multi-Objective Bayesian Optimization Based on Diffusion Models [17.19004913553654]
Multi-objective Bayesian optimization (MOBO) has shown promising performance on various expensive multi-objective optimization problems (EMOPs)
We propose a novel Composite Diffusion Model based Pareto Set Learning algorithm, namely CDM-PSL, for expensive MOBO.
Our proposed algorithm attains superior performance compared with various state-of-the-art MOBO algorithms.
arXiv Detail & Related papers (2024-05-14T14:55:57Z) - A First-Order Multi-Gradient Algorithm for Multi-Objective Bi-Level Optimization [7.097069899573992]
We study the Multi-Objective Bi-Level Optimization (MOBLO) problem.
Existing gradient-based MOBLO algorithms need to compute the Hessian matrix.
We propose an efficient first-order multi-gradient method for MOBLO, called FORUM.
arXiv Detail & Related papers (2024-01-17T15:03:37Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs)
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Multi-objective hyperparameter optimization with performance uncertainty [62.997667081978825]
This paper presents results on multi-objective hyperparameter optimization with uncertainty on the evaluation of Machine Learning algorithms.
We combine the sampling strategy of Tree-structured Parzen Estimators (TPE) with the metamodel obtained after training a Gaussian Process Regression (GPR) with heterogeneous noise.
Experimental results on three analytical test functions and three ML problems show the improvement over multi-objective TPE and GPR.
arXiv Detail & Related papers (2022-09-09T14:58:43Z) - A survey on multi-objective hyperparameter optimization algorithms for
Machine Learning [62.997667081978825]
This article presents a systematic survey of the literature published between 2014 and 2020 on multi-objective HPO algorithms.
We distinguish between metaheuristic-based algorithms, metamodel-based algorithms, and approaches using a mixture of both.
We also discuss the quality metrics used to compare multi-objective HPO procedures and present future research directions.
arXiv Detail & Related papers (2021-11-23T10:22:30Z) - High Dimensional Level Set Estimation with Bayesian Neural Network [58.684954492439424]
This paper proposes novel methods to solve the high dimensional Level Set Estimation problems using Bayesian Neural Networks.
For each problem, we derive the corresponding theoretic information based acquisition function to sample the data points.
Numerical experiments on both synthetic and real-world datasets show that our proposed method can achieve better results compared to existing state-of-the-art approaches.
arXiv Detail & Related papers (2020-12-17T23:21:53Z) - Optimization-driven Machine Learning for Intelligent Reflecting Surfaces
Assisted Wireless Networks [82.33619654835348]
Intelligent surface (IRS) has been employed to reshape the wireless channels by controlling individual scattering elements' phase shifts.
Due to the large size of scattering elements, the passive beamforming is typically challenged by the high computational complexity.
In this article, we focus on machine learning (ML) approaches for performance in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.