Related papers: FLIMs: Fault Localization Interference Mutants, Definition, Recognition and Mitigation

URL: http://arxiv.org/abs/2511.23302v1
Date: Fri, 28 Nov 2025 16:00:44 GMT
Title: FLIMs: Fault Localization Interference Mutants, Definition, Recognition and Mitigation
Authors: Hengyuan Liu, Zheng Li, Donghua Wang, Yankai Wu, Xiang Chen, Yong Liu
Abstract summary: We develop a fault localization framework that reduces misleading interference while preserving real fault-revealing information. MBFL-FLIM achieves an average improvement of 44 faults in the Top-1 metric, representing a significant enhancement over baseline methods.
Score: 18.9509632937475
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mutation-based Fault Localization (MBFL) has been widely explored for automated software debugging, leveraging artificial mutants to identify faulty code entities. However, MBFL faces significant challenges due to interference mutants, which are generated from non-faulty code entities but can still be killed by failing tests. These mutants mimic the test sensitivity behaviors of real faulty code entities and weaken the effectiveness of fault localization. To address this challenge, we introduce the concept of Fault Localization Interference Mutants (FLIMs) and conduct a theoretical analysis based on the Reachability, Infection, Propagation, and Revealability (RIPR) model, identifying four distinct interference causes. Building on this, we propose a novel approach to semantically recognize and mitigate FLIMs using LLM-based semantic analysis, enhanced by fine-tuning techniques and confidence estimation strategies to address LLM output instability. The recognized FLIMs are then mitigated by refining the suspiciousness scores calculated from MBFL techniques. We integrate FLIM recognition and mitigation into the MBFL workflow, developing MBFL-FLIM, a fault localization framework that enhances MBFL's effectiveness by reducing misleading interference while preserving real fault-revealing information. Our empirical experiments on the Defects4J benchmark with 395 program versions using eight LLMs demonstrate MBFL-FLIM's superiority over traditional SBFL and MBFL methods, advanced dynamic feature-based approaches, and recent LLM-based fault localization techniques. Specifically, MBFL-FLIM achieves an average improvement of 44 faults in the Top-1 metric, representing a significant enhancement over baseline methods. Further evaluation confirms MBFL-FLIM's robust performance in multi-fault scenarios, with ablation experiments validating the contributions of the fine-tuning and confidence estimation components.
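To make the idea of refining suspiciousness scores concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not give the refinement formula, so the sketch assumes an Ochiai-style mutant score (as commonly used in MBFL), takes a statement's score to be the maximum over its mutants, and scales a mutant's score by one minus a hypothetical FLIM confidence. The names `ochiai`, `statement_scores`, and `flim_confidence`, as well as the toy data, are illustrative assumptions.

```python
import math
from collections import defaultdict


def ochiai(killed_failing, killed_passing, total_failing):
    """Ochiai-style suspiciousness of one mutant from its kill counts."""
    denom = math.sqrt(total_failing * (killed_failing + killed_passing))
    return killed_failing / denom if denom else 0.0


def statement_scores(mutants, total_failing, flim_confidence=None):
    """Aggregate mutant scores per statement (max over its mutants).

    mutants: {mutant_id: (statement, killed_by_failing, killed_by_passing)}
    flim_confidence: hypothetical {mutant_id: confidence in [0, 1]} that the
        mutant is an interference mutant; this weighting is an assumption,
        not the refinement rule used by MBFL-FLIM.
    """
    flim_confidence = flim_confidence or {}
    scores = defaultdict(float)
    for mutant_id, (stmt, kf, kp) in mutants.items():
        score = ochiai(kf, kp, total_failing)
        # Down-weight mutants flagged as likely interference mutants.
        score *= 1.0 - flim_confidence.get(mutant_id, 0.0)
        scores[stmt] = max(scores[stmt], score)
    return dict(scores)


# Toy data: two failing tests in total.
mutants = {
    "m1": ("line_10", 2, 0),  # mutant on the (assumed) faulty statement
    "m2": ("line_20", 2, 0),  # mutant on a non-faulty statement, same kill profile
    "m3": ("line_30", 1, 3),  # mutant killed mostly by passing tests
}

baseline = statement_scores(mutants, total_failing=2)
refined = statement_scores(mutants, total_failing=2, flim_confidence={"m2": 0.9})
```

In this toy setup, `line_10` and `line_20` tie in the baseline ranking because their mutants have identical kill profiles; once `m2` is flagged with high confidence, `line_20` drops below the other statements while `line_10` keeps its score.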

Related papers

LIME-LLM: Probing Models with Fluent Counterfactuals, Not Broken Text [7.194073942393882]
We introduce LIME-LLM, a framework that replaces random noise with hypothesis-driven, controlled perturbations. Empirical results demonstrate that LIME-LLM establishes a new benchmark for black-box explainability.
arXiv Detail & Related papers (2026-01-16T19:55:06Z)
MBFL-DKMR: Improving Mutation-based Fault Localization through Denoising-based Kill Matrix Refinement [21.09532467931481]
We propose a novel approach to refine the kill matrix, a core data structure capturing mutant-test relationships in MBFL. We introduce DKMR, which employs two key stages: signal enhancement through hybrid matrix construction to improve the signal-to-noise ratio for better denoising, and signal denoising via frequency domain filtering to suppress noise. Our evaluation on Defects4J v2.0.0 demonstrates that MBFL-DKMR effectively mitigates the noise and outperforms the state-of-the-art MBFL techniques.
arXiv Detail & Related papers (2025-11-28T06:48:00Z)
Digging Into the Internal: Causality-Based Analysis of LLM Function Calling [20.565096639708162]
We show that function calling (FC) can substantially enhance the compliance of large language models with user instructions. We conduct experiments comparing the effectiveness of FC-based instructions against conventional prompting methods. FC shows an average performance improvement of around 135% over conventional prompting methods in detecting malicious inputs.
arXiv Detail & Related papers (2025-09-18T08:30:26Z)
LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression. LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model. Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
arXiv Detail & Related papers (2025-02-15T02:55:22Z)
A Multi-Agent Approach to Fault Localization via Graph-Based Retrieval and Reflexion [8.22737389683156]
Traditional fault localization techniques require extensive training datasets and high computational resources. Recent advances in Large Language Models (LLMs) offer new opportunities by enhancing code understanding and reasoning. We propose LLM4FL, a multi-agent fault localization framework that utilizes three specialized LLM agents. Evaluated on the Defects4J benchmark, which includes 675 faults from 14 Java projects, LLM4FL achieves an 18.55% improvement in Top-1 accuracy over AutoFL and 4.82% over SoapFL.
arXiv Detail & Related papers (2024-09-20T16:47:34Z)
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications. FactorLLM achieves comparable performance to the source model securing up to 85% model performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z)
Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses. Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives. The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z)
R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models [65.04475956174959]
Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML). A significant challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming. This paper develops a physical layer framework for resilient SFL with large language models (LLMs) and vision language models (VLMs) over wireless networks.
arXiv Detail & Related papers (2024-07-16T12:21:29Z)
UBench: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions [10.28688988951815]
We introduce UBench, a new benchmark for evaluating the uncertainty of large language models (LLMs). Unlike other benchmarks, UBench is based on confidence intervals. It encompasses 11,978 multiple-choice questions spanning knowledge, language, understanding, and reasoning capabilities. Our analysis reveals several crucial insights: 1) Our confidence interval-based methods are highly effective for uncertainty quantification; 2) Regarding uncertainty, outstanding open-source models show competitive performance versus closed-source models; 3) CoT and RP prompts present potential ways to improve model reliability, while the influence of temperature changes follows no universal rule.
arXiv Detail & Related papers (2024-06-18T16:50:38Z)
Learning Test-Mutant Relationship for Accurate Fault Localisation [16.080629795085322]
Automated fault localisation aims to assist developers in identifying the root cause of the fault by narrowing down the space of likely fault locations. Several Mutation Based Fault Localisation (MBFL) techniques have been proposed to automatically locate faults. Despite their success, existing MBFL techniques suffer from the cost of performing mutation analysis after the fault is observed. This paper proposes a new MBFL technique called SIMFL, which exploits ahead-of-time mutation analysis to localise current faults.
arXiv Detail & Related papers (2023-06-04T10:09:38Z)
LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond [135.8013388183257]
We propose a new protocol for inconsistency detection benchmark creation and implement it in a 10-domain benchmark called SummEdits. Most LLMs struggle on SummEdits, with performance close to random chance. The best-performing model, GPT-4, is still 8% below estimated human performance.
arXiv Detail & Related papers (2023-05-23T21:50:06Z)
Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization [69.07420650261649]
We introduce a novel, simple, and powerful contrastive MI estimator named FLO. Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently. The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.
arXiv Detail & Related papers (2021-07-02T15:20:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.