Related papers: Identifying Root Causes of Null Pointer Exceptions with Logical Inferences

Identifying Root Causes of Null Pointer Exceptions with Logical Inferences

URL: http://arxiv.org/abs/2412.01005v1
Date: Sun, 01 Dec 2024 23:48:00 GMT
Title: Identifying Root Causes of Null Pointer Exceptions with Logical Inferences
Authors: Jindae Kim, Jaewoo Song,
Abstract summary: We propose LogicFL, a novel logical fault localization technique for Null Pointer Exceptions (NPEs)<n>With logic programming, LogicFL imitates human developers' deduction process of fault localization, and identifies causes of NPEs after logical inferences on collected facts about faulty code and test execution.<n>In an empirical evaluation of 76 NPE bugs from Apache Commons projects and the Defects4J benchmark, LogicFL accurately identified the fault locations and pinpointed the exact code fragments causing the NPEs for 67 bugs.
Score: 0.21485350418225244
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recently, Large Language Model (LLM)-based Fault Localization (FL) techniques have been proposed, and showed improved performance with explanations on FL results. However, a major issue with LLM-based FL techniques is their heavy reliance on LLMs, which are often unreliable, expensive, and difficult to analyze or improve. When results are unsatisfactory, it is challenging both to determine a cause and to refine a technique for better outcomes. To address this issue, we propose LogicFL, a novel logical fault localization technique for Null Pointer Exceptions (NPEs). With logic programming, LogicFL imitates human developers' deduction process of fault localization, and identifies causes of NPEs after logical inferences on collected facts about faulty code and test execution. In an empirical evaluation of 76 NPE bugs from Apache Commons projects and the Defects4J benchmark, LogicFL accurately identified the fault locations and pinpointed the exact code fragments causing the NPEs for 67 bugs (88.16%), which were 19.64% and 4.69% more bugs than two compared LLM-based FL techniques respectively. In addition, LogicFL can be executed on a low-performance machine similar to a typical laptop, with an average runtime of 21.63 seconds and a worst-case time of under two minutes, including test execution and output file generation. Moreover, when compared to the two LLM-based FL techniques using the GPT-4o model, LogicFL was significantly more cost-efficient, as those techniques required 343.94 and 3,736.19 times the cost of LogicFL, respectively. Last but not least, the deduction process in LogicFL for providing FL results is fully traceable, enabling us to understand the reasoning behind the technique's outcomes and to further enhance the technique.

Related papers

Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [84.30534714651093]
We present an innovative APR tool for Dafny, a verification-aware programming language.<n>We localize faults through a series of steps, which include using Hoare Logic to determine the state of each statement within the program.<n>We evaluate our approach using DafnyBench, a benchmark of real-world Dafny programs.
arXiv Detail & Related papers (2025-07-04T15:36:12Z)
LLM2: Let Large Language Models Harness System 2 Reasoning [65.89293674479907]
Large language models (LLMs) have exhibited impressive capabilities across a myriad of tasks, yet they occasionally yield undesirable outputs. We introduce LLM2, a novel framework that combines an LLM with a process-based verifier. LLMs2 is responsible for generating plausible candidates, while the verifier provides timely process-based feedback to distinguish desirable and undesirable outputs.
arXiv Detail & Related papers (2024-12-29T06:32:36Z)
PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback [78.89596149768458]
Large Language Models (LLMs) are widely adopted for assisting in software development tasks. We propose PerfCodeGen, a training-free framework that enhances the performance of LLM-generated code.
arXiv Detail & Related papers (2024-11-18T06:22:38Z)
Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection [8.22737389683156]
Large Language Models (LLMs) offer promising improvements in fault localization by enhancing code comprehension and reasoning. We introduce LLM4FL, a novel LLM-agent-based fault localization approach that integrates SBFL rankings with a divide-and-conquer strategy. Our results demonstrate that LLM4FL outperforms AutoFL by 19.27% in Top-1 accuracy and surpasses state-of-the-art supervised techniques such as DeepFL and Grace.
arXiv Detail & Related papers (2024-09-20T16:47:34Z)
Impact of Large Language Models of Code on Fault Localization [2.936007114555107]
We propose a simple but effective sequence generation approach for fine-tuning large language models of code for FL tasks. Specifically, we fine-tune representative encoder, encoder-decoder, and decoder-based 13 LLMCs for FL tasks. Experimental results show that LLMCs fine-tuned with our approach successfully pinpoint error positions in 50.6%, 64.2%, and 72.3% of 1,291 methods in Defects4J for Top-2/3/5 prediction.
arXiv Detail & Related papers (2024-08-19T02:36:07Z)
Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [65.2379940117181]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code. We find that code prompting exhibits a high-performance boost for multiple LLMs. Our analysis of GPT 3.5 reveals that the code formatting of the input problem is essential for performance improvement.
arXiv Detail & Related papers (2024-01-18T15:32:24Z)
LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs) Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models. We leverage these findings to construct targeted demonstration examples and fine-tune data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z)
LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers [60.009969929857704]
Logical reasoning is an important task for artificial intelligence with potential impacts on science, mathematics, and society. In this work, we reformulating such tasks as modular neurosymbolic programming, which we call LINC. We observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate.
arXiv Detail & Related papers (2023-10-23T17:58:40Z)
Large Language Models for Test-Free Fault Localization [11.080712737595174]
We propose a language model based fault localization approach that locates buggy lines of code without any test coverage information. We fine-tune language models with 350 million, 6 billion, and 16 billion parameters on small, manually curated corpora of buggy programs. Our empirical evaluation shows that LLMAO improves the Top-1 results over the state-of-the-art machine learning fault localization (MLFL) baselines by 2.3%-54.4%, and Top-5 results by 14.4%-35.6%.
arXiv Detail & Related papers (2023-10-03T01:26:39Z)
Large Language Models in Fault Localisation [32.87044163543427]
This paper investigates the capability of ChatGPT-3.5 and ChatGPT-4, the two state-of-the-art LLMs, on fault localisation. Within function-level context, ChatGPT-4 outperforms all the existing fault localisation methods. However, when the code context of the Defects4J dataset expands to the class-level, ChatGPT-4's performance suffers a significant drop.
arXiv Detail & Related papers (2023-08-29T13:07:27Z)
A Quantitative and Qualitative Evaluation of LLM-Based Explainable Fault Localization [12.80414941523501]
AutoFL generates an explanation of the bug along with a suggested fault location. Experiments on 798 real-world bugs in Java and Python reveal AutoFL improves method-level acc@1 by up to 233.3% over baselines.
arXiv Detail & Related papers (2023-08-10T10:26:55Z)
Delay Minimization for Federated Learning Over Wireless Communication Networks [172.42768672943365]
The problem of delay computation for federated learning (FL) over wireless communication networks is investigated. A bisection search algorithm is proposed to obtain the optimal solution. Simulation results show that the proposed algorithm can reduce delay by up to 27.3% compared to conventional FL methods.
arXiv Detail & Related papers (2020-07-05T19:00:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.