Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach
- URL: http://arxiv.org/abs/2410.06949v2
- Date: Wed, 16 Oct 2024 05:04:45 GMT
- Title: Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach
- Authors: Xuanming Zhang, Yuxuan Chen, Yuan Yuan, Minlie Huang,
- Abstract summary: In real world software development, improper or missing exception handling can severely impact the robustness and reliability of code.
We explore the use of large language models (LLMs) to improve exception handling in code.
We propose Seeker, a multi agent framework inspired by expert developer strategies for exception handling.
- Score: 54.03528377384397
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In real world software development, improper or missing exception handling can severely impact the robustness and reliability of code. Exception handling mechanisms require developers to detect, capture, and manage exceptions according to high standards, but many developers struggle with these tasks, leading to fragile code. This problem is particularly evident in open source projects and impacts the overall quality of the software ecosystem. To address this challenge, we explore the use of large language models (LLMs) to improve exception handling in code. Through extensive analysis, we identify three key issues: Insensitive Detection of Fragile Code, Inaccurate Capture of Exception Types, and Distorted Handling Solutions. These problems are widespread across real world repositories, suggesting that robust exception handling practices are often overlooked or mishandled. In response, we propose Seeker, a multi agent framework inspired by expert developer strategies for exception handling. Seeker uses agents: Scanner, Detector, Predator, Ranker, and Handler to assist LLMs in detecting, capturing, and resolving exceptions more effectively. Our work is the first systematic study on leveraging LLMs to enhance exception handling practices, providing valuable insights for future improvements in code reliability.
Related papers
- AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios [51.46347732659174]
Large Language Models (LLMs) have demonstrated advanced capabilities in real-world agentic applications.<n>AgentIF is the first benchmark for systematically evaluating LLM instruction following ability in agentic scenarios.
arXiv Detail & Related papers (2025-05-22T17:31:10Z) - CodeCoR: An LLM-Based Self-Reflective Multi-Agent Framework for Code Generation [10.048098631259876]
Code generation aims to produce code that fulfills requirements written in natural languages automatically.<n>Large language Models (LLMs) like ChatGPT fail to ensure the syntactic and semantic correctness of the generated code.<n>We propose CodeCoR, a self-reflective multi-agent framework that evaluates the effectiveness of each agent and their collaborations.
arXiv Detail & Related papers (2025-01-14T03:21:10Z) - Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework [58.36391985790157]
In real world software development, improper or missing exception handling can severely impact the robustness and reliability of code.
We explore the use of large language models (LLMs) to improve exception handling in code.
We propose Seeker, a multi-agent framework inspired by expert developer strategies for exception handling.
arXiv Detail & Related papers (2024-12-16T12:35:29Z) - LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues [62.12404317786005]
EvoCoder is a continuous learning framework for issue code reproduction.
Our results show a 20% improvement in issue reproduction rates over existing SOTA methods.
arXiv Detail & Related papers (2024-11-21T08:49:23Z) - REDO: Execution-Free Runtime Error Detection for COding Agents [3.9903610503301072]
Execution-free Error Detection for COding Agents (REDO) is a method that integrates runtime errors with static analysis tools.
We demonstrate that REDO outperforms current state-of-the-art methods by achieving a 11.0% higher accuracy and a 9.1% higher weighted F1 score.
arXiv Detail & Related papers (2024-10-10T18:06:29Z) - Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z) - RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance [0.6062751776009752]
Large Language Models (LLMs) have shown incredible potential in code generation tasks.
LLMs can generate code based on task descriptions, but accuracy remains limited.
We introduce a novel architecture of LLM-based agents for code generation and automatic debug: Refinement and Guidance debugger (RGD)
RGD decomposes the code generation task into multiple steps, ensuring a clearer workflow and enabling iterative code refinement based on self-reflection and feedback.
arXiv Detail & Related papers (2024-10-02T05:07:02Z) - DOCE: Finding the Sweet Spot for Execution-Based Code Generation [69.5305729627198]
We propose a comprehensive framework that includes candidate generation, $n$-best reranking, minimum Bayes risk (MBR) decoding, and self-ging as the core components.
Our findings highlight the importance of execution-based methods and the difference gap between execution-based and execution-free methods.
arXiv Detail & Related papers (2024-08-25T07:10:36Z) - Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence.
This paper uncovers a significant backdoor security threat within this process.
By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z) - What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated as compared to canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z) - Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs)
The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z) - Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study [1.03590082373586]
We propose using large language models (LLMs) to assist in finding vulnerabilities in source code.
The aim is to test multiple state-of-the-art LLMs and identify the best prompting strategies.
We find that LLMs can pinpoint many more issues than traditional static analysis tools, outperforming traditional tools in terms of recall and F1 scores.
arXiv Detail & Related papers (2024-05-24T14:59:19Z) - Chain of Targeted Verification Questions to Improve the Reliability of Code Generated by LLMs [10.510325069289324]
We propose a self-refinement method aimed at improving the reliability of code generated by LLMs.
Our approach is based on targeted Verification Questions (VQs) to identify potential bugs within the initial code.
Our method attempts to repair these potential bugs by re-prompting the LLM with the targeted VQs and the initial code.
arXiv Detail & Related papers (2024-05-22T19:02:50Z) - A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection [9.422811525274675]
Large Language Models (LLMs) have demonstrated great potential for code generation and other software engineering tasks.
Vulnerability detection is of crucial importance to maintaining the security, integrity, and trustworthiness of software systems.
Recent work has applied LLMs to vulnerability detection using generic prompting techniques, but their capabilities for this task and the types of errors they make remain unclear.
arXiv Detail & Related papers (2024-03-25T21:47:36Z) - Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations [76.19419888353586]
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations.
We present our efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms.
arXiv Detail & Related papers (2024-03-09T21:07:16Z) - From Misuse to Mastery: Enhancing Code Generation with Knowledge-Driven
AI Chaining [16.749379740049925]
Large Language Models (LLMs) have shown promising results in automatic code generation by improving coding efficiency to a certain extent.
However, generating high-quality and reliable code remains a formidable task because of LLMs' lack of good programming practice.
We propose a novel Knowledge-driven Prompt Chaining-based code generation approach, which decomposes code generation into an AI chain with iterative check-rewrite steps.
arXiv Detail & Related papers (2023-09-27T12:09:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.