Related papers: An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications

An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications

URL: http://arxiv.org/abs/2404.11050v1
Date: Wed, 17 Apr 2024 03:46:38 GMT
Title: An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications
Authors: Mohannad Alhanahnah, Md Rashedul Hasan, Hamid Bagheri,
Abstract summary: This paper presents a systematic investigation into the capacity of Large Language Models (LLMs) for repairing declarative specifications in Alloy. We propose a novel repair pipeline that integrates a dual-agent LLM framework, comprising a Repair Agent and a Prompt Agent. Our study reveals that LLMs, particularly GPT-4 variants, outperform existing techniques in terms of repair efficacy, albeit with a marginal increase in runtime and token usage.
Score: 5.395614997568524
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automatic Program Repair (APR) has garnered significant attention as a practical research domain focused on automatically fixing bugs in programs. While existing APR techniques primarily target imperative programming languages like C and Java, there is a growing need for effective solutions applicable to declarative software specification languages. This paper presents a systematic investigation into the capacity of Large Language Models (LLMs) for repairing declarative specifications in Alloy, a declarative formal language used for software specification. We propose a novel repair pipeline that integrates a dual-agent LLM framework, comprising a Repair Agent and a Prompt Agent. Through extensive empirical evaluation, we compare the effectiveness of LLM-based repair with state-of-the-art Alloy APR techniques on a comprehensive set of benchmarks. Our study reveals that LLMs, particularly GPT-4 variants, outperform existing techniques in terms of repair efficacy, albeit with a marginal increase in runtime and token usage. This research contributes to advancing the field of automatic repair for declarative specifications and highlights the promising potential of LLMs in this domain.

Related papers

Unlocking LLM Repair Capabilities in Low-Resource Programming Languages Through Cross-Language Translation and Multi-Agent Refinement [4.5051492144389504]
We introduce a novel cross-language program repair approach LANTERN. Our approach strategically translates defective code from languages where LLMs exhibit weaker repair capabilities to languages where they demonstrate stronger performance. We evaluate our method on xCodeEval, a comprehensive multilingual benchmark comprising 5,068 bugs across 11 programming languages.
arXiv Detail & Related papers (2025-03-28T15:15:56Z)
Automated Refactoring of Non-Idiomatic Python Code: A Differentiated Replication with LLMs [54.309127753635366]
We present the results of a replication study in which we investigate GPT-4 effectiveness in recommending and suggesting idiomatic actions. Our findings underscore the potential of LLMs to achieve tasks where, in the past, implementing recommenders based on complex code analyses was required.
arXiv Detail & Related papers (2025-01-28T15:41:54Z)
Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention [71.12193680015622]
Large Language Models (LLMs) have shown remarkable capabilities in natural language processing. LLMs exhibit significant performance gaps among different languages. We propose Inference-Time Cross-Lingual Intervention (INCLINE) to overcome these limitations without incurring significant costs.
arXiv Detail & Related papers (2024-10-16T11:23:03Z)
FastFixer: An Efficient and Effective Approach for Repairing Programming Assignments [21.848112758958543]
We propose FastFixer, an efficient and effective approach for programming assignment repair. We first propose a novel repair-oriented fine-tuning strategy, aiming to enhance the LLM's attention towards learning how to generate the necessary patch and its associated context. Considering the repair efficiency, FastFixer achieves a remarkable inference speedup of 16.67 times compared to the autoregressive decoding algorithm.
arXiv Detail & Related papers (2024-10-11T10:17:02Z)
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning. LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors. We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
Revisiting Evolutionary Program Repair via Code Language Model [11.711739409758476]
This paper introduces ARJA-CLM, which integrates the multiobjective evolutionary algorithm with CLM to fix multilocation bugs in Java projects. We also propose a context-aware prompt construction stratege, which enriches the prompt with additional information about accessible fields and methods for the CLM generating candidate statements.
arXiv Detail & Related papers (2024-08-20T01:57:45Z)
Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian [75.94354349994576]
This paper explores the feasibility of employing smaller, domain-specific encoder LMs alongside prompting techniques to enhance performance in specialized contexts. Our study concentrates on the Italian bureaucratic and legal language, experimenting with both general-purpose and further pre-trained encoder-only models. The results indicate that while further pre-trained models may show diminished robustness in general knowledge, they exhibit superior adaptability for domain-specific tasks, even in a zero-shot setting.
arXiv Detail & Related papers (2024-07-30T08:50:16Z)
A Deep Dive into Large Language Models for Automated Bug Localization and Repair [12.756202755547024]
Large language models (LLMs) have shown impressive effectiveness in various software engineering tasks, including automated program repair (APR) In this study, we take a deep dive into automated bug fixing utilizing LLMs. This methodological separation of bug localization and fixing using different LLMs enables effective integration of diverse contextual information. Toggle achieves the new state-of-the-art (SOTA) performance on the CodeXGLUE code refinement benchmark.
arXiv Detail & Related papers (2024-04-17T17:48:18Z)
A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back. Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair. This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
The Right Prompts for the Job: Repair Code-Review Defects with Large Language Model [15.885824575879763]
Automatic program repair (APR) techniques have the potential to reduce manual efforts in uncovering and repairing program defects during the code review (CR) process. However, the limited accuracy and considerable time costs associated with existing APR approaches hinder their adoption in industrial practice. Recent advancements in Large Language Models (LLMs) have enhanced their ability to comprehend natural and programming languages, enabling them to generate patches based on review comments.
arXiv Detail & Related papers (2023-12-29T06:12:15Z)
Automated Repair of Declarative Software Specifications in the Era of Large Language Models [12.995301758524935]
We assess the effectiveness of utilizing OpenAI's ChatGPT to repair software specifications written in the Alloy declarative language. Our study revealed that while ChatGPT falls short in comparison to existing techniques, it was able to successfully repair bugs that no other technique could address.
arXiv Detail & Related papers (2023-10-19T02:30:42Z)
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks. A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output. This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z)
Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models [75.75038268227554]
Self-Checker is a framework comprising a set of plug-and-play modules that facilitate fact-checking. This framework provides a fast and efficient way to construct fact-checking systems in low-resource environments.
arXiv Detail & Related papers (2023-05-24T01:46:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.