Related papers: LPR: Large Language Models-Aided Program Reduction

LPR: Large Language Models-Aided Program Reduction

URL: http://arxiv.org/abs/2312.13064v3
Date: Sat, 11 May 2024 11:16:46 GMT
Title: LPR: Large Language Models-Aided Program Reduction
Authors: Mengxiao Zhang, Yongqiang Tian, Zhenyang Xu, Yiwen Dong, Shin Hwei Tan, Chengnian Sun,
Abstract summary: This paper proposes LPR, the first technique utilizing LLMs to perform language-specific program reduction for multiple languages. For effectiveness, LPR surpasses Vulcan by producing 24.93%, 4.47%, and 11.71% smaller programs on benchmarks in C, Rust and JavaScript.
Score: 9.772279651428406
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Program reduction is a prevalent technique to facilitate compilers' debugging by automatically minimizing bug-triggering programs. Existing program reduction techniques are either generic across languages (e.g., Perses and Vulcan) or specifically customized for one certain language by employing language-specific features, like C-Reduce. However, striking the balance between generality across multiple programming languages and specificity to individual languages in program reduction is yet to be explored. This paper proposes LPR, the first technique utilizing LLMs to perform language-specific program reduction for multiple languages. The core insight is to utilize both the language-generic syntax level program reduction (e.g., Perses) and the language-specific semantic level program transformations learned by LLMs. Alternately, language-generic program reducers efficiently reduce programs into 1-tree-minimality, which is small enough to be manageable for LLMs; LLMs effectively transform programs via the learned semantics to expose new reduction opportunities for the language-generic program reducers to further reduce the programs. Our extensive evaluation on 50 benchmarks across three languages (C, Rust, and JavaScript) has highlighted LPR's practicality and superiority over Vulcan, the state-of-the-art language-generic program reducer. For effectiveness, LPR surpasses Vulcan by producing 24.93%, 4.47%, and 11.71% smaller programs on benchmarks in C, Rust and JavaScript. Moreover, LPR and Vulcan have demonstrated their potential to complement each other. By using Vulcan on LPR's output for C programs, we achieve program sizes comparable to those reduced by C-Reduce. For efficiency, LPR takes 10.77%, 34.88%, 36.96% less time than Vulcan to finish all benchmarks in C, Rust and JavaScript, separately.

Related papers

HoarePrompt: Structural Reasoning About Program Correctness in Natural Language [6.0749049701897295]
HoarePrompt is a novel approach that adapts fundamental ideas from program analysis and verification to natural language artifacts. To manage loops, we propose few-shot-driven k-induction, an adaptation of the k-induction method widely used in model checking. Our experiments show that HoarePrompt improves the MCC by 62% compared to directly using Zero-shot-CoT prompts for correctness classification.
arXiv Detail & Related papers (2025-03-25T12:30:30Z)
CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution [50.7413285637879]
The CRUXEVAL-X code reasoning benchmark contains 19 programming languages. It comprises at least 600 subjects for each language, along with 19K content-consistent tests in total. Even a model trained solely on Python can achieve at most 34.4% Pass@1 in other languages.
arXiv Detail & Related papers (2024-08-23T11:43:00Z)
Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction [6.697759280660703]
We propose a language-agnostic solution that utilizes the programming ability of Large Language Models. We generate Python sound law programs from sound change examples.
arXiv Detail & Related papers (2024-06-18T15:46:04Z)
Synthetic Programming Elicitation for Text-to-Code in Very Low-Resource Programming and Formal Languages [21.18996339478024]
We introduce emphsynthetic programming elicitation and compilation (SPEAC) SPEAC produces syntactically correct programs more frequently and without sacrificing semantic correctness. We empirically evaluate the performance of SPEAC in a case study for the UCLID5 formal verification language.
arXiv Detail & Related papers (2024-06-05T22:16:19Z)
AIOS Compiler: LLM as Interpreter for Natural Language Programming and Flow Programming of AI Agents [38.580779075892636]
We develop a novel system for Code Representation and Execution (CoRE) The proposed system unifies natural language programming, pseudo-code programming, and flow programming under the same representation for constructing language agents. During the execution, we incorporate external memory to minimize redundancy.
arXiv Detail & Related papers (2024-05-11T04:29:03Z)
ReGAL: Refactoring Programs to Discover Generalizable Abstractions [59.05769810380928]
Generalizable Abstraction Learning (ReGAL) is a method for learning a library of reusable functions via codeization. We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains. For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains.
arXiv Detail & Related papers (2024-01-29T18:45:30Z)
Refactoring Programs Using Large Language Models with Few-Shot Examples [20.48175387745551]
We demonstrate the application of using a large language model (LLM), GPT-3.5, to suggest less complex versions of the user-written Python program. We show that 95.68% of programs can beed by generating 10 candidates each, resulting in a 17.35% reduction in the average cyclomatic complexity.
arXiv Detail & Related papers (2023-11-20T11:43:45Z)
The Ups and Downs of Large Language Model Inference with Vocabulary Trimming by Language Heuristics [74.99898531299148]
This research examines vocabulary trimming (VT) inspired by restricting embedding entries to the language of interest to bolster time and memory efficiency. We apply two languages to trim the full vocabulary - Unicode-based script filtering and corpus-based selection - to different language families and sizes. It is found that VT reduces the memory usage of small models by nearly 50% and has an upper bound of 25% improvement in generation speed.
arXiv Detail & Related papers (2023-11-16T09:35:50Z)
InstructAlign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning [66.31509106146605]
Large language models (LLMs) that are tuned with instructions have demonstrated remarkable capabilities in various tasks and languages. However, their ability to generalize to underrepresented languages is limited due to the scarcity of available data. We propose InstructAlign which uses continual crosslingual instruction tuning to enable LLMs to align new unseen languages with previously learned high-resource languages.
arXiv Detail & Related papers (2023-05-23T02:51:34Z)
LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results. Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results. LEVER consistently improves over the base code LLMs(4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z)
Natural Language to Code Translation with Execution [82.52142893010563]
Execution result--minimum Bayes risk decoding for program selection. We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
arXiv Detail & Related papers (2022-04-25T06:06:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.