Related papers: A Chain of AI-based Solutions for Resolving FQNs and Fixing Syntax Errors in Partial Code

URL: http://arxiv.org/abs/2306.11981v1
Date: Wed, 21 Jun 2023 02:13:32 GMT
Title: A Chain of AI-based Solutions for Resolving FQNs and Fixing Syntax Errors in Partial Code
Authors: Qing Huang, Jiahui Zhu, Zhenchang Xing, Huan Jin, Changjing Wang, Xiwei Xu
Abstract summary: API documentation, technical blogs and programming Q&A sites contain many partial code snippets that can be reused in programming tasks, but these snippets are often uncompilable because of unresolved names and syntax errors. We propose the Partial Code Reuse Chain (PCR-Chain) for resolving fully-qualified names (FQNs) and fixing last-mile syntax errors in partial code, based on a giant large language model (LLM) like ChatGPT.
Score: 20.5627916036
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: API documentation, technical blogs and programming Q&A sites contain many partial code snippets that can be reused in programming tasks, but these snippets are often uncompilable because of unresolved names and syntax errors. To facilitate partial code reuse, we propose the Partial Code Reuse Chain (PCR-Chain) for resolving fully-qualified names (FQNs) and fixing last-mile syntax errors in partial code, based on a giant large language model (LLM) like ChatGPT. Methodologically, PCR-Chain is backed by an underlying global-level prompt architecture (which combines three design ideas: hierarchical task breakdown, prompt composition, and a mix of prompt-based AI and non-AI units) and a local-level prompt design. Technically, PCR-Chain employs in-context learning rather than symbolic, costly training methods. Experimental results demonstrate that for dynamically-typed languages (Python), PCR-Chain outperforms current state-of-the-art (SOTA) methods such as RING by 5% in accuracy. For statically-typed languages (Java), our approach achieves a high accuracy of 80.5% in resolving both non-FQNs and last-mile syntax errors, surpassing SOTA methods (RING) that can only address last-mile syntax errors. The correct execution of the unit, module, and PCR-Chain demonstrates the effectiveness of the prompt design, composition, and architecture, and opens up possibilities for building software engineering tools based on LLMs, replacing traditional program analysis methods.
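
The "mix of prompt-based AI and non-AI units" mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the PCR-Chain implementation and does not reproduce its prompts, its FQN-resolution modules, or its Java handling: the function names, the prompt wording, and the stub model below are all assumptions made for illustration. The sketch only shows the general pattern of a deterministic checker (here Python's own parser) deciding when a narrowly scoped LLM prompt is invoked.

```python
import ast
from typing import Callable

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


def has_syntax_error(code: str) -> bool:
    """Non-AI unit: use Python's own parser as the syntax checker."""
    try:
        ast.parse(code)
        return False
    except SyntaxError:
        return True


def fix_syntax_unit(code: str, llm: LLM) -> str:
    """AI unit: one narrowly scoped prompt that asks only for a syntax fix."""
    prompt = (
        "Fix the syntax errors in the following Python code. "
        "Return only the corrected code.\n\n" + code
    )
    return llm(prompt)


def repair_chain(code: str, llm: LLM, max_rounds: int = 3) -> str:
    """Compose the units: check (non-AI), fix (AI), then re-check."""
    for _ in range(max_rounds):
        if not has_syntax_error(code):
            break
        code = fix_syntax_unit(code, llm)
    return code


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model (assumption): adds the colon missing in the demo."""
    code = prompt.split("\n\n", 1)[1]
    return code.replace("def add(a, b)\n", "def add(a, b):\n")


broken = "def add(a, b)\n    return a + b\n"
fixed = repair_chain(broken, stub_llm)
print(fixed)
```

In practice `stub_llm` would be replaced by a call to an actual model. Keeping the checker outside the model means the chain makes no model call at all when the input already parses.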

Related papers

Neuro-Symbolic Query Compiler [57.78201019000895]
This paper presents QCompiler, a neuro-symbolic framework inspired by linguistic grammar rules and compiler design, to bridge this gap. It theoretically designs a minimal yet sufficient Backus-Naur Form (BNF) grammar $G[q]$ to formalize complex queries. The atomicity of the sub-queries in the leaf ensures more precise document retrieval and response generation, significantly improving the RAG system's ability to address complex queries.
arXiv Detail & Related papers (2025-05-17T09:36:03Z)
SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning [18.40402135952776]
This paper presents SymRTLO, a novel neuron-symbolic RTL optimization framework. A symbolic module is proposed for analyzing and optimizing finite state machine (FSM) logic. Experiments on the RTL-Rewriter benchmark with Synopsys Design Compiler and Yosys show that SymRTLO improves power, performance, and area (PPA) by up to 43.9%, 62.5%, and 51.1%, respectively.
arXiv Detail & Related papers (2025-04-14T16:15:55Z)
Type-Constrained Code Generation with Language Models [51.03439021895432]
We introduce a type-constrained decoding approach that leverages type systems to guide code generation. For this purpose, we develop novel prefix automata and a search over inhabitable types, forming a sound approach to enforce well-typedness on LLM-generated code. Our approach reduces compilation errors by more than half and significantly increases functional correctness in code synthesis, translation, and repair tasks.
arXiv Detail & Related papers (2025-04-12T15:03:00Z)
Oracular Programming: A Modular Foundation for Building LLM-Enabled Software [5.294604210205507]
Large Language Models have proved surprisingly effective at solving a wide range of tasks from just a handful of examples. Their lack of reliability and modularity limits their capacity to tackle large problems that require many steps of reasoning. We propose oracular programming, a foundational paradigm for building LLM-enabled applications that lets domain experts express high-level problem-solving strategies.
arXiv Detail & Related papers (2025-02-07T20:24:43Z)
Interactive and Expressive Code-Augmented Planning with Large Language Models [62.799579304821826]
Large Language Models (LLMs) demonstrate strong abilities in common-sense reasoning and interactive decision-making. Recent techniques have sought to structure LLM outputs using control flow and other code-adjacent techniques to improve planning performance. We propose REPL-Plan, an LLM planning approach that is fully code-expressive and dynamic.
arXiv Detail & Related papers (2024-11-21T04:23:17Z)
Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification (UQ) is a critical component of machine learning (ML) applications. We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines. We conduct a large-scale empirical investigation of UQ and normalization techniques across nine tasks, and identify the most promising approaches.
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
Constrained Decoding for Fill-in-the-Middle Code Language Models via Efficient Left and Right Quotienting of Context-Sensitive Grammars [11.279507894576213]
This paper contributes an incremental synthesis that allows early rejection of syntactically incorrect code. We extend the Earley parsing algorithm to allow for left and right quotients of context-free grammars.
arXiv Detail & Related papers (2024-02-28T02:12:47Z)
Neural Models for Source Code Synthesis and Completion [0.0]
Natural language (NL) to code suggestion systems assist developers in Integrated Development Environments (IDEs) by translating NL utterances into compilable code snippets. Current approaches mainly involve hard-coded, rule-based systems based on semantic parsing. We present sequence-to-sequence deep learning models and training paradigms to map NL to general-purpose programming languages.
arXiv Detail & Related papers (2024-02-08T17:10:12Z)
Recursive Visual Programming [53.76415744371285]
We propose Recursive Visual Programming (RVP), which simplifies generated routines, provides more efficient problem solving, and can manage more complex data structures. We show RVP's efficacy through extensive experiments on benchmarks including VSR, COVR, GQA, and NextQA.
arXiv Detail & Related papers (2023-12-04T17:27:24Z)
GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding [52.14832976759585]
Grammatical error correction (GEC) is an important NLP task that is usually solved with autoregressive sequence-to-sequence models. We propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network. We show that the resulting network improves over previously known non-autoregressive methods for GEC.
arXiv Detail & Related papers (2023-11-14T14:24:36Z)
LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code. LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch. We find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions.
arXiv Detail & Related papers (2023-10-30T17:55:02Z)
Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts. Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness. Guess & Sketch extracts alignment and confidence information from features of the LM then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z)
Code-Style In-Context Learning for Knowledge-Based Question Answering [34.821095476923745]
We propose a code-style in-context learning method for Knowledge-Based Question Answering (KBQA). Experimental results on three mainstream datasets show that our method dramatically mitigates the formatting error problem in generating logic forms.
arXiv Detail & Related papers (2023-09-09T06:27:00Z)
When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose complexity-impacted reasoning score (CIRS) to measure correlation between code and reasoning abilities. Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity. Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
AI Chain on Large Language Model for Unsupervised Control Flow Graph Generation for Statically-Typed Partial Code [21.423928174875844]
Control Flow Graphs (CFGs) are essential for visualizing, understanding and analyzing program behavior. We propose a novel approach that leverages the error-tolerant and understanding ability of pre-trained Large Language Models (LLMs) to generate CFGs.
arXiv Detail & Related papers (2023-06-01T14:52:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.