Related papers: AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation

AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation

URL: http://arxiv.org/abs/2404.16333v1
Date: Thu, 25 Apr 2024 04:46:02 GMT
Title: AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation
Authors: Zhensu Sun, Xiaoning Du, Zhou Yang, Li Li, David Lo,
Abstract summary: We implement the first AI-oriented grammar for Python, named Simple Python (SimPy) Compared with the original Python, SimPy not only reduces token usage by 13.5% and 10.4% for CodeLlama and GPT-4, but can also achieve equivalent, even improved, performance over the models trained on Python code.
Score: 14.831115535710692
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Besides humans and machines, Artificial Intelligence (AI) models have emerged to be another important audience of programming languages, as we come to the era of large language models (LLMs). LLMs can now excel at coding competitions and even program like developers to address various tasks, such as math calculation. Yet, the grammar and layout of existing programs are designed for humans. Particularly, abundant grammar tokens and formatting tokens are included to make the code more readable to humans. While beneficial, such a human-centric design imposes an unnecessary computational burden on LLMs where each token, either consumed or generated, consumes computational resources. To improve inference efficiency and reduce computational costs, we propose the concept of AI-oriented grammar, which aims to represent the code in a way that better suits the working mechanism of AI models. Code written with AI-oriented grammar discards formats and uses a minimum number of tokens to convey code semantics effectively. To demonstrate the feasibility of this concept, we explore and implement the first AI-oriented grammar for Python, named Simple Python (SimPy). SimPy is crafted by revising the original Python grammar through a series of heuristic rules. Programs written in SimPy maintain identical Abstract Syntax Tree (AST) structures to those in standard Python, allowing execution via a modified AST parser. In addition, we explore methods to enable existing LLMs to proficiently understand and use SimPy, and ensure the changes remain imperceptible for human developers. Compared with the original Python, SimPy not only reduces token usage by 13.5% and 10.4% for CodeLlama and GPT-4, but can also achieve equivalent, even improved, performance over the models trained on Python code.

Related papers

Program Semantic Inequivalence Game with Large Language Models [10.358176296850639]
Large Language Models (LLMs) can achieve strong performance on everyday coding tasks, but they can fail on complex tasks that require non-trivial reasoning about program semantics.<n>In this work, we explore a method to synthetically generate code reasoning training data based on a semantic inequivalence game SInQ.<n>We prove that this setup enables theoretically unlimited improvement through self-play in the limit of infinite computational resources.
arXiv Detail & Related papers (2025-05-02T20:03:35Z)
ChatGPT vs. DeepSeek: A Comparative Study on AI-Based Code Generation [0.0]
This research compares ChatGPT and DeepSeek for Python code generation using online judge coding challenges. It evaluates correctness (online judge verdicts, up to three attempts), code quality (Pylint/Flake8), and efficiency (execution time/memory usage) DeepSeek demonstrated higher correctness, particularly on algorithmic tasks, often achieving 'Accepted' on the first attempt.
arXiv Detail & Related papers (2025-01-30T16:14:48Z)
Effective LLM-Driven Code Generation with Pythoness [0.0]
Pythoness is an embedded domain-specific language for code generation using large language models (LLMs) In Pythoness, developers operate at the level of behavioral specifications when writing functions, classes, or an entire program. We show that Pythoness can successfully leverage a combination of tests and code generation to yield higher quality code than specifications alone.
arXiv Detail & Related papers (2025-01-03T23:14:46Z)
Is This You, LLM? Recognizing AI-written Programs with Multilingual Code Stylometry [3.6936359356095454]
We introduce a technique for AI code stylometry, i.e., the ability to distinguish code generated by LLMs from code written by humans. We also release H-AIRosettaMP, a novel open dataset for AI code stylometry tasks, consisting of 121 247 code snippets in 10 popular programming languages.
arXiv Detail & Related papers (2024-12-19T08:00:20Z)
CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution [50.7413285637879]
The CRUXEVAL-X code reasoning benchmark contains 19 programming languages. It comprises at least 600 subjects for each language, along with 19K content-consistent tests in total. Even a model trained solely on Python can achieve at most 34.4% Pass@1 in other languages.
arXiv Detail & Related papers (2024-08-23T11:43:00Z)
Learning to Reason via Program Generation, Emulation, and Search [33.11955431589091]
Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities. Not all reasoning tasks are easily expressible as code, e.g. tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding. We propose Code Generation and Emulated EXecution (CoGEX) to extend an LM's program synthesis skills to such tasks.
arXiv Detail & Related papers (2024-05-25T19:40:50Z)
Evaluating AI-generated code for C++, Fortran, Go, Java, Julia, Matlab, Python, R, and Rust [0.1906498126334485]
This study evaluates the capabilities of ChatGPT versions 3.5 and 4 in generating code across a diverse range of programming languages. We asked ChatGPT to generate three distinct codes: a simple numerical integration, a conjugate gradient solver, and a parallel 1D stencil-based heat equation solver. The focus of our analysis was on the compilation, runtime performance, and accuracy of the codes.
arXiv Detail & Related papers (2024-05-21T17:04:37Z)
TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree Transformation [9.477734501499274]
We present TransformCode, a novel framework that learns code embeddings in a contrastive learning manner. Our framework is encoder-agnostic and language-agnostic, which means that it can leverage any encoder model and handle any programming language.
arXiv Detail & Related papers (2023-11-10T09:05:23Z)
LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code. LILO combines LLM-guided program synthesis with recent algorithmic advances in automated from Stitch. We find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions.
arXiv Detail & Related papers (2023-10-30T17:55:02Z)
Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts. Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness. Guess & Sketch extracts alignment and confidence information from features of the LM then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z)
PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PaL) to understand natural language problems. PaL offloads the solution step to a programmatic runtime such as a Python interpreter. We set new state-of-the-art results in all 12 benchmarks.
arXiv Detail & Related papers (2022-11-18T18:56:13Z)
Repairing Bugs in Python Assignments Using Large Language Models [9.973714032271708]
We propose to use a large language model trained on code to build an APR system for programming assignments. Our system can fix both syntactic and semantic mistakes by combining multi-modal prompts, iterative querying, test-case-based selection of few-shots, and program chunking. We evaluate MMAPR on 286 real student programs and compare to a baseline built by combining a state-of-the-art Python syntax repair engine, BIFI, and state-of-the-art Python semantic repair engine for student assignments, Refactory.
arXiv Detail & Related papers (2022-09-29T15:41:17Z)
Interactive Code Generation via Test-Driven User-Intent Formalization [60.90035204567797]
Large language models (LLMs) produce code from informal natural language (NL) intent. It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics. We describe a language-agnostic abstract algorithm and a concrete implementation TiCoder.
arXiv Detail & Related papers (2022-08-11T17:41:08Z)
Language Models Can Teach Themselves to Program Better [4.627023679353507]
Recent Language Models (LMs) achieve breakthrough performance in code generation when trained on human-authored problems. We show that it is possible for an LM to synthesize programming problems and solutions, which are filtered for correctness by a Python interpreter. The LM's performance is then seen to improve when it is fine-tuned on its own synthetic problems and verified solutions.
arXiv Detail & Related papers (2022-07-29T06:43:28Z)
ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval. We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.