If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code
Empowers Large Language Models to Serve as Intelligent Agents
- URL: http://arxiv.org/abs/2401.00812v2
- Date: Mon, 8 Jan 2024 16:22:42 GMT
- Title: If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code
Empowers Large Language Models to Serve as Intelligent Agents
- Authors: Ke Yang, Jiateng Liu, John Wu, Chaoqi Yang, Yi R. Fung, Sha Li, Zixuan
Huang, Xu Cao, Xingyao Wang, Yiquan Wang, Heng Ji, Chengxiang Zhai
- Abstract summary: Large language models (LLMs) are trained on a combination of natural language and formal language (code)
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
- Score: 81.60906807941188
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The prominent large language models (LLMs) of today differ from past language
models not only in size, but also in the fact that they are trained on a
combination of natural language and formal language (code). As a medium between
humans and computers, code translates high-level goals into executable steps,
featuring standard syntax, logical consistency, abstraction, and modularity. In
this survey, we present an overview of the various benefits of integrating code
into LLMs' training data. Specifically, beyond enhancing LLMs in code
generation, we observe that these unique properties of code help (i) unlock the
reasoning ability of LLMs, enabling their applications to a range of more
complex natural language tasks; (ii) steer LLMs to produce structured and
precise intermediate steps, which can then be connected to external execution
ends through function calls; and (iii) take advantage of code compilation and
execution environment, which also provides diverse feedback for model
improvement. In addition, we trace how these profound capabilities of LLMs,
brought by code, have led to their emergence as intelligent agents (IAs) in
situations where the ability to understand instructions, decompose goals, plan
and execute actions, and refine from feedback are crucial to their success on
downstream tasks. Finally, we present several key challenges and future
directions of empowering LLMs with code.
Related papers
- Source Code Summarization in the Era of Large Language Models [23.715005053430957]
Large language models (LLMs) have led to a great boost in the performance of code-related tasks.
In this paper, we undertake a systematic and comprehensive study on code summarization in the era of LLMs.
arXiv Detail & Related papers (2024-07-09T05:48:42Z) - An Empirical Study on Capability of Large Language Models in Understanding Code Semantics [4.638578225024275]
Large Language Models for Code (code LLMs) have demonstrated remarkable performance across various software engineering (SE) tasks.
This paper introduces EMPICA, a framework designed to evaluate the capabilities of code LLMs in understanding code semantics.
arXiv Detail & Related papers (2024-07-04T03:40:58Z) - Large Language Models are Interpretable Learners [53.56735770834617]
In this paper, we show a combination of Large Language Models (LLMs) and symbolic programs can bridge the gap between expressiveness and interpretability.
The pretrained LLM with natural language prompts provides a massive set of interpretable modules that can transform raw input into natural language concepts.
As the knowledge learned by LSP is a combination of natural language descriptions and symbolic rules, it is easily transferable to humans (interpretable) and other LLMs.
arXiv Detail & Related papers (2024-06-25T02:18:15Z) - Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z) - Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.75702900542643]
We introduce AlphaLLM for the self-improvements of Large Language Models.
It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop.
Our experimental results show that AlphaLLM significantly enhances the performance of LLMs without additional annotations.
arXiv Detail & Related papers (2024-04-18T15:21:34Z) - Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z) - LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language
Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs)
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z) - CodeApex: A Bilingual Programming Evaluation Benchmark for Large
Language Models [43.655927559990616]
We propose CodeApex, a benchmark dataset focusing on the programming comprehension, code generation, and code correction abilities of LLMs.
We evaluate 12 widely used LLMs, including both general-purpose and specialized models.
GPT-4 exhibits the best programming capabilities, achieving approximate accuracy of 69%, 54%, and 66% on the three tasks, respectively.
arXiv Detail & Related papers (2023-09-05T04:12:01Z) - The potential of LLMs for coding with low-resource and domain-specific
programming languages [0.0]
This study focuses on the econometric scripting language named hansl of the open-source software gretl.
Our findings suggest that LLMs can be a useful tool for writing, understanding, improving, and documenting gretl code.
arXiv Detail & Related papers (2023-07-24T17:17:13Z) - Large Language Models are Few-Shot Summarizers: Multi-Intent Comment
Generation via In-Context Learning [34.006227676170504]
This study investigates the feasibility of utilizing large language models (LLMs) to generate comments that can fulfill developers' diverse intents.
Experiments on two large-scale datasets demonstrate the rationale of our insights.
arXiv Detail & Related papers (2023-04-22T12:26:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.