Code Simulation Challenges for Large Language Models
- URL: http://arxiv.org/abs/2401.09074v4
- Date: Wed, 12 Jun 2024 08:55:13 GMT
- Title: Code Simulation Challenges for Large Language Models
- Authors: Emanuele La Malfa, Christoph Weinhuber, Orazio Torre, Fangru Lin, Samuele Marro, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge
- Abstract summary: This work studies to what extent Large Language Models (LLMs) can simulate coding and algorithmic tasks.
We introduce benchmarks for straight-line programs, code that contains critical paths, and approximate and redundant instructions.
We propose a novel off-the-shelf prompting method, Chain of Simulation (CoSm), which instructs LLMs to simulate code execution line by line, following the computation pattern of compilers.
- Score: 6.970495767499435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many reasoning, planning, and problem-solving tasks share an intrinsic algorithmic nature: correctly simulating each step is a sufficient condition to solve them correctly. This work studies to what extent Large Language Models (LLMs) can simulate coding and algorithmic tasks to provide insights into general capabilities in such algorithmic reasoning tasks. We introduce benchmarks for straight-line programs, code that contains critical paths, and approximate and redundant instructions. We further assess the simulation capabilities of LLMs with sorting algorithms and nested loops and show that a routine's computational complexity directly affects an LLM's ability to simulate its execution. While the most powerful LLMs exhibit relatively strong simulation capabilities, the process is fragile, seems to rely heavily on pattern recognition, and is affected by memorisation. We propose a novel off-the-shelf prompting method, Chain of Simulation (CoSm), which instructs LLMs to simulate code execution line by line/follow the computation pattern of compilers. CoSm efficiently helps LLMs reduce memorisation and shallow pattern recognition while improving simulation performance. We consider the success of CoSm in code simulation to be inspirational for other general routine simulation reasoning tasks.
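To make the simulation setting concrete, the sketch below builds a CoSm-style, line-by-line simulation prompt for a small straight-line program and computes the ground-truth output by executing the program. The prompt wording and the benchmark format are illustrative assumptions, not the paper's exact prompts or data.
```python
# Hedged sketch: a CoSm-style prompt asking for line-by-line simulation of a
# small straight-line program, plus the ground-truth output obtained by
# actually running the program. Prompt wording and format are assumptions.
import contextlib
import io

STRAIGHT_LINE_PROGRAM = """\
x = 3
y = x * 4
x = y - 5
z = x + y
print(z)
"""

def cosm_style_prompt(program: str) -> str:
    """Instruct the model to simulate the program one line at a time."""
    return (
        "Simulate the following Python program as an interpreter would.\n"
        "Process it line by line; after each line, state the value of every\n"
        "variable. Finally, report exactly what the program prints.\n\n"
        + program
    )

def ground_truth(program: str) -> str:
    """Execute the (trusted, benchmark-generated) program to get the answer."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})
    return buffer.getvalue().strip()

if __name__ == "__main__":
    print(cosm_style_prompt(STRAIGHT_LINE_PROGRAM))
    print("expected output:", ground_truth(STRAIGHT_LINE_PROGRAM))  # 19
```
The model's line-by-line trace can then be compared against the executed output, which is how memorisation and shallow pattern matching become visible as divergences from the true program state.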
Related papers
- Code Simulation as a Proxy for High-order Tasks in Large Language Models [6.71786454125056]
We collect pairs of naturalistic and synthetic reasoning tasks to assess the capabilities of Large Language Models (LLMs).
We leverage common constructs in programming as the counterpart of the building blocks of naturalistic reasoning tasks.
Our contribution builds upon synthetically testing the reasoning capabilities of LLMs as a scalable complement to handcrafted human-annotated problems.
arXiv Detail & Related papers (2025-02-05T19:30:28Z)
- Simulation Streams: A Programming Paradigm for Controlling Large Language Models and Building Complex Systems with Generative AI [3.3126968968429407]
Simulation Streams is a programming paradigm designed to efficiently control and leverage Large Language Models (LLMs).
Our primary goal is to create a framework that harnesses the agentic abilities of LLMs while addressing their limitations in maintaining consistency.
arXiv Detail & Related papers (2025-01-30T16:38:03Z)
- Pseudocode-Injection Magic: Enabling LLMs to Tackle Graph Computational Tasks [15.69049038121735]
Graph computational tasks are inherently challenging and often demand advanced algorithms for effective solutions.
Existing approaches are constrained by large language models' limited capability to comprehend complex graph structures.
We introduce a novel framework, PIE, which consists of three key steps: problem understanding, prompt design, and code generation.
arXiv Detail & Related papers (2025-01-23T15:04:22Z)
- Interactive and Expressive Code-Augmented Planning with Large Language Models [62.799579304821826]
Large Language Models (LLMs) demonstrate strong abilities in common-sense reasoning and interactive decision-making.
Recent techniques have sought to structure LLM outputs using control flow and other code-adjacent techniques to improve planning performance.
We propose REPL-Plan, an LLM planning approach that is fully code-expressive and dynamic.
arXiv Detail & Related papers (2024-11-21T04:23:17Z)
- LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale [17.00936774784349]
There is a lack of simulation infrastructure capable of accurately modeling versatile hardware-software behaviors in large language model (LLM) serving systems.
This paper aims to develop an effective simulation tool, called LLMServingSim, to support future research in LLM serving systems.
arXiv Detail & Related papers (2024-08-10T09:26:15Z)
- On the Design and Analysis of LLM-Based Algorithms [74.7126776018275]
Large language models (LLMs) are used as sub-routines in algorithms.
LLMs have achieved remarkable empirical success.
Our proposed framework holds promise for advancing LLM-based algorithms.
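As a concrete, deliberately simple reading of "LLMs as sub-routines", the sketch below wires an abstract string-to-string LLM callable into an ordinary divide-and-conquer summariser; the interface, threshold, and prompts are illustrative assumptions, not the framework proposed in that paper.
```python
# Minimal sketch of an "LLM as sub-routine" algorithm: an ordinary
# divide-and-conquer procedure that treats the LLM as an opaque
# string -> string callable. Interfaces and thresholds are assumptions.
from typing import Callable

def summarise(text: str, llm: Callable[[str], str], max_chars: int = 2000) -> str:
    """Recursively split long text, summarise the halves, then merge them."""
    if len(text) <= max_chars:
        return llm("Summarise in two sentences:\n\n" + text)
    mid = len(text) // 2
    left = summarise(text[:mid], llm, max_chars)
    right = summarise(text[mid:], llm, max_chars)
    return llm("Merge these two partial summaries into one:\n\n" + left + "\n\n" + right)

if __name__ == "__main__":
    # Stand-in "LLM" so the sketch runs without any API or model.
    fake_llm = lambda prompt: prompt.splitlines()[-1][:60]
    print(summarise("many words " * 500, fake_llm))
```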
arXiv Detail & Related papers (2024-07-20T07:39:07Z)
- Case2Code: Scalable Synthetic Data for Code Generation [105.89741089673575]
Large Language Models (LLMs) have shown outstanding breakthroughs in code generation.
Recent work improves code LLMs by training on synthetic data generated by some powerful LLMs.
We propose a Case2Code task by exploiting the expressiveness and correctness of programs.
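If, as the name suggests, a Case2Code instance asks a model to recover a program from input-output cases, a single instance might look like the following sketch; the data format and the execution-based check are assumptions, not the paper's pipeline.
```python
# Hedged sketch of what a Case2Code-style instance could look like: given
# input/output cases of a hidden function, ask a model to write the function
# and verify candidates by execution. The format is an assumption.

CASES = [(2, 4), (3, 9), (10, 100)]  # (input, expected output)

prompt = "Write a Python function f(x) consistent with these cases:\n" + "\n".join(
    f"f({x}) == {y}" for x, y in CASES
)

def passes_all_cases(candidate_source: str) -> bool:
    """Execute a candidate definition of f and check it against every case."""
    namespace: dict = {}
    exec(candidate_source, namespace)  # candidate comes from a trusted source here
    f = namespace["f"]
    return all(f(x) == y for x, y in CASES)

if __name__ == "__main__":
    print(prompt)
    print(passes_all_cases("def f(x):\n    return x * x"))  # True
```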
arXiv Detail & Related papers (2024-07-17T11:35:00Z)
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding the decoding process of LLMs with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
- Can Language Models Pretend Solvers? Logic Code Simulation with LLMs [3.802945676202634]
Transformer-based large language models (LLMs) have demonstrated significant potential in addressing logic problems.
This study delves into a novel aspect, namely logic code simulation, which forces LLMs to emulate logical solvers in predicting the results of logical programs.
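A minimal example of such a logic code simulation query might look like the sketch below, where the model must predict a solver's verdict without running the program; the choice of Z3 and the prompt wording are illustrative assumptions.
```python
# Hedged sketch of a logic code simulation query: the model is shown a small
# solver program (Z3's Python API is used here as one plausible choice) and
# must predict the solver's verdict without executing it.

LOGIC_PROGRAM = """\
from z3 import Int, Solver
x, y = Int("x"), Int("y")
s = Solver()
s.add(x + y == 10, x - y == 4, x > 0, y > 0)
print(s.check())
"""

prompt = (
    "Act as a logic solver. Without running it, predict the exact output of "
    "this program and briefly justify your answer:\n\n" + LOGIC_PROGRAM
)
# Ground truth: x = 7, y = 3 satisfies every constraint, so the output is `sat`.
print(prompt)
```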
arXiv Detail & Related papers (2024-03-24T11:27:16Z)
- When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose the complexity-impacted reasoning score (CIRS) to measure the correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
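As a rough illustration of AST-based complexity scoring, the toy function below counts nodes and branching constructs in a Python abstract syntax tree; it is only a stand-in for the idea and is not the CIRS formula from the paper.
```python
# Toy proxy for AST-based complexity scoring: count nodes and branching
# constructs in the abstract syntax tree. This is NOT the CIRS formula from
# the paper, only an illustration of the underlying idea.
import ast

def toy_complexity(source: str) -> float:
    """Score code by its AST size plus a penalty for branching/looping nodes."""
    nodes = list(ast.walk(ast.parse(source)))
    branches = sum(isinstance(n, (ast.If, ast.For, ast.While, ast.BoolOp)) for n in nodes)
    return len(nodes) + 2.0 * branches

print(toy_complexity("y = x + 1"))
print(toy_complexity("total = 0\nfor i in range(n):\n    if i % 2 == 0:\n        total += i"))
```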
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
- SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs).
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SatLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
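The declarative idea can be illustrated with a small sketch in which solver constraints (written directly against the Z3 Python API) stand in for what an LLM might generate for a word problem; the problem and its encoding are illustrative assumptions, not the paper's prompts.
```python
# Hedged sketch of the declarative, solver-aided idea: the constraints below
# stand in for what an LLM might generate for a simple word problem, and an
# off-the-shelf solver (Z3 here) derives the answer. Requires `pip install z3-solver`.
from z3 import Int, Solver, sat

# Word problem (assumed for illustration): "Alice is 3 years older than Bob,
# and their ages sum to 25. How old is Bob?"
alice, bob = Int("alice"), Int("bob")
solver = Solver()
solver.add(alice == bob + 3, alice + bob == 25)

if solver.check() == sat:
    print("bob =", solver.model()[bob])  # -> 11
```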
arXiv Detail & Related papers (2023-05-16T17:55:51Z)