Do Code LLMs Do Static Analysis?
- URL: http://arxiv.org/abs/2505.12118v1
- Date: Sat, 17 May 2025 18:55:40 GMT
- Title: Do Code LLMs Do Static Analysis?
- Authors: Chia-Yi Su, Collin McMillan
- Abstract summary: This paper investigates code LLMs' capability of static analysis during code intelligence tasks such as code summarization and generation. We use three different static analysis tasks (callgraph generation, AST generation, and dataflow generation) and three different code intelligence tasks (code generation, summarization, and translation) in our experiments. We found that LLMs show poor performance on static analysis tasks and that pretraining on the static analysis tasks does not generalize to better performance on the code intelligence tasks.
- Score: 2.4401219403555814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates code LLMs' capability to perform static analysis during code intelligence tasks such as code summarization and generation. Code LLMs are now household names for their ability to do programming tasks that have heretofore required people. The process people follow to do programming tasks has long been understood to require static analysis. For example, human programmers navigate the call graph of large programs to comprehend the different parts of those programs. Education in programming includes static analysis under the assumption that better static analysis skills beget better programming. Yet while popular culture is replete with anthropomorphic references such as LLM "reasoning", code LLMs could in fact exhibit a thought process wholly alien to humans. This paper studies the specific question of static analysis by code LLMs. We use three different static analysis tasks (callgraph generation, AST generation, and dataflow generation) and three different code intelligence tasks (code generation, summarization, and translation) with two closed-source models (Gemini and GPT-4o) and two open-source models (CodeLlama and Jam) in our experiments. We found that LLMs show poor performance on static analysis tasks and that pretraining on the static analysis tasks does not generalize to better performance on the code intelligence tasks.
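The three static analysis tasks have mechanically checkable ground truth. As a minimal illustration (a sketch, not the authors' experimental setup), the following uses Python's standard `ast` module to produce the AST for a small snippet and to extract a naive intraprocedural call graph; dataflow generation, the third task, is omitted for brevity.

```python
# Minimal sketch of two of the three static analysis tasks studied:
# AST generation and callgraph generation (dataflow omitted).
import ast

src = """
def helper(x):
    return x + 1

def main(values):
    return [helper(v) for v in values]
"""

tree = ast.parse(src)
# Ground-truth AST a code LLM could be asked to reproduce from the source.
print(ast.dump(tree, indent=2))

# Naive call graph: map each top-level function to the names it calls.
callgraph = {}
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        calls = {c.func.id for c in ast.walk(node)
                 if isinstance(c, ast.Call) and isinstance(c.func, ast.Name)}
        callgraph[node.name] = sorted(calls)
print(callgraph)  # {'helper': [], 'main': ['helper']}
```

In an evaluation of this kind, an LLM would be prompted to emit such structures directly from source code, and its output compared against tool-generated ground truth like the above.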
Related papers
- Can Large Language Models Understand Symbolic Graphics Programs? [136.5639211254501]
Symbolic graphics programs are popular in computer graphics. They allow us to test an LLM's ability to answer semantic questions about the images or 3D geometries without a vision encoder. We create a benchmark for the semantic visual understanding of symbolic graphics programs, built procedurally with minimal human effort. We evaluate commercial and open-source LLMs on our benchmark to assess their ability to reason about visual output of programs.
arXiv Detail & Related papers (2024-08-15T17:59:57Z)
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions.
We develop a taxonomy of bugs in incorrect code, comprising three categories and 12 sub-categories, and analyze the root causes of common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback; a generic version of such a loop is sketched after this entry.
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
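As context for the self-critique method summarized in the entry above, a generic generate-check-repair loop can be sketched as below; `generate`, `critique`, and `compile_check` are hypothetical stand-ins for an LLM call, a critique prompt, and a compiler or test harness, not the paper's actual method or API.

```python
# Hedged sketch of a training-free self-critique repair loop (illustrative only).
def self_critique_loop(task, generate, critique, compile_check, max_rounds=3):
    """generate, critique, and compile_check are hypothetical callables."""
    code = generate(task)                      # initial LLM attempt
    for _ in range(max_rounds):
        ok, diagnostics = compile_check(code)  # e.g., compiler errors or failing tests
        if ok:
            return code
        # Feed the diagnostics back so the model can critique and correct itself.
        code = critique(task, code, diagnostics)
    return code                                # best effort after max_rounds
```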
- Perplexed: Understanding When Large Language Models are Confused [3.4208414448496027]
This paper introduces perplexed, a library for exploring where a language model is perplexed; a minimal perplexity computation is sketched after this entry.
We conducted a case study on Large Language Models (LLMs) for code generation, using codetokenizer, an additional tool we built to help analyze code models.
We found that our studied code LLMs had their worst performance on coding structures where the code was not syntactically correct.
arXiv Detail & Related papers (2024-04-09T22:03:39Z)
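Perplexity, the focus of the entry above, is the exponentiated mean negative log-likelihood the model assigns to each token. A minimal computation over a code snippet using Hugging Face transformers follows; the checkpoint name and snippet are placeholder assumptions, and the paper's `perplexed` library and `codetokenizer` tool are not reproduced here.

```python
# Minimal token-level perplexity over a code snippet (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM checkpoint works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

code = "def add(a, b):\n    return a + b\n"
enc = tok(code, return_tensors="pt")

with torch.no_grad():
    # Passing labels == input_ids returns the mean cross-entropy loss.
    out = model(**enc, labels=enc["input_ids"])

print(f"perplexity: {torch.exp(out.loss).item():.2f}")
```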
- LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on the roofline model (a toy version is sketched after this entry).
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
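The roofline model referenced in the entry above bounds attainable throughput by the lesser of peak compute and arithmetic intensity times memory bandwidth: attainable = min(peak_flops, intensity × bandwidth). A toy calculation follows; the hardware numbers are illustrative assumptions (roughly an A100-class GPU), not figures from the survey.

```python
# Toy roofline model (illustrative hardware numbers, not from the survey).
PEAK_FLOPS = 312e12  # assumed peak compute, FLOP/s
MEM_BW = 2.0e12      # assumed memory bandwidth, bytes/s

def attainable_flops(intensity):  # intensity: FLOPs per byte moved
    return min(PEAK_FLOPS, intensity * MEM_BW)

# LLM decode is dominated by matrix-vector products: roughly 2 FLOPs per
# 2-byte fp16 weight read, i.e. ~1 FLOP/byte, deep in the memory-bound region.
for ai in (1, 16, 156, 1024):  # 156 FLOP/byte is the ridge point here
    print(f"{ai:5d} FLOP/B -> {attainable_flops(ai) / 1e12:6.1f} TFLOP/s")
```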
- Efficient Tool Use with Chain-of-Abstraction Reasoning [63.08202389132155]
Large language models (LLMs) need to ground their reasoning to real-world knowledge. There remain challenges in fine-tuning LLM agents to invoke tools in multi-step reasoning problems. We propose a new method for LLMs to better leverage tools in multi-step reasoning.
arXiv Detail & Related papers (2024-01-30T21:53:30Z)
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code).
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
- Large Language Models for Code Analysis: Do LLMs Really Do Their Job? [13.48555476110316]
Large language models (LLMs) have demonstrated significant potential in the realm of natural language understanding and programming code processing tasks.
This paper offers a comprehensive evaluation of LLMs' capabilities in performing code analysis tasks.
arXiv Detail & Related papers (2023-10-18T22:02:43Z)
- How Does Naming Affect LLMs on Code Analysis Tasks? [8.150719423943109]
Large Language Models (LLMs) were proposed for natural language processing (NLP) and have shown promising results as general-purpose language models.
This paper investigates how naming affects LLMs on code analysis tasks.
arXiv Detail & Related papers (2023-07-24T02:38:24Z)
- Do Large Language Models Pay Similar Attention Like Human Programmers When Generating Code? [10.249771123421432]
We investigate whether Large Language Models (LLMs) attend to the same parts of a task description as human programmers during code generation.
We manually analyzed 211 incorrect code snippets and found five attention patterns that can be used to explain many code generation errors.
Our findings highlight the need for human-aligned LLMs for better interpretability and programmer trust.
arXiv Detail & Related papers (2023-06-02T00:57:03Z)
- LMs: Understanding Code Syntax and Semantics for Code Analysis [25.508254718438636]
We evaluate the capabilities and limitations of large language models (LLMs) for code analysis in software engineering.
We employ four state-of-the-art foundation models: GPT-4, GPT-3.5, StarCoder, and CodeLlama-13b-instruct.
arXiv Detail & Related papers (2023-05-20T08:43:49Z)
- Can Large Language Models Transform Computational Social Science? [79.62471267510963]
Large Language Models (LLMs) are capable of performing many language processing tasks zero-shot (without training data).
This work provides a road map for using LLMs as Computational Social Science tools.
arXiv Detail & Related papers (2023-04-12T17:33:28Z)