Can Large Language Models Understand Intermediate Representations in Compilers?
- URL: http://arxiv.org/abs/2502.06854v2
- Date: Thu, 05 Jun 2025 15:48:54 GMT
- Title: Can Large Language Models Understand Intermediate Representations in Compilers?
- Authors: Hailong Jiang, Jianfeng Zhu, Yao Wan, Bo Fang, Hongyu Zhang, Ruoming Jin, Qiang Guan,
- Abstract summary: Intermediate Representations (IRs) play a critical role in compiler design and program analysis.<n>We evaluate the capabilities of six state-of-the-art Large Language Models (LLMs) in understanding IRs.
- Score: 17.033963652676164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Intermediate Representations (IRs) play a critical role in compiler design and program analysis, yet their comprehension by Large Language Models (LLMs) remains underexplored. In this paper, we present an explorative empirical study evaluating the capabilities of six state-of-the-art LLMs: GPT-4, GPT-3, DeepSeek, Gemma 2, Llama 3, and Code Llama, in understanding IRs. Specifically, we assess model performance across four core tasks: control flow graph reconstruction, decompilation, code summarization, and execution reasoning. While LLMs exhibit competence in parsing IR syntax and identifying high-level structures, they consistently struggle with instruction-level reasoning, especially in control flow reasoning, loop handling, and dynamic execution. Common failure modes include misinterpreting branching instructions, omitting critical operations, and relying on heuristic reasoning rather than precise instruction-level logic. Our findings highlight the need for IR-specific enhancements in LLM design. We recommend fine-tuning on structured IR datasets and integrating control-flow-sensitive architectures to improve model effectiveness. All experimental data and source code are publicly available at
Related papers
- Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering [75.12322966980003]
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains.<n>Most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning.<n>Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering.<n>We propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA.
arXiv Detail & Related papers (2025-06-11T12:03:52Z) - CoT-RAG: Integrating Chain of Thought and Retrieval-Augmented Generation to Enhance Reasoning in Large Language Models [14.784841713647682]
Chain-of-thought (CoT) reasoning boosts large language models' (LLMs) performance on complex tasks.<n>We propose CoT-RAG, a novel reasoning framework with three key designs.<n>We show significant accuracy gains--ranging from 4.0% to 44.3%--over state-of-the-art methods.
arXiv Detail & Related papers (2025-04-18T07:55:09Z) - How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective [64.00022624183781]
Large language models (LLMs) can assess relevance and support information retrieval (IR) tasks.
We investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability.
arXiv Detail & Related papers (2025-04-10T16:14:55Z) - RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement [85.08223786819532]
Existing large language models (LLMs) show exceptional problem-solving capabilities but might struggle with complex reasoning tasks.
We propose textbfRAG-Star, a novel RAG approach that integrates retrieved information to guide the tree-based deliberative reasoning process.
Our experiments involving Llama-3.1-8B-Instruct and GPT-4o demonstrate that RAG-Star significantly outperforms previous RAG and reasoning methods.
arXiv Detail & Related papers (2024-12-17T13:05:36Z) - Benchmarking and Understanding Compositional Relational Reasoning of LLMs [1.915591735124465]
We first propose a new synthetic benchmark called Generalized Associative Recall (GAR)
Evaluation shows that GAR is challenging enough for existing LLMs, revealing their fundamental deficiency in CRR.
We then use attribution patching to discover the core circuits reused by Vicuna-33B across different tasks and a set of vital attention heads.
arXiv Detail & Related papers (2024-12-17T12:10:38Z) - Eliciting Critical Reasoning in Retrieval-Augmented Language Models via Contrastive Explanations [4.697267141773321]
Retrieval-augmented generation (RAG) has emerged as a critical mechanism in contemporary NLP to support Large Language Models (LLMs) in systematically accessing richer factual context.
Recent studies have shown that LLMs still struggle to critically analyse RAG-based in-context information, a limitation that may lead to incorrect inferences and hallucinations.
In this paper, we investigate how to elicit critical reasoning in RAG via contrastive explanations.
arXiv Detail & Related papers (2024-10-30T10:11:53Z) - Enhancing LLM's Cognition via Structurization [41.13997892843677]
Large language models (LLMs) process input contexts through a causal and sequential perspective.
This paper presents a novel concept of context structurization.
Specifically, we transform the plain, unordered contextual sentences into well-ordered and hierarchically structurized elements.
arXiv Detail & Related papers (2024-07-23T12:33:58Z) - What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated as compared to canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z) - Towards Learning Abductive Reasoning using VSA Distributed Representations [56.31867341825068]
We introduce the Abductive Rule Learner with Context-awareness (ARLC) model.
ARLC features a novel and more broadly applicable training objective for abductive reasoning.
We show ARLC's robustness to post-programming training by incrementally learning from examples on top of programmed knowledge.
arXiv Detail & Related papers (2024-06-27T12:05:55Z) - Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning [89.89857766491475]
We propose a curriculum-based logical-aware instruction tuning framework, named LACT.<n>Specifically, we augment the arbitrary first-order logical queries via binary tree decomposition.<n> Experiments across widely used datasets demonstrate that LACT has substantial improvements(brings an average +5.5% MRR score) over advanced methods, achieving the new state-of-the-art.
arXiv Detail & Related papers (2024-05-02T18:12:08Z) - Self-Retrieval: End-to-End Information Retrieval with One Large Language Model [97.71181484082663]
We introduce Self-Retrieval, a novel end-to-end LLM-driven information retrieval architecture.
Self-Retrieval internalizes the retrieval corpus through self-supervised learning, transforms the retrieval process into sequential passage generation, and performs relevance assessment for reranking.
arXiv Detail & Related papers (2024-02-23T18:45:35Z) - INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning [59.07490387145391]
Large language models (LLMs) have demonstrated impressive capabilities in various natural language processing tasks.
Their application to information retrieval (IR) tasks is still challenging due to the infrequent occurrence of many IR-specific concepts in natural language.
We introduce a novel instruction tuning dataset, INTERS, encompassing 20 tasks across three fundamental IR categories.
arXiv Detail & Related papers (2024-01-12T12:10:28Z) - FAIR: Flow Type-Aware Pre-Training of Compiler Intermediate
Representations [36.030609139210426]
We propose a Flow type-Aware pre-trained model for compiler intermediate representations (IRs)
We specifically propose to enable FAIR to learn the semantics of IR tokens, flow type information, and the overall representation of IR.
Experimental results show that FAIR can achieve state-of-the-art results on four code-related downstream tasks.
arXiv Detail & Related papers (2023-09-09T15:51:49Z) - When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose complexity-impacted reasoning score (CIRS) to measure correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
arXiv Detail & Related papers (2023-08-29T17:22:39Z) - LMs: Understanding Code Syntax and Semantics for Code Analysis [25.508254718438636]
We evaluate the capabilities of large language models (LLMs) and their limitations for code analysis in software engineering.
We employ four state-of-the-art foundational models, GPT4, GPT3.5, StarCoder and CodeLlama-13b-instruct.
arXiv Detail & Related papers (2023-05-20T08:43:49Z) - Search-in-the-Chain: Interactively Enhancing Large Language Models with
Search for Knowledge-intensive Tasks [121.74957524305283]
This paper proposes a novel framework named textbfSearch-in-the-Chain (SearChain) for the interaction between Information Retrieval (IR) and Large Language Model (LLM)
Experiments show that SearChain outperforms state-of-the-art baselines on complex knowledge-intensive tasks.
arXiv Detail & Related papers (2023-04-28T10:15:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.