Related papers: Can Large Language Models Solve Path Constraints in Symbolic Execution?

Can Large Language Models Solve Path Constraints in Symbolic Execution?

URL: http://arxiv.org/abs/2511.18288v1
Date: Sun, 23 Nov 2025 04:54:48 GMT
Title: Can Large Language Models Solve Path Constraints in Symbolic Execution?
Authors: Wenhan Wang, Kaibo Liu, Zeyu Sun, An Ran Chen, Ge Li, Gang Huang, Lei Ma,
Abstract summary: We focus on investigating the possibility of adopting large language models (LLM) for path constraint solving.<n>We build new evaluation pipelines and benchmarks for two tasks: test case generation and path classification.<n>Our experiment results show that state-of-the-art LLMs are able to solve path constraints in both generation and classification tasks.
Score: 32.45630887872447
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Symbolic execution is an important software analysis technique which benefits downstream tasks such as software testing and debugging. However, several limitations hinder symbolic execution from application on real-world software. One of the limitations is the inability to solve diverse execution path constraints: traditional symbolic execution based on SMT solvers is difficult to handle execution paths with complex data structures or external API calls. In this paper, we focus on investigating the possibility of adopting large language models (LLM) for path constraint solving instead of traditional solver-based techniques in symbolic execution. We conduct an empirical study to evaluate the ability of LLMs in two types of path constraint solving: generating test inputs to facilitate an execution path, and determining whether a given execution path can be satisfied without triggering any bugs. We build new evaluation pipelines and benchmarks for two tasks: test case generation and path classification, which include data sources from both competition-level programs and real-world repositories. Our experiment results show that state-of-the-art LLMs are able to solve path constraints in both generation and classification tasks, with 60% of generated test cases that accurately cover the given execution path. Moreover, LLMs are capable of improving test coverage by covering execution paths in real-world repositories where traditional symbolic execution tools cannot be applied. These findings highlight the possibility of extending symbolic execution techniques with LLMs in the future to improve the ability and generalizability of symbolic execution.

Related papers

Environment-Aware Code Generation: How far are We? [52.69113158357018]
It is unclear whether large language models (LLMs) can reliably generate executable code tailored to a user's specific environment.<n>We present the first systematic study of Environment-Aware Code Generation (EACG), where generated code must be functionally correct and directly executable under arbitrary software configurations.<n>Our results show that current LLMs struggle with environment-specific code generation, while our adaptations improve environment compatibility and executability.
arXiv Detail & Related papers (2026-01-18T04:58:15Z)
Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol [83.83217247686402]
Large Language Models (LLMs) have evolved from simple text generators into complex software systems that integrate retrieval augmentation, tool invocation, and multi-turn interactions.<n>Their inherent non-determinism, dynamism, and context dependence pose fundamental challenges for quality assurance.<n>This paper decomposes LLM applications into a three-layer architecture: textbftextitSystem Shell Layer, textbftextitPrompt Orchestration Layer, and textbftextitLLM Inference Core.
arXiv Detail & Related papers (2025-08-28T13:00:28Z)
Symbolic Execution in Practice: A Survey of Applications in Vulnerability, Malware, Firmware, and Protocol Analysis [3.1844358655583846]
Symbolic execution is a powerful program analysis technique that allows for the systematic exploration of all program paths.<n>This paper introduces a systematic taxonomy of strategies to enable symbolic execution on complex software systems.<n>We survey applications of symbolic executions in several domains such as vulnerability analysis, malware analysis, firmware re-hosting, and network protocol analysis.
arXiv Detail & Related papers (2025-08-08T18:43:45Z)
Generating and Understanding Tests via Path-Aware Symbolic Execution with LLMs [8.828992823055]
PALM is a test generation system that combines symbolic path enumeration with LLM-assisted test generation.<n>Palm helps users better understand path coverage and identify which paths are actually exercised by PALM-generated tests.
arXiv Detail & Related papers (2025-06-24T03:46:16Z)
Beyond Next Token Probabilities: Learnable, Fast Detection of Hallucinations and Data Contamination on LLM Output Distributions [60.43398881149664]
We introduce LOS-Net, a lightweight attention-based architecture trained on an efficient encoding of the LLM Output Signature.<n>It achieves superior performance across diverse benchmarks and LLMs, while maintaining extremely low detection latency.
arXiv Detail & Related papers (2025-03-18T09:04:37Z)
Test Wars: A Comparative Study of SBST, Symbolic Execution, and LLM-Based Approaches to Unit Test Generation [11.037212298533069]
Large Language Models (LLMs) have opened up new opportunities to generate tests automatically.<n>This paper studies automatic test generation approaches based on three tools: EvoSuite for SBST, Kex for symbolic execution, and TestSpark for LLM-based test generation.<n>Our results show that while LLM-based test generation is promising, it falls behind traditional methods in terms of coverage.
arXiv Detail & Related papers (2025-01-17T13:48:32Z)
Test-driven Software Experimentation with LASSO: an LLM Prompt Benchmarking Example [1.4685355149711299]
Test-Driven Software Experiments ( TDSEs) involve the execution of software subjects and the observation and analysis of their "de facto" run-time behavior.<n>We present a general-purpose analysis platform called LASSO that provides a minimal set of domain-specific languages and data structures to conduct TDSEs.
arXiv Detail & Related papers (2024-10-11T15:32:48Z)
Python Symbolic Execution with LLM-powered Code Generation [40.906079949304726]
Symbolic execution is a key technology in software testing, which generates test cases by collecting symbolic path constraints. Symbolic execution has been proven helpful in generating high-coverage test cases, but its limitations prevent it from broader usage in software testing. We propose an agent, LLM-Sym, that automatically calls an SMT solver, Z3, to solve execution path constraints.
arXiv Detail & Related papers (2024-09-14T02:43:20Z)
DOCE: Finding the Sweet Spot for Execution-Based Code Generation [69.5305729627198]
We propose a comprehensive framework that includes candidate generation, $n$-best reranking, minimum Bayes risk (MBR) decoding, and self-ging as the core components. Our findings highlight the importance of execution-based methods and the difference gap between execution-based and execution-free methods.
arXiv Detail & Related papers (2024-08-25T07:10:36Z)
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.