Validating Solidity Code Defects using Symbolic and Concrete Execution powered by Large Language Models
- URL: http://arxiv.org/abs/2509.13023v1
- Date: Tue, 16 Sep 2025 12:46:11 GMT
- Title: Validating Solidity Code Defects using Symbolic and Concrete Execution powered by Large Language Models
- Authors: Ştefan-Claudiu Susan, Andrei Arusoaie, Dorel Lucanu
- Abstract summary: This paper introduces a novel detection pipeline that integrates custom Slither-based detectors, Large Language Models (LLMs), Kontrol, and Forge. Our approach is designed to reliably detect defects and generate proofs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The high rate of false alarms from static analysis tools and Large Language Models (LLMs) complicates vulnerability detection in Solidity Smart Contracts, demanding methods that can formally or empirically prove the presence of defects. This paper introduces a novel detection pipeline that integrates custom Slither-based detectors, LLMs, Kontrol, and Forge. Our approach is designed to reliably detect defects and generate proofs. We currently perform experiments with promising results for seven types of critical defects. We demonstrate the pipeline's efficacy by presenting our findings for three vulnerabilities -- Reentrancy, Complex Fallback, and Faulty Access Control Policies -- that are challenging for current verification solutions, which often generate false alarms or fail to detect them entirely. We highlight the potential of either symbolic or concrete execution in correctly classifying such code faults. By chaining these instruments, our method effectively validates true positives, significantly reducing the manual verification burden. Although we identify potential limitations, such as the inconsistency and the cost of LLMs, our findings establish a robust framework for combining heuristic analysis with formal verification to achieve more reliable and automated smart contract auditing.
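The reentrancy defect the abstract highlights can be illustrated with a toy heuristic in the spirit of a custom Slither-based detector. This is a hypothetical sketch, not the paper's implementation: real Slither detectors analyze Slither's intermediate representation, whereas this toy merely scans raw source text; the sample functions and the `balances` pattern are invented for demonstration.

```python
import re

# Toy heuristic: flag a function where an external call (.call{value: ...})
# appears BEFORE the state update that should guard it -- the classic
# reentrancy shape. Illustrative only; not a real Slither detector.

VULNERABLE = """
function withdraw() public {
    uint amount = balances[msg.sender];
    (bool ok, ) = msg.sender.call{value: amount}("");
    require(ok);
    balances[msg.sender] = 0;   // state write AFTER the external call
}
"""

SAFE = """
function withdraw() public {
    uint amount = balances[msg.sender];
    balances[msg.sender] = 0;   // state write BEFORE the external call
    (bool ok, ) = msg.sender.call{value: amount}("");
    require(ok);
}
"""

def flags_reentrancy(source: str) -> bool:
    """True if an external call precedes the first state-variable write."""
    call_pos = source.find(".call{")
    write = re.search(r"balances\[[^\]]+\]\s*=", source)
    return call_pos != -1 and write is not None and call_pos < write.start()
```

On the two samples above, the vulnerable variant is flagged while the checks-effects-interactions variant is not; in a pipeline like the one described, such a candidate would then be handed to the later stages (LLMs, Kontrol, Forge) for validation rather than reported directly.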
Related papers
- Scaling Agentic Verifier for Competitive Coding [66.11758166379092]
Large language models (LLMs) have demonstrated strong coding capabilities but still struggle to solve competitive programming problems correctly in a single attempt. Execution-based re-ranking offers a promising test-time scaling strategy, yet existing methods are constrained by either difficult test case generation or inefficient random input sampling. We propose Agentic Verifier, an execution-based agent that actively reasons about program behaviors and searches for highly discriminative test inputs.
arXiv Detail & Related papers (2026-02-04T06:30:40Z) - LLMs as verification oracles for Solidity [1.3887048755037537]
This paper provides the first systematic evaluation of GPT-5, a state-of-the-art reasoning LLM, in this role. We benchmark its performance on a large dataset of verification tasks, compare its outputs against those of established formal verification tools, and assess its practical effectiveness in real-world auditing scenarios. Our study suggests a new frontier in the convergence of AI and formal methods for secure smart contract development and auditing.
arXiv Detail & Related papers (2025-09-23T15:32:13Z) - Variation in Verification: Understanding Verification Dynamics in Large Language Models [43.829778623942275]
We study generative verifiers, which perform verification by generating chain-of-thought reasoning followed by a binary verdict. Our experiments reveal three key findings about verification effectiveness.
arXiv Detail & Related papers (2025-09-22T16:36:56Z) - Ensembling Large Language Models for Code Vulnerability Detection: An Empirical Evaluation [69.8237598448941]
This study investigates the potential of ensemble learning to enhance the performance of Large Language Models (LLMs) in source code vulnerability detection. We propose Dynamic Gated Stacking (DGS), a Stacking variant tailored for vulnerability detection.
arXiv Detail & Related papers (2025-09-16T03:48:22Z) - VulAgent: Hypothesis-Validation based Multi-Agent Vulnerability Detection [55.957275374847484]
VulAgent is a multi-agent vulnerability detection framework based on hypothesis validation. It implements a semantics-sensitive, multi-view detection pipeline, each view aligned to a specific analysis perspective. On average, VulAgent improves overall accuracy by 6.6%, increases the correct identification rate of vulnerable-fixed code pairs by up to 450%, and reduces the false positive rate by about 36%.
arXiv Detail & Related papers (2025-09-15T02:25:38Z) - Reactive Bottom-Up Testing [15.280664862119565]
We introduce a new paradigm that we call Reactive Bottom-Up Testing. Our insight is that function-level testing is necessary but not sufficient for the validation of vulnerabilities in functions. We develop a three-stage bottom-up testing scheme that identifies likely-vulnerable functions and generates type- and context-aware harnesses.
arXiv Detail & Related papers (2025-09-03T20:54:43Z) - TrustLoRA: Low-Rank Adaptation for Failure Detection under Out-of-distribution Data [62.22804234013273]
We propose a simple failure detection framework to unify and facilitate classification with rejection under both covariate and semantic shifts. Our key insight is that by separating and consolidating failure-specific reliability knowledge with low-rank adapters, we can enhance the failure detection ability effectively and flexibly.
arXiv Detail & Related papers (2025-04-20T09:20:55Z) - Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning [58.57194301645823]
Large language models (LLMs) are increasingly integrated into real-world personalized applications. The valuable and often proprietary nature of the knowledge bases used in RAG introduces the risk of unauthorized usage by adversaries. Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning or backdoor attacks. We propose a method for "harmless" copyright protection of knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z) - Combining GPT and Code-Based Similarity Checking for Effective Smart Contract Vulnerability Detection [0.0]
We present SimilarGPT, a vulnerability identification tool for smart contracts. The main concept of SimilarGPT is to measure the similarity between the code under inspection and secure code from third-party libraries. We propose optimizing the detection sequence using topological ordering to enhance logical coherence and reduce false positives during detection.
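The "topological ordering" idea can be sketched with Python's standard-library `graphlib`: analyze library and base contracts before the contracts that depend on them, so earlier verdicts can inform later ones. The contract names and dependency graph below are invented for illustration; this is not SimilarGPT's code.

```python
from graphlib import TopologicalSorter

# Hypothetical detection-sequence ordering: the graph maps each contract
# to the contracts it depends on; dependencies are analyzed first.
deps = {
    "Vault": {"SafeMath", "Ownable"},   # Vault uses both libraries
    "Ownable": set(),
    "SafeMath": set(),
}

# static_order() yields every dependency before its dependents
order = list(TopologicalSorter(deps).static_order())
```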
arXiv Detail & Related papers (2024-12-24T07:15:48Z) - Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks. This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs. We introduce a novel approach to detecting adversarial prompts at a token level.
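A minimal sketch of token-level surprisal screening, with a Laplace-smoothed unigram model standing in for the LLM a real system would use; the background corpus and threshold are invented for demonstration. Tokens that are improbable under the background model receive high surprisal, the token-level signal used to flag adversarial spans.

```python
import math
from collections import Counter

# Toy background corpus of benign prompt text (invented for illustration).
background = ("the model answers the question the user asks "
              "the model follows the instruction").split()
counts = Counter(background)
total = sum(counts.values())
vocab = len(counts)

def surprisal(token: str) -> float:
    # Laplace-smoothed unigram negative log-probability
    p = (counts[token] + 1) / (total + vocab + 1)
    return -math.log(p)

def flag_tokens(prompt: str, threshold: float = 3.0) -> list:
    """Return the tokens whose surprisal exceeds the threshold."""
    return [t for t in prompt.split() if surprisal(t) > threshold]
```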
arXiv Detail & Related papers (2023-11-20T03:17:21Z) - Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives [8.524720028421447]
This paper provides a systematic analysis of the opportunities, challenges, and potential solutions of harnessing Large Language Models (LLMs) such as GPT-4. Generating more answers with higher randomness largely boosts the likelihood of producing a correct answer, but inevitably leads to a higher number of false positives. We propose an adversarial framework dubbed GPTLens that breaks the conventional one-stage detection into two synergistic stages, generation and discrimination.
arXiv Detail & Related papers (2023-10-02T12:37:23Z) - SAT-based Formal Fault-Resistance Verification of Cryptographic Circuits [4.42563968195381]
This paper formalizes the fault-resistance verification problem, which is shown to be NP-complete. We then devise a novel approach for encoding the fault-resistance verification problem as the Boolean satisfiability (SAT) problem. The approach is implemented in the open-source tool FIRMER, which is evaluated extensively on realistic cryptographic circuit benchmarks.
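A brute-force analogue of the fault-resistance question can be run on a toy gadget that computes XOR twice and compares the copies. The gadget is invented for illustration; a SAT encoding such as FIRMER's poses the same question symbolically instead of by enumeration.

```python
from itertools import product

# Enumerate every input and every single-wire bit-flip fault and check
# that no fault produces a wrong yet undetected output.

def protected_xor(a, b, fault_wire=None):
    r1 = a ^ b              # primary computation
    r2 = a ^ b              # redundant copy
    if fault_wire == 0:
        r1 ^= 1             # inject a bit-flip on wire r1
    elif fault_wire == 1:
        r2 ^= 1             # inject a bit-flip on wire r2
    detected = (r1 != r2)   # comparator raises an alarm on mismatch
    return r1, detected

def fault_resistant():
    for a, b, wire in product([0, 1], [0, 1], [0, 1]):
        out, detected = protected_xor(a, b, wire)
        if out != (a ^ b) and not detected:
            return False    # wrong output slipped through undetected
    return True
```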
arXiv Detail & Related papers (2023-07-02T13:01:32Z) - No Need to Know Physics: Resilience of Process-based Model-free Anomaly Detection for Industrial Control Systems [95.54151664013011]
We present a novel framework to generate adversarial spoofing signals that violate physical properties of the system. We analyze four anomaly detectors published at top security conferences.
arXiv Detail & Related papers (2020-12-07T11:02:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.