Related papers: Hybrid Concolic Testing with Large Language Models for Guided Path Exploration

Hybrid Concolic Testing with Large Language Models for Guided Path Exploration

URL: http://arxiv.org/abs/2601.12274v1
Date: Sun, 18 Jan 2026 06:09:18 GMT
Title: Hybrid Concolic Testing with Large Language Models for Guided Path Exploration
Authors: Mahdi Eslamimehr,
Abstract summary: Concolic testing, a powerful hybrid software testing technique, has historically been plagued by fundamental limitations.<n>This paper introduces a novel algorithmic framework that integrates concolic execution with Large Language Models (LLMs) to overcome these challenges.
Score: 0.152292571922932
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Concolic testing, a powerful hybrid software testing technique, has historically been plagued by fundamental limitations such as path explosion and the high cost of constraint solving, which hinder its practical application in large-scale, real-world software systems. This paper introduces a novel algorithmic framework that synergistically integrates concolic execution with Large Language Models (LLMs) to overcome these challenges. Our hybrid approach leverages the semantic reasoning capabilities of LLMs to guide path exploration, prioritize interesting execution paths, and assist in constraint solving. We formally define the system architecture and algorithms that constitute this new paradigm. Through a series of experiments on both synthetic and real-world Fintech applications, we demonstrate that our approach significantly outperforms traditional concolic testing, random testing, and genetic algorithm-based methods in terms of branch coverage, path coverage, and time-to-coverage. The results indicate that by combining the strengths of both concolic execution and LLMs, our method achieves a more efficient and effective exploration of the program state space, leading to improved bug detection capabilities.

Related papers

Interaction-Grounded Learning for Contextual Markov Decision Processes with Personalized Feedback [59.287761696290865]
We propose a computationally efficient algorithm that achieves a sublinear regret guarantee for contextual episodic Markov Decision Processes (MDPs) with personalized feedback.<n>We demonstrate the effectiveness of our method in learning personalized objectives from multi-turn interactions through experiments on both a synthetic episodic MDP and a real-world user booking dataset.
arXiv Detail & Related papers (2026-02-09T06:29:54Z)
Landscape-aware Automated Algorithm Design: An Efficient Framework for Real-world Optimization [32.203665659052845]
Large Language Models (LLMs) have opened new frontiers in automated algorithm design.<n>LLMs require extensive evaluation of the target problem to guide the search process.<n>This research proposes an innovative and efficient framework that decouples algorithm discovery from high-cost evaluation.
arXiv Detail & Related papers (2026-02-04T13:18:45Z)
Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol [83.83217247686402]
Large Language Models (LLMs) have evolved from simple text generators into complex software systems that integrate retrieval augmentation, tool invocation, and multi-turn interactions.<n>Their inherent non-determinism, dynamism, and context dependence pose fundamental challenges for quality assurance.<n>This paper decomposes LLM applications into a three-layer architecture: textbftextitSystem Shell Layer, textbftextitPrompt Orchestration Layer, and textbftextitLLM Inference Core.
arXiv Detail & Related papers (2025-08-28T13:00:28Z)
KompeteAI: Accelerated Autonomous Multi-Agent System for End-to-End Pipeline Generation for Machine Learning Problems [36.17807193758863]
KompeteAI is a novel AutoML framework with dynamic solution space exploration.<n>We introduce KompeteAI, a novel AutoML framework with dynamic solution space exploration.<n>We propose Kompete-bench to address limitations in MLE-Bench, where KompeteAI also achieves state-of-the-art results.
arXiv Detail & Related papers (2025-08-13T20:29:56Z)
Latent Guided Sampling for Combinatorial Optimization [3.636090511738153]
Recent Combinatorial Optimization methods leverage deep learning to learn solution strategies, trained via Supervised or Reinforcement Learning (RL)<n>While promising, these approaches often rely on task-specific augmentations, perform poorly on out-of-distribution instances, and lack robust inference mechanisms.<n>In this work, we propose LGS-Net, a novel latent space model that conditions on efficient problem instances, and introduce an efficient Neural inference method, Latent Guided Sampling (LGS)
arXiv Detail & Related papers (2025-06-04T08:02:59Z)
EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.<n>We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.<n>Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z)
Advancing Code Coverage: Incorporating Program Analysis with Large Language Models [8.31978033489419]
We propose TELPA, a novel technique to generate tests that can reach hard-to-cover branches.<n>Our experimental results on 27 open-source Python projects demonstrate that TELPA significantly outperforms the state-of-the-art SBST and LLM-based techniques.
arXiv Detail & Related papers (2024-04-07T14:08:28Z)
Robust Analysis of Multi-Task Learning Efficiency: New Benchmarks on Light-Weighed Backbones and Effective Measurement of Multi-Task Learning Challenges by Feature Disentanglement [69.51496713076253]
In this paper, we focus on the aforementioned efficiency aspects of existing MTL methods. We first carry out large-scale experiments of the methods with smaller backbones and on a the MetaGraspNet dataset as a new test ground. We also propose Feature Disentanglement measure as a novel and efficient identifier of the challenges in MTL.
arXiv Detail & Related papers (2024-02-05T22:15:55Z)
REX: Rapid Exploration and eXploitation for AI Agents [103.68453326880456]
We propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance.
arXiv Detail & Related papers (2023-07-18T04:26:33Z)
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called textttMEX. textttMEX integrates estimation and planning components while balancing exploration exploitation automatically. It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms. Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return. We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z)
Sample-Efficient, Exploration-Based Policy Optimisation for Routing Problems [2.6782615615913348]
This paper presents a new reinforcement learning approach that is based on entropy. In addition, we design an off-policy-based reinforcement learning technique that maximises the expected return. We show that our model can generalise to various route problems.
arXiv Detail & Related papers (2022-05-31T09:51:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.