TriCEGAR: A Trace-Driven Abstraction Mechanism for Agentic AI
- URL: http://arxiv.org/abs/2601.22997v1
- Date: Fri, 30 Jan 2026 14:01:47 GMT
- Title: TriCEGAR: A Trace-Driven Abstraction Mechanism for Agentic AI
- Authors: Roham Koohestani, Ateş Görpelioğlu, Egor Klimov, Burcu Kulahcioglu Ozkan, Maliheh Izadi
- Abstract summary: TriCEGAR is a trace-driven abstraction mechanism that automates state construction from execution logs. We describe a framework-native implementation that captures typed agent lifecycle events and builds abstractions from traces. We also show how run likelihoods enable anomaly detection as a guardrailing signal.
- Score: 5.1181001367075
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Agentic AI systems act through tools and evolve their behavior over long, stochastic interaction traces. This setting complicates assurance, because behavior depends on nondeterministic environments and probabilistic model outputs. Prior work introduced runtime verification for agentic AI via Dynamic Probabilistic Assurance (DPA), learning an MDP online and model checking quantitative properties. A key limitation is that developers must manually define the state abstraction, which couples verification to application-specific heuristics and increases adoption friction. This paper proposes TriCEGAR, a trace-driven abstraction mechanism that automates state construction from execution logs and supports online construction of an agent behavioral MDP. TriCEGAR represents abstractions as predicate trees learned from traces and refined using counterexamples. We describe a framework-native implementation that (i) captures typed agent lifecycle events, (ii) builds abstractions from traces, (iii) constructs an MDP, and (iv) performs probabilistic model checking to compute bounds such as Pmax(success) and Pmin(failure). We also show how run likelihoods enable anomaly detection as a guardrailing signal.
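The pipeline the abstract describes (learn transition probabilities from execution traces, model check a reachability property, score run likelihoods as a guardrail) can be illustrated with a minimal sketch. This is not the authors' implementation: the state names and traces below are invented, and the learned model is a plain Markov chain over abstract states rather than the paper's action-labeled MDP or its predicate-tree abstraction.

```python
# Hypothetical sketch of the trace -> model -> check -> guardrail pipeline.
from collections import defaultdict

def learn_mdp(traces):
    """Estimate empirical transition probabilities P(s' | s) from traces,
    where each trace is a list of abstract state labels."""
    counts = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        for s, s_next in zip(trace, trace[1:]):
            counts[s][s_next] += 1
    return {s: {s2: c / sum(nxt.values()) for s2, c in nxt.items()}
            for s, nxt in counts.items()}

def reach_probability(model, target, iters=100):
    """Value iteration for the probability of eventually reaching `target`
    (the role Pmax(success)/Pmin(failure) play in the abstract)."""
    p = {s: 0.0 for s in model}
    p[target] = 1.0
    for _ in range(iters):
        for s, succs in model.items():
            if s == target:
                continue
            p[s] = sum(prob * p.get(s2, 0.0) for s2, prob in succs.items())
    return p

def run_likelihood(model, trace):
    """Product of learned transition probabilities along one run;
    low likelihood flags an anomalous run as a guardrailing signal."""
    like = 1.0
    for s, s_next in zip(trace, trace[1:]):
        like *= model.get(s, {}).get(s_next, 0.0)
    return like
```

For example, with traces `[["start","tool","success"], ["start","tool","failure"], ["start","success"]]`, `reach_probability` recovers the empirical probability of reaching `"success"` from `"start"`, and a run that takes a never-observed transition gets likelihood zero. A production system would work on the paper's typed lifecycle events and refine the abstraction from counterexamples, which this sketch omits.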
Related papers
- When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents [90.05202259420138]
Computer-use agents (CUAs) can deviate from expected outcomes even under benign input contexts. We introduce the first conceptual and methodological framework for unintended CUA behaviors. We propose AutoElicit, an agentic framework that iteratively perturbs benign instructions using CUA execution feedback.
arXiv Detail & Related papers (2026-02-09T03:20:11Z) - Guided Verifier: Collaborative Multimodal Reasoning via Dynamic Process Supervision [11.159231524113764]
Reinforcement Learning (RL) has emerged as a pivotal mechanism for enhancing the complex reasoning capabilities of Multimodal Large Language Models (MLLMs). In this paper, we propose the Guided Verifier framework to address these structural limitations. We develop a specialized data synthesis pipeline targeting multimodal hallucinations, constructing the CoRe dataset of process-level negatives and Correct-guide Reasoning trajectories to train the guided verifier.
arXiv Detail & Related papers (2026-02-04T07:38:42Z) - CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents [60.98294016925157]
AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior to steal credentials or cause financial loss. We introduce Single-Shot Planning for CUAs, where a trusted planner generates a complete execution graph with conditional branches before any observation of potentially malicious content. Although this architectural isolation successfully prevents instruction injections, we show that additional measures are needed to prevent Branch Steering attacks.
arXiv Detail & Related papers (2026-01-14T23:06:35Z) - Monadic Context Engineering [59.95390010097654]
This paper introduces Monadic Context Engineering (MCE) to provide a formal foundation for agent design. We demonstrate how Monads enable robust composition, how Applicatives provide a principled structure for parallel execution, and, crucially, how Monad Transformers allow for the systematic composition of these capabilities. This layered approach enables developers to construct complex, resilient, and efficient AI agents from simple, independently verifiable components.
arXiv Detail & Related papers (2025-12-27T01:52:06Z) - Managing the Stochastic: Foundations of Learning in Neuro-Symbolic Systems for Software Engineering [0.27195102129094995]
Current approaches to AI coding agents blur the lines between the Large Language Model and the agent itself. This paper proposes setting the control boundary such that the LLM is treated as a component of the environment.
arXiv Detail & Related papers (2025-12-18T15:28:21Z) - Automatic Building Code Review: A Case Study [6.530899637501737]
Building officials face labor-intensive, error-prone, and costly manual reviews of design documents as projects increase in size and complexity. This study introduces a novel agent-driven framework that integrates BIM-based data extraction with automated verification.
arXiv Detail & Related papers (2025-10-03T00:30:14Z) - AgentGuard: Runtime Verification of AI Agents [1.14219428942199]
AgentGuard is a framework for runtime verification of Agentic AI systems. It provides continuous, quantitative assurance through a new paradigm called Dynamic Probabilistic Assurance.
arXiv Detail & Related papers (2025-09-28T13:08:50Z) - Constrained Decoding for Robotics Foundation Models [12.916330118607918]
We introduce SafeDec, a constrained decoding framework for autoregressive robot foundation models. Task-specific safety rules are expressed as Signal Temporal Logic (STL) formulas and are enforced at inference time with minimal overhead. Our method ensures that generated actions provably satisfy STL specifications under assumed dynamics at runtime, without retraining.
arXiv Detail & Related papers (2025-09-01T19:17:40Z) - Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets [87.62730694973696]
This paper introduces CRAFT, a sample-efficient algorithm that leverages differences in controllable feature dynamics across agents to learn representations. We provide theoretical guarantees for CRAFT's performance and demonstrate its feasibility on a toy example.
arXiv Detail & Related papers (2025-03-26T22:05:57Z) - Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z) - TapTree: Process-Tree Based Host Behavior Modeling and Threat Detection Framework via Sequential Pattern Mining [0.29465623430708915]
This paper presents TapTree, an automated process-tree based technique to extract host behavior by compiling system events' semantic information.
In our evaluation against a recent benchmark audit log dataset (DARPA OpTC), TapTree employs tree pattern queries and sequential pattern mining techniques to deduce the semantics of connected system events.
arXiv Detail & Related papers (2023-12-10T15:12:55Z) - Explainability in Process Outcome Prediction: Guidelines to Obtain
Interpretable and Faithful Models [77.34726150561087]
In the field of process outcome prediction, we define explainability through the interpretability of the explanations and the faithfulness of the explainability model.
This paper contributes a set of guidelines named X-MOP which allows selecting the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z) - Complex Event Forecasting with Prediction Suffix Trees: Extended Technical Report [70.7321040534471]
Complex Event Recognition (CER) systems have become popular in the past two decades due to their ability to "instantly" detect patterns on real-time streams of events.
However, there is a lack of methods for forecasting when a pattern may occur, before such an occurrence is actually detected by a CER engine.
We present a formal framework that attempts to address the issue of Complex Event Forecasting.
arXiv Detail & Related papers (2021-09-01T09:52:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.