Related papers: Beyond Prompt Engineering: Neuro-Symbolic-Causal Architecture for Robust Multi-Objective AI Agents

URL: http://arxiv.org/abs/2510.23682v1
Date: Mon, 27 Oct 2025 15:25:35 GMT
Title: Beyond Prompt Engineering: Neuro-Symbolic-Causal Architecture for Robust Multi-Objective AI Agents
Authors: Gokturk Aytug Akarlar,
Abstract summary: Large language models show promise as autonomous decision-making agents, yet their deployment in high-stakes domains remains fraught with risk. We present Chimera, a neuro-symbolic-causal architecture that integrates an LLM strategist, a formally verified symbolic constraint engine, and a causal inference module for counterfactual reasoning.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models show promise as autonomous decision-making agents, yet their deployment in high-stakes domains remains fraught with risk. Without architectural safeguards, LLM agents exhibit catastrophic brittleness: identical capabilities produce wildly different outcomes depending solely on prompt framing. We present Chimera, a neuro-symbolic-causal architecture that integrates three complementary components: an LLM strategist, a formally verified symbolic constraint engine, and a causal inference module for counterfactual reasoning. We benchmark Chimera against baseline architectures (LLM-only, LLM with symbolic constraints) across 52-week simulations in a realistic e-commerce environment featuring price elasticity, trust dynamics, and seasonal demand. Under organizational biases toward either volume or margin optimization, LLM-only agents fail catastrophically (total loss of $99K in volume scenarios) or destroy brand trust (-48.6% in margin scenarios). Adding symbolic constraints prevents disasters but achieves only 43-87% of Chimera's profit. Chimera consistently delivers the highest returns ($1.52M and $1.96M respectively, in some cases +$2.2M) while improving brand trust (+1.8% and +10.8%, in some cases +20.86%), demonstrating prompt-agnostic robustness. Our TLA+ formal verification proves zero constraint violations across all scenarios. These results establish that architectural design, not prompt engineering, determines the reliability of autonomous agents in production environments. We provide open-source implementations and interactive demonstrations for reproducibility.
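Illustrative sketch (not the authors' implementation): the abstract describes three cooperating parts, a strategist that proposes actions, a symbolic engine that rejects rule-breaking ones, and a causal module that scores what remains. The Python below shows one plausible way such a pipeline could be wired together. Every name, threshold, and scoring formula here (Action, satisfies_constraints, toy_outcome, the 30% discount cap) is a hypothetical assumption chosen for illustration; none comes from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Action:
    """A hypothetical weekly decision proposed by the strategist."""
    price_change_pct: float  # e.g. -10.0 means a 10% price cut
    ad_spend: float          # weekly advertising budget in dollars


def satisfies_constraints(action: Action,
                          max_discount_pct: float = 30.0,
                          max_ad_spend: float = 5000.0) -> bool:
    """Stand-in for the symbolic guard: hard rules with assumed thresholds."""
    return (action.price_change_pct >= -max_discount_pct
            and 0.0 <= action.ad_spend <= max_ad_spend)


def toy_outcome(action: Action) -> float:
    """Stand-in for the causal module: a made-up, volume-biased score
    that rewards deeper discounts and penalizes ad spend."""
    return -10.0 * action.price_change_pct - 0.05 * action.ad_spend


def choose_action(candidates: List[Action],
                  is_safe: Callable[[Action], bool],
                  predict: Callable[[Action], float]) -> Optional[Action]:
    """Filter candidates through the guard, then pick the best predicted one.
    Returns None when every candidate violates a constraint."""
    safe = [a for a in candidates if is_safe(a)]
    if not safe:
        return None
    return max(safe, key=predict)


# Candidates a strategist might propose (hard-coded here in place of an LLM call).
candidates = [
    Action(price_change_pct=-50.0, ad_spend=1000.0),  # scores highest, but unsafe
    Action(price_change_pct=-20.0, ad_spend=1000.0),
    Action(price_change_pct=-5.0, ad_spend=2000.0),
    Action(price_change_pct=10.0, ad_spend=500.0),
]

best = choose_action(candidates, satisfies_constraints, toy_outcome)
print(best)
```

In this toy setup the 50% discount has the highest predicted score but is removed by the guard before scoring, so the selection falls to the best remaining candidate. The point is only to show the ordering of the stages (propose, filter, score); the paper's actual constraint language and causal estimator are not reproduced here.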

Related papers

The Six Sigma Agent: Achieving Enterprise-Grade Reliability in LLM Systems Through Consensus-Driven Decomposed Execution [0.0]
We introduce the Six Sigma Agent, a novel architecture that achieves enterprise-grade reliability through three synergistic components. We demonstrate a 14,700x reliability improvement over single-agent execution while reducing costs by 80%. Our work establishes that reliability in AI systems emerges from principled redundancy and consensus rather than model scaling alone.
arXiv Detail & Related papers (2026-01-29T20:04:29Z)
ReliabilityBench: Evaluating LLM Agent Reliability Under Production-Like Stress Conditions [0.32928123659012326]
Existing benchmarks for tool-using LLM agents primarily report single-run success rates and miss reliability properties required in production. We introduce ReliabilityBench, a benchmark for evaluating agent reliability across three dimensions. We evaluate two models (Gemini 2.0 Flash, GPT-4o) and two agent architectures (ReAct, Reflexion) across four domains (scheduling, travel, customer support, e-commerce) over 1,280 episodes.
arXiv Detail & Related papers (2026-01-03T13:41:33Z)
Optimistic TEE-Rollups: A Hybrid Architecture for Scalable and Verifiable Generative AI Inference on Blockchain [4.254924788681319]
We introduce Optimistic TEE-Rollups (OTR), a hybrid verification protocol that harmonizes constraints. OTR achieves 99% of the throughput of centralized baselines with a marginal cost overhead of $0.07 per query.
arXiv Detail & Related papers (2025-12-23T09:16:41Z)
FLAMES: Fine-tuning LLMs to Synthesize Invariants for Smart Contract Security [41.836337574143535]
FLAMES is an automated approach that synthesizes runtime guards as Solidity "require" statements to harden smart contracts against exploits. FLAMES employs domain-adapted large language models trained through fill-in-the-middle supervised fine-tuning on real-world invariants extracted from 514,506 verified contracts.
arXiv Detail & Related papers (2025-10-24T12:44:08Z)
ParaVul: A Parallel Large Language Model and Retrieval-Augmented Framework for Smart Contract Vulnerability Detection [43.41293570032631]
ParaVul is a retrieval-augmented framework to improve the reliability and accuracy of smart contract vulnerability detection. We develop Sparse Low-Rank Adaptation (SLoRA) for LLM fine-tuning. We construct a vulnerability contract dataset and develop a hybrid Retrieval-Augmented Generation (RAG) system.
arXiv Detail & Related papers (2025-10-20T03:23:41Z)
SaFeR-VLM: Toward Safety-aware Fine-grained Reasoning in Multimodal Models [66.71948519280669]
Multimodal Large Reasoning Models (MLRMs) demonstrate impressive crossmodal reasoning but often amplify safety risks under adversarial prompts. Existing defenses mainly act at the output level and do not constrain the reasoning process, leaving models exposed to implicit risks. We propose SaFeR-VLM, which integrates four components and supports dynamic and interpretable safety decisions beyond surface-level filtering.
arXiv Detail & Related papers (2025-10-08T10:39:12Z)
Trade in Minutes! Rationality-Driven Agentic System for Quantitative Financial Trading [57.28635022507172]
TiMi is a rationality-driven multi-agent system that architecturally decouples strategy development from minute-level deployment. We propose a two-tier analytical paradigm from macro patterns to micro customization, layered programming design for trading bot implementation, and closed-loop optimization driven by mathematical reflection.
arXiv Detail & Related papers (2025-10-06T13:08:55Z)
Towards Secure and Explainable Smart Contract Generation with Security-Aware Group Relative Policy Optimization [18.013438474903314]
We propose SmartCoder-R1, a framework for secure and explainable smart contract generation. We train the model to emulate human security analysis. SmartCoder-R1 establishes a new state of the art, achieving top performance across five key metrics.
arXiv Detail & Related papers (2025-09-12T03:14:50Z)
Chimera: Harnessing Multi-Agent LLMs for Automatic Insider Threat Simulation [17.496651394447596]
We propose Chimera, the first large language model (LLM)-based multi-agent framework that automatically simulates both benign and malicious insider activities. Chimera models each employee with agents that have role-specific behavior and integrates modules for group meetings, pairwise interactions, and autonomous scheduling. It incorporates 15 types of insider attacks (e.g., IP theft, system sabotage) and has been deployed to simulate activities in three sensitive domains. We assess ChimeraLog via human studies and quantitative analysis, confirming its diversity, realism, and presence of explainable threat patterns.
arXiv Detail & Related papers (2025-08-11T08:24:48Z)
OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks [52.87238755666243]
We present OmniEAR, a framework for evaluating how language models reason about physical interactions, tool usage, and multi-agent coordination in embodied tasks. We model continuous physical properties and complex spatial relationships across 1,500 scenarios spanning household and industrial domains. Our systematic evaluation reveals severe performance degradation when models must reason from constraints.
arXiv Detail & Related papers (2025-08-07T17:54:15Z)
AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security [74.22452069013289]
AegisLLM is a cooperative multi-agent defense against adversarial attacks and information leakage. We show that scaling the agentic reasoning system at test time substantially enhances robustness without compromising model utility. Comprehensive evaluations across key threat scenarios, including unlearning and jailbreaking, demonstrate the effectiveness of AegisLLM.
arXiv Detail & Related papers (2025-04-29T17:36:05Z)
EmoDebt: Bayesian-Optimized Emotional Intelligence for Strategic Agent-to-Agent Debt Recovery [65.30120701878582]
Large Language Model (LLM) agents are vulnerable to exploitation in emotion-sensitive domains like debt collection. EmoDebt is an emotional intelligence engine that reframes a model's ability to express emotion in negotiation as a sequential decision-making problem. Experiments on our proposed benchmark demonstrate that EmoDebt achieves significant strategic robustness, substantially outperforming non-adaptive and emotion-agnostic baselines.
arXiv Detail & Related papers (2025-03-27T01:41:34Z)
LLM2: Let Large Language Models Harness System 2 Reasoning [65.89293674479907]
Large language models (LLMs) have exhibited impressive capabilities across a myriad of tasks, yet they occasionally yield undesirable outputs. We introduce LLM2, a novel framework that combines an LLM with a process-based verifier. The LLM is responsible for generating plausible candidates, while the verifier provides timely process-based feedback to distinguish desirable and undesirable outputs.
arXiv Detail & Related papers (2024-12-29T06:32:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.