Beyond Prompt Engineering: Neuro-Symbolic-Causal Architecture for Robust Multi-Objective AI Agents
- URL: http://arxiv.org/abs/2510.23682v1
- Date: Mon, 27 Oct 2025 15:25:35 GMT
- Title: Beyond Prompt Engineering: Neuro-Symbolic-Causal Architecture for Robust Multi-Objective AI Agents
- Authors: Gokturk Aytug Akarlar,
- Abstract summary: Large language models show promise as autonomous decision-making agents, yet their deployment in high-stakes domains remains fraught with risk.<n>We present Chimera, a neuro-symbolic-causal architecture that integrates an LLM strategist, a formally verified symbolic constraint engine, and a causal inference module for counterfactual reasoning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models show promise as autonomous decision-making agents, yet their deployment in high-stakes domains remains fraught with risk. Without architectural safeguards, LLM agents exhibit catastrophic brittleness: identical capabilities produce wildly different outcomes depending solely on prompt framing. We present Chimera, a neuro-symbolic-causal architecture that integrates three complementary components - an LLM strategist, a formally verified symbolic constraint engine, and a causal inference module for counterfactual reasoning. We benchmark Chimera against baseline architectures (LLM-only, LLM with symbolic constraints) across 52-week simulations in a realistic e-commerce environment featuring price elasticity, trust dynamics, and seasonal demand. Under organizational biases toward either volume or margin optimization, LLM-only agents fail catastrophically (total loss of \$99K in volume scenarios) or destroy brand trust (-48.6% in margin scenarios). Adding symbolic constraints prevents disasters but achieves only 43-87% of Chimera's profit. Chimera consistently delivers the highest returns (\$1.52M and \$1.96M respectively, some cases +\$2.2M) while improving brand trust (+1.8% and +10.8%, some cases +20.86%), demonstrating prompt-agnostic robustness. Our TLA+ formal verification proves zero constraint violations across all scenarios. These results establish that architectural design not prompt engineering determines the reliability of autonomous agents in production environments. We provide open-source implementations and interactive demonstrations for reproducibility.
Related papers
- The Six Sigma Agent: Achieving Enterprise-Grade Reliability in LLM Systems Through Consensus-Driven Decomposed Execution [0.0]
We introduce the Six Sigma Agent, a novel architecture that achieves enterprise-grade reliability through three synergistic components.<n>We demonstrate a 14,700x reliability improvement over single-agent execution while reducing costs by 80%.<n>Our work establishes that reliability in AI systems emerges from principled redundancy and consensus rather than model scaling alone.
arXiv Detail & Related papers (2026-01-29T20:04:29Z) - ReliabilityBench: Evaluating LLM Agent Reliability Under Production-Like Stress Conditions [0.32928123659012326]
Existing benchmarks for tool-using LLM agents primarily report single-run success rates and miss reliability properties required in production.<n>We introduce textbfReliabilityBench, a benchmark for evaluating agent reliability across three dimensions.<n>We evaluate two models (Gemini 2.0 Flash, GPT-4o) and two agent architectures (ReAct, Reflexion) across four domains (scheduling, travel, customer support, e-commerce) over 1,280 episodes.
arXiv Detail & Related papers (2026-01-03T13:41:33Z) - Optimistic TEE-Rollups: A Hybrid Architecture for Scalable and Verifiable Generative AI Inference on Blockchain [4.254924788681319]
We introduce Optimistic TEE-Rollups (OTR), a hybrid verification protocol that harmonizes constraints.<n>OTR achieves 99% of the throughput of centralized baselines with a marginal cost overhead of $0.07 per query.
arXiv Detail & Related papers (2025-12-23T09:16:41Z) - FLAMES: Fine-tuning LLMs to Synthesize Invariants for Smart Contract Security [41.836337574143535]
FLAMES is an automated approach that synthesizes runtime guards as Solidity "require" statements to harden smart contracts against exploits.<n>FLAMES employs domain-adapted large language models trained through fill-in-the-middle supervised fine-tuning on real-world invariants extracted from 514,506 verified contracts.
arXiv Detail & Related papers (2025-10-24T12:44:08Z) - ParaVul: A Parallel Large Language Model and Retrieval-Augmented Framework for Smart Contract Vulnerability Detection [43.41293570032631]
ParaVul is a retrieval-augmented framework to improve the reliability and accuracy of smart contract vulnerability detection.<n>We develop Sparse Low-Rank Adaptation (SLoRA) for LLM fine-tuning.<n>We construct a vulnerability contract dataset and develop a hybrid Retrieval-Augmented Generation (RAG) system.
arXiv Detail & Related papers (2025-10-20T03:23:41Z) - SaFeR-VLM: Toward Safety-aware Fine-grained Reasoning in Multimodal Models [66.71948519280669]
Multimodal Large Reasoning Models (MLRMs) demonstrate impressive crossmodal reasoning but often amplify safety risks under adversarial prompts.<n> Existing defenses mainly act at the output level and do not constrain the reasoning process, leaving models to implicit risks.<n>We propose SaFeR-VLM, which integrates four components and supports dynamic and interpretable safety decisions beyond surface-level filtering.
arXiv Detail & Related papers (2025-10-08T10:39:12Z) - Trade in Minutes! Rationality-Driven Agentic System for Quantitative Financial Trading [57.28635022507172]
TiMi is a rationality-driven multi-agent system that architecturally decouples strategy development from minute-level deployment.<n>We propose a two-tier analytical paradigm from macro patterns to micro customization, layered programming design for trading bot implementation, and closed-loop optimization driven by mathematical reflection.
arXiv Detail & Related papers (2025-10-06T13:08:55Z) - Towards Secure and Explainable Smart Contract Generation with Security-Aware Group Relative Policy Optimization [18.013438474903314]
We propose SmartCoder-R1, a framework for secure and explainable smart contract generation.<n>We train the model to emulate human security analysis.<n>SmartCoder-R1 establishes a new state of the art, achieving top performance across five key metrics.
arXiv Detail & Related papers (2025-09-12T03:14:50Z) - Chimera: Harnessing Multi-Agent LLMs for Automatic Insider Threat Simulation [17.496651394447596]
We propose Chimera, the first large language model (LLM)-based multi-agent framework that automatically simulates both benign and malicious insider activities.<n>Chimera models each employee with agents that have role-specific behavior and integrates modules for group meetings, pairwise interactions, and autonomous scheduling.<n>It incorporates 15 types of insider attacks (e.g., IP theft, system sabotage) and has been deployed to simulate activities in three sensitive domains.<n>We assess ChimeraLog via human studies and quantitative analysis, confirming its diversity, realism, and presence of explainable threat patterns.
arXiv Detail & Related papers (2025-08-11T08:24:48Z) - OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks [52.87238755666243]
We present OmniEAR, a framework for evaluating how language models reason about physical interactions, tool usage, and multi-agent coordination in embodied tasks.<n>We model continuous physical properties and complex spatial relationships across 1,500 scenarios spanning household and industrial domains.<n>Our systematic evaluation reveals severe performance degradation when models must reason from constraints.
arXiv Detail & Related papers (2025-08-07T17:54:15Z) - AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security [74.22452069013289]
AegisLLM is a cooperative multi-agent defense against adversarial attacks and information leakage.<n>We show that scaling agentic reasoning system at test-time substantially enhances robustness without compromising model utility.<n> Comprehensive evaluations across key threat scenarios, including unlearning and jailbreaking, demonstrate the effectiveness of AegisLLM.
arXiv Detail & Related papers (2025-04-29T17:36:05Z) - EmoDebt: Bayesian-Optimized Emotional Intelligence for Strategic Agent-to-Agent Debt Recovery [65.30120701878582]
Large Language Model (LLM) agents are vulnerable to exploitation in emotion-sensitive domains like debt collection.<n>EmoDebt is an emotional intelligence engine that reframes a model's ability to express emotion in negotiation as a sequential decision-making problem.<n>Experiments on our proposed benchmark demonstrate that EmoDebt achieves significant strategic robustness, substantially outperforming non-adaptive and emotion-agnostic baselines.
arXiv Detail & Related papers (2025-03-27T01:41:34Z) - LLM2: Let Large Language Models Harness System 2 Reasoning [65.89293674479907]
Large language models (LLMs) have exhibited impressive capabilities across a myriad of tasks, yet they occasionally yield undesirable outputs.<n>We introduce LLM2, a novel framework that combines an LLM with a process-based verifier.<n>LLMs2 is responsible for generating plausible candidates, while the verifier provides timely process-based feedback to distinguish desirable and undesirable outputs.
arXiv Detail & Related papers (2024-12-29T06:32:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.