Related papers: AgenticSimLaw: A Juvenile Courtroom Multi-Agent Debate Simulation for Explainable High-Stakes Tabular Decision Making

URL: http://arxiv.org/abs/2601.21936v1
Date: Thu, 29 Jan 2026 16:26:10 GMT
Title: AgenticSimLaw: A Juvenile Courtroom Multi-Agent Debate Simulation for Explainable High-Stakes Tabular Decision Making
Authors: Jon Chun, Kathrine Elkins, Yong Suk Lee
Abstract summary: We introduce AgenticSimLaw, a role-structured, multi-agent debate framework that provides transparent and controllable test-time reasoning. Unlike black-box approaches, our courtroom-style orchestration explicitly defines agent roles. We benchmark this framework on young adult recidivism prediction using the NLSY97 dataset.
Score: 0.6218206949753592
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce AgenticSimLaw, a role-structured, multi-agent debate framework that provides transparent and controllable test-time reasoning for high-stakes tabular decision-making tasks. Unlike black-box approaches, our courtroom-style orchestration explicitly defines agent roles (prosecutor, defense, judge), interaction protocols (7-turn structured debate), and private reasoning strategies, creating a fully auditable decision-making process. We benchmark this framework on young adult recidivism prediction using the NLSY97 dataset, comparing it against traditional chain-of-thought (CoT) prompting across almost 90 unique combinations of models and strategies. Our results demonstrate that structured multi-agent debate provides more stable and generalizable performance compared to single-agent reasoning, with stronger correlation between accuracy and F1-score metrics. Beyond performance improvements, AgenticSimLaw offers fine-grained control over reasoning steps, generates complete interaction transcripts for explainability, and enables systematic profiling of agent behaviors. While we instantiate this framework in the criminal justice domain to stress-test reasoning under ethical complexity, the approach generalizes to any deliberative, high-stakes decision task requiring transparency and human oversight. This work addresses key LLM-based multi-agent system challenges: organization through structured roles, observability through logged interactions, and responsibility through explicit non-deployment constraints for sensitive domains. Data, results, and code will be available on github.com under the MIT license.
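The abstract names three roles, a 7-turn structured debate, private reasoning, and a complete interaction transcript, but it does not spell out the turn schedule. The following is a minimal sketch of how such a protocol could be orchestrated, not the authors' implementation. The turn order in `TURN_ORDER`, the `Turn` record, the `stub_agent` placeholder (standing in for an LLM call), and the `run_debate` function are all hypothetical names and assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# ASSUMPTION: the source states only "7-turn structured debate" with
# prosecutor, defense, and judge roles. This specific schedule is a guess.
TURN_ORDER: List[Tuple[str, str]] = [
    ("prosecutor", "opening"),
    ("defense", "opening"),
    ("prosecutor", "rebuttal"),
    ("defense", "rebuttal"),
    ("prosecutor", "closing"),
    ("defense", "closing"),
    ("judge", "verdict"),
]


@dataclass
class Turn:
    """One logged debate turn (hypothetical record layout)."""
    index: int
    role: str
    phase: str
    private_reasoning: str  # kept in the audit log, not shown to other agents
    public_statement: str   # visible to all agents in later turns


# An agent maps (case, role, phase, public history) to (private, public) text.
Agent = Callable[[Dict[str, object], str, str, List[str]], Tuple[str, str]]


def stub_agent(case: Dict[str, object], role: str, phase: str,
               public_history: List[str]) -> Tuple[str, str]:
    """Placeholder for a model call; returns canned strings."""
    private = f"[{role} private notes for {phase}; {len(public_history)} prior statements seen]"
    public = f"[{role} {phase} statement on a case with {len(case)} features]"
    return private, public


def run_debate(case: Dict[str, object], agent: Agent = stub_agent) -> List[Turn]:
    """Run the fixed turn schedule and return the full transcript."""
    transcript: List[Turn] = []
    for index, (role, phase) in enumerate(TURN_ORDER, start=1):
        # Only public statements are shared; private reasoning stays in the log.
        public_history = [t.public_statement for t in transcript]
        private, public = agent(case, role, phase, public_history)
        transcript.append(Turn(index, role, phase, private, public))
    return transcript
```

Calling `run_debate({"feature_a": 1, "feature_b": 0})` with placeholder tabular features returns seven `Turn` records ending with the judge. Keeping private reasoning in the log while passing only public statements forward is one way to get the auditability the abstract describes; swapping `stub_agent` for a real model call is left open.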

Related papers

The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution [63.61358761489141]
Large Language Model (LLM)-based agents are widely used in real-world applications such as customer service, web navigation, and software engineering. We propose a novel framework for general agentic attribution, designed to identify the internal factors driving agent actions regardless of the task outcome. We validate our framework across a diverse suite of agentic scenarios, including standard tool use and subtle reliability risks like memory-induced bias.
arXiv Detail & Related papers (2026-01-21T15:22:21Z)
Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation [76.5533899503582]
Large language models (LLMs) are increasingly used as judges to evaluate agent performance. We show this paradigm implicitly assumes that the agent's chain-of-thought (CoT) reasoning faithfully reflects both its internal reasoning and the underlying environment state. We demonstrate that manipulated reasoning alone can inflate false positive rates of state-of-the-art VLM judges by up to 90% across 800 trajectories spanning diverse web tasks.
arXiv Detail & Related papers (2026-01-21T06:07:43Z)
Interact-RAG: Reason and Interact with the Corpus, Beyond Black-Box Retrieval [49.85856484781787]
We introduce Interact-RAG, a new paradigm that elevates the LLM agent into an active manipulator of the retrieval process. We develop a reasoning-enhanced workflow, which enables both zero-shot execution and the synthesis of interaction trajectories. Experiments across six benchmarks demonstrate that Interact-RAG significantly outperforms other advanced methods.
arXiv Detail & Related papers (2025-10-31T15:48:43Z)
Unleashing Diverse Thinking Modes in LLMs through Multi-Agent Collaboration [5.19759149737193]
This paper introduces the Multi-Agent Collaboration Framework for Diverse Thinking Modes (DiMo). It enhances both performance and interpretability by simulating a structured debate among four specialized Large Language Models (LLMs). Across six benchmarks and under a unified open-source setup, DiMo improves accuracy over widely used single-model and debate baselines, with the largest gains on math.
arXiv Detail & Related papers (2025-10-18T21:22:36Z)
Benefits and Limitations of Communication in Multi-Agent Reasoning [11.788489289062312]
We propose a theoretical framework to analyze the expressivity of multi-agent systems. We derive bounds on (i) the number of agents required to solve the task exactly, (ii) the quantity and structure of inter-agent communication, and (iii) the achievable speedups as problem size and context scale. Our results identify regimes where communication is provably beneficial, delineate tradeoffs between agent count and bandwidth, and expose intrinsic limitations when either resource is constrained.
arXiv Detail & Related papers (2025-10-14T20:04:27Z)
Reasoning-Aware Prompt Orchestration: A Foundation Model for Multi-Agent Language Model Coordination [0.0]
We present a theoretically-grounded framework for dynamic prompt orchestration that enhances reasoning across multiple specialized agents. This framework addresses three core challenges: logical consistency preservation during agent transitions, reasoning-aware prompt adaptation, and scalable coordination of distributed inference. Experimental results on 1,000 synthetic multi-agent conversations demonstrate a 42% reduction in reasoning latency, a 23% improvement in logical consistency measured by ROUGE-L score, and an 89% success rate for task completion without context loss.
arXiv Detail & Related papers (2025-09-30T22:33:01Z)
AgentCDM: Enhancing Multi-Agent Collaborative Decision-Making via ACH-Inspired Structured Reasoning [8.566904810788213]
AgentCDM is a structured framework for enhancing collaborative decision-making in multi-agent systems. It internalizes cognitive biases and shifts decision-making from passive answer selection to active hypothesis evaluation and construction. Experiments on multiple benchmark datasets demonstrate that AgentCDM achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-08-16T09:46:04Z)
DICE: Dynamic In-Context Example Selection in LLM Agents via Efficient Knowledge Transfer [50.64531021352504]
Large language model-based agents, empowered by in-context learning (ICL), have demonstrated strong capabilities in complex reasoning and tool-use tasks. Existing approaches typically rely on example selection, including in some agentic or multi-step settings. We propose DICE, a theoretically grounded ICL framework for agentic tasks that selects the most relevant demonstrations at each step of reasoning.
arXiv Detail & Related papers (2025-07-31T13:42:14Z)
Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation [49.27250832754313]
We present AgentCOT, an LLM-based autonomous agent framework. At each step, AgentCOT selects an action and executes it to yield an intermediate result with supporting evidence. We introduce two new strategies to enhance the performance of AgentCOT.
arXiv Detail & Related papers (2024-09-19T02:20:06Z)
On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees. We study this question in a general framework for interactive decision making with multiple agents. We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.