MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction
- URL: http://arxiv.org/abs/2507.04893v1
- Date: Mon, 07 Jul 2025 11:27:49 GMT
- Title: MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction
- Authors: Kaleem Ullah Qasim, Jiashu Zhang
- Abstract summary: Accident severity prediction plays a critical role in transportation safety systems. Existing methods often rely on monolithic models or black-box prompting. We propose a multi-agent rule-based LLM engine that decomposes the severity prediction task across a team of specialized reasoning agents.
- Score: 1.3102025155414727
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accident severity prediction plays a critical role in transportation safety systems but is a persistently difficult task due to incomplete data, strong feature dependencies, and severe class imbalance, in which rare but high-severity cases are underrepresented and hard to detect. Existing methods often rely on monolithic models or black-box prompting, which struggle to scale in noisy, real-world settings and offer limited interpretability. To address these challenges, we propose MARBLE, a multi-agent rule-based LLM engine that decomposes the severity prediction task across a team of specialized reasoning agents, including an interchangeable ML-backed agent. Each agent focuses on a semantic subset of features (e.g., spatial, environmental, temporal), enabling scoped reasoning and modular prompting without the risk of prompt saturation. Predictions are coordinated through either rule-based or LLM-guided consensus mechanisms that account for class rarity and confidence dynamics. The system retains structured traces of agent-level reasoning and coordination outcomes, supporting in-depth interpretability and post-hoc performance diagnostics. Across both UK and US datasets, MARBLE consistently outperforms traditional machine learning classifiers and state-of-the-art (SOTA) prompt-based reasoning methods, including Chain-of-Thought (CoT), Least-to-Most (L2M), and Tree-of-Thought (ToT), achieving nearly 90% accuracy where others plateau below 48%. This performance redefines the practical ceiling for accident severity classification under real-world noise and extreme class imbalance. Our results position MARBLE as a generalizable and interpretable framework for reasoning under uncertainty in safety-critical applications.
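To make the coordination step concrete, here is a minimal Python sketch of confidence- and rarity-weighted voting in the spirit of the rule-based consensus described above; the `AgentVote` structure, agent names, severity classes, and rarity weights are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of MARBLE-style rule-based consensus: each specialized
# agent (spatial, environmental, temporal, ML-backed, ...) emits a severity
# vote with a confidence score, and votes are pooled with a rarity-aware
# weighting so under-represented high-severity classes are not drowned out
# by majority-class votes. Names, classes, and weights are assumptions.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class AgentVote:
    agent: str         # e.g. "spatial", "environmental", "temporal", "ml"
    severity: int      # predicted class, e.g. 1 (slight) .. 3 (fatal)
    confidence: float  # self-reported confidence in [0, 1]

def rule_based_consensus(votes: list[AgentVote],
                         rarity_weight: dict[int, float]) -> int:
    """Pool votes weighted by confidence and a class-rarity prior."""
    scores: dict[int, float] = defaultdict(float)
    for v in votes:
        # Rare classes get a boost so a confident minority vote can win.
        scores[v.severity] += v.confidence * rarity_weight.get(v.severity, 1.0)
    return max(scores, key=scores.get)

votes = [
    AgentVote("spatial", 1, 0.55),
    AgentVote("environmental", 3, 0.90),  # confident rare-class vote
    AgentVote("temporal", 1, 0.60),
    AgentVote("ml", 3, 0.70),
]
# Inverse-frequency-style priors: class 3 (fatal) is rare in the data.
print(rule_based_consensus(votes, rarity_weight={1: 1.0, 2: 1.5, 3: 2.5}))  # -> 3
```

The rarity weights stand in for the class-rarity awareness the abstract mentions: a confident minority-class vote can outweigh several lukewarm majority-class votes, which is one plausible way to avoid the majority-class collapse that plagues imbalanced severity data.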
Related papers
- ASCoT: An Adaptive Self-Correction Chain-of-Thought Method for Late-Stage Fragility in LLMs [16.266957200961908]
Chain-of-Thought (CoT) prompting has significantly advanced the reasoning capabilities of Large Language Models (LLMs). Errors introduced in the later stages of a CoT chain are significantly more likely to corrupt the final answer than identical errors made at the beginning. We introduce the Adaptive Self-Correction Chain-of-Thought (ASCoT) method to address this specific vulnerability (a rough sketch of the idea follows this entry).
arXiv Detail & Related papers (2025-08-07T11:26:40Z)
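The summary above does not spell out ASCoT's mechanics, so the following is only a hypothetical illustration of the late-stage-first idea: spend the verification budget on the final steps of a chain before earlier ones. The `verify_step` callback, standing in for an LLM self-check call, is our assumption rather than the paper's algorithm.

```python
# Hypothetical sketch: re-verify the last steps of a CoT chain first, since
# late errors are the most likely to corrupt the final answer.
def late_stage_recheck(steps: list[str], verify_step, budget: int = 3) -> list[str]:
    """Check up to `budget` steps, last step first; repair at the first failure."""
    for i in range(len(steps) - 1, max(len(steps) - 1 - budget, -1), -1):
        ok, fixed = verify_step(steps[:i], steps[i])  # (is_valid, corrected_step)
        if not ok:
            # Truncate at the faulty step; later steps would be regenerated.
            return steps[:i] + [fixed]
    return steps
```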
- NDCG-Consistent Softmax Approximation with Accelerated Convergence [67.10365329542365]
We propose novel loss formulations that align directly with ranking metrics. We integrate the proposed RG losses with the highly efficient Alternating Least Squares (ALS) optimization method. Empirical evaluations on real-world datasets demonstrate that our approach achieves comparable or superior ranking performance.
arXiv Detail & Related papers (2025-06-11T06:59:17Z)
- Interpretable Risk Mitigation in LLM Agent Systems [0.0]
We explore agent behaviour in a toy, game-theoretic environment based on a variation of the Iterated Prisoner's Dilemma. We introduce a strategy-modification method, independent of both the game and the prompt, that steers the residual stream with interpretable features extracted from a sparse autoencoder latent space (a sketch of the steering step follows this entry).
arXiv Detail & Related papers (2025-05-15T19:22:11Z)
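A minimal sketch of residual-stream steering, assuming a PyTorch decoder-style model and a precomputed SAE feature direction; the layer index, scale, module path, and the way the feature vector is obtained are placeholder assumptions, not the paper's setup.

```python
# Sketch: add a scaled SAE feature direction to a transformer block's output.
import torch

def make_steering_hook(direction: torch.Tensor, scale: float):
    """Forward hook that adds `scale * direction` to the block's hidden states."""
    unit = direction / direction.norm()
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * unit.to(hidden.device, hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Hypothetical usage with a HuggingFace-style decoder:
# feature = sae.decoder.weight[:, feature_idx]   # one SAE latent's direction
# handle = model.model.layers[12].register_forward_hook(
#     make_steering_hook(feature, scale=4.0))
# ... run generation, then handle.remove() to restore default behaviour ...
```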
- Towards Robust LLMs: an Adversarial Robustness Measurement Framework [0.0]
Large Language Models (LLMs) remain vulnerable to adversarial perturbations, undermining their reliability in high-stakes applications. We adapt the Robustness Measurement and Assessment framework to quantify LLM resilience against adversarial inputs without requiring access to model parameters. Our work provides a systematic methodology to assess LLM robustness, advancing the development of more reliable language models for real-world deployment.
arXiv Detail & Related papers (2025-04-24T16:36:19Z)
- Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge [0.0]
Large Language Models (LLMs) have revolutionized artificial intelligence, driving advancements in machine translation, summarization, and conversational agents. Recent studies indicate that LLMs remain vulnerable to adversarial attacks designed to elicit biased responses. This work proposes a scalable benchmarking framework to evaluate LLM robustness against adversarial bias elicitation.
arXiv Detail & Related papers (2025-04-10T16:00:59Z)
- Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling [48.15636223774418]
Large language models (LLMs) are prone to hallucination stemming from misaligned self-awareness. We propose the Explicit Knowledge Boundary Modeling framework, which integrates fast and slow reasoning systems to harmonize reliability and usability.
arXiv Detail & Related papers (2025-03-04T03:16:02Z)
- PredictaBoard: Benchmarking LLM Score Predictability [50.47497036981544]
Large Language Models (LLMs) often fail unpredictably, which poses a significant challenge to their safe deployment. We present PredictaBoard, a novel collaborative benchmarking framework.
arXiv Detail & Related papers (2025-02-20T10:52:38Z)
- Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning [37.80275600302316]
Distributionally robust Markov games (RMGs) have been proposed to enhance robustness in MARL. Two notorious and open challenges are the formulation of the uncertainty set and whether the corresponding RMGs can overcome the curse of multiagency. In this work, we propose a natural class of RMGs inspired by behavioral economics, where each agent's uncertainty set is shaped by both the environment and the integrated behavior of other agents.
arXiv Detail & Related papers (2024-09-30T08:09:41Z)
- Quantifying Prediction Consistency Under Fine-Tuning Multiplicity in Tabular LLMs [10.494477811252034]
Fine-tuning multiplicity can arise in Tabular LLMs on classification tasks. Our work formalizes this unique challenge and proposes a novel measure to quantify the consistency of individual predictions without expensive model retraining.
arXiv Detail & Related papers (2024-07-04T22:22:09Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present MR-Ben, a process-based benchmark that demands meta-reasoning skill. Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer (a simplified sketch follows this entry).
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
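A hedged simplification of the underlying intuition: sample several explanations for the same question and treat the stability of the answers they support as a confidence signal. The agreement-rate aggregation below is closer to plain self-consistency voting than to the paper's exact method, and `ask_llm` is a placeholder for a sampled (temperature > 0) call returning an (explanation, answer) pair.

```python
# Sketch: answer stability across sampled explanations as a confidence proxy.
from collections import Counter

def explanation_stability(question: str, ask_llm, k: int = 8):
    """Sample k explanation/answer pairs; return the modal answer and its rate."""
    answers = [ask_llm(question)[1] for _ in range(k)]
    top_answer, top_count = Counter(answers).most_common(1)[0]
    return top_answer, top_count / k  # agreement rate in (0, 1]
```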
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-Attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient ensembling method for self-attention networks. The method not only outperforms state-of-the-art implicit techniques like BatchEnsemble, but even matches or exceeds the accuracy of an Explicit Ensemble.
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
- Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection [37.99031842449251]
Video anomaly detection under weak supervision presents significant challenges.
We present a weakly supervised anomaly detection framework that focuses on efficient context modeling and enhanced semantic discriminability.
Our approach significantly improves the detection accuracy of certain anomaly sub-classes, underscoring its practical value and efficacy.
arXiv Detail & Related papers (2023-06-26T06:45:16Z) - Uncertainty in Extreme Multi-label Classification [81.14232824864787]
eXtreme Multi-label Classification (XMC) is an essential task in the era of big data for web-scale machine learning applications.
In this paper, we aim to investigate general uncertainty quantification approaches for tree-based XMC models with a probabilistic ensemble-based framework.
In particular, we analyze label-level and instance-level uncertainty in XMC, and propose a general approximation framework based on beam search to efficiently estimate the uncertainty with a theoretical guarantee under long-tail XMC predictions (a rough sketch follows this entry).
arXiv Detail & Related papers (2022-10-18T20:54:33Z)
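As a rough illustration of ensemble-plus-beam uncertainty estimation (the paper's tree-based models and theoretical guarantee are omitted here), one can score labels with several bootstrapped models, keep a beam of top candidates by mean score, and read per-label disagreement across members as uncertainty; `score_fns` is a placeholder for the ensemble members.

```python
# Sketch: beam-restricted, ensemble-based uncertainty for many-label scoring.
import numpy as np

def beam_uncertainty(score_fns, x, beam: int = 10):
    """Return top labels with their mean score and across-member variance."""
    scores = np.stack([f(x) for f in score_fns])  # (n_members, n_labels)
    mean = scores.mean(axis=0)
    top = np.argsort(-mean)[:beam]                # beam of highest-mean labels
    return top, mean[top], scores[:, top].var(axis=0)
```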