Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making
- URL: http://arxiv.org/abs/2506.11887v2
- Date: Mon, 16 Jun 2025 14:30:20 GMT
- Title: Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making
- Authors: Claudio Fanconi, Mihaela van der Schaar
- Abstract summary: We present a cascaded LLM decision framework that adaptively delegates tasks across multiple tiers of expertise. First, a deferral policy determines whether to accept the base model's answer or regenerate it with the large model. Second, an abstention policy decides whether the cascade model response is sufficiently certain or requires human intervention.
- Score: 55.2480439325792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effective human-AI decision-making balances three key factors: the \textit{correctness} of predictions, the \textit{cost} of knowledge and reasoning complexity, and the confidence about whether to \textit{abstain} from automated answers or involve human experts. In this work, we present a cascaded LLM decision framework that adaptively delegates tasks across multiple tiers of expertise -- a base model for initial candidate answers, a more capable and knowledgeable (but costlier) large model, and a human expert for cases where the model cascade abstains. Our method proceeds in two stages. First, a deferral policy determines, based on a confidence score, whether to accept the base model's answer or regenerate it with the large model. Second, an abstention policy decides whether the cascade's response is sufficiently certain or requires human intervention. Moreover, we incorporate an online learning mechanism into the framework that leverages human feedback to improve decision quality over time. We demonstrate this approach on general question-answering (ARC-Easy and ARC-Challenge) and medical question-answering (MedQA and MedMCQA). Our results show that, in most cases, our cascaded strategy outperforms single-model baselines in accuracy while reducing cost and providing a principled way to handle abstentions.
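To make the two-stage control flow concrete, here is a minimal sketch of a confidence-thresholded cascade. It assumes each model returns an answer together with a scalar confidence score; the names (base_model, large_model, tau_defer, tau_abstain) and the cost values are illustrative placeholders rather than the paper's actual implementation, and the online learning of the policies from human feedback is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# A model is anything that maps a question to (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]

@dataclass
class CascadeDecision:
    answer: Optional[str]  # None when the cascade abstains to a human expert
    source: str            # "base", "large", or "human"
    cost: float            # accumulated inference cost

def cascaded_answer(
    question: str,
    base_model: Model,
    large_model: Model,
    tau_defer: float = 0.7,    # deferral threshold (illustrative value)
    tau_abstain: float = 0.5,  # abstention threshold (illustrative value)
    base_cost: float = 1.0,
    large_cost: float = 10.0,
) -> CascadeDecision:
    """Two-stage cascade: defer to the large model when the base model is
    unsure, and abstain to a human when the cascade itself is still unsure."""
    answer, confidence = base_model(question)
    cost, source = base_cost, "base"

    # Stage 1 (deferral policy): regenerate with the large model if the
    # base model's confidence falls below the deferral threshold.
    if confidence < tau_defer:
        answer, confidence = large_model(question)
        cost, source = cost + large_cost, "large"

    # Stage 2 (abstention policy): hand the case to a human expert if the
    # cascade's final confidence is still too low.
    if confidence < tau_abstain:
        return CascadeDecision(answer=None, source="human", cost=cost)

    return CascadeDecision(answer=answer, source=source, cost=cost)
```

In the paper, the deferral decision is driven by a confidence score and an online learning mechanism refines decision quality from human feedback; the fixed thresholds above are only a stand-in for those learned policies.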
Related papers
- Bounded-Abstention Pairwise Learning to Rank [21.876570823233656]
Abstention enables algorithmic decision-making systems to defer uncertain or low-confidence decisions to human experts. We introduce a novel method for abstention in pairwise learning-to-rank tasks. Our contributions are threefold: a theoretical characterization of the optimal abstention strategy, a model-agnostic plug-in algorithm for constructing abstaining ranking models, and a comprehensive empirical evaluation across multiple datasets.
arXiv Detail & Related papers (2025-05-29T13:35:39Z) - DeCoDe: Defer-and-Complement Decision-Making via Decoupled Concept Bottleneck Models [37.118479480792416]
We propose a concept-driven framework for human-AI collaboration. DeCoDe makes strategy decisions based on human-interpretable concept representations. It supports three modes: autonomous AI prediction, deferral to humans, and human-AI collaborative complementarity.
arXiv Detail & Related papers (2025-05-25T16:34:45Z) - Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique [66.94905631175209]
We propose a novel inference-time scaling approach -- stepwise natural language self-critique (PANEL). It employs self-generated natural language critiques as feedback to guide the step-level search process. This approach bypasses the need for task-specific verifiers and the associated training overhead.
arXiv Detail & Related papers (2025-03-21T17:59:55Z) - DAFE: LLM-Based Evaluation Through Dynamic Arbitration for Free-Form Question-Answering [12.879551933541345]
We propose the Dynamic Arbitration Framework for Evaluation (DAFE) to evaluate large language models. DAFE employs two primary LLM-as-judges and engages a third arbitrator only in cases of disagreement. We show DAFE's ability to provide consistent, scalable, and resource-efficient assessments. (A minimal sketch of this two-judge arbitration pattern is given after the related papers list.)
arXiv Detail & Related papers (2025-03-11T15:29:55Z) - The Superalignment of Superhuman Intelligence with Large Language Models [63.96120398355404]
We discuss the concept of superalignment from the learning perspective to answer this question. We highlight some key research problems in superalignment, namely weak-to-strong generalization, scalable oversight, and evaluation. We present a conceptual framework for superalignment, which consists of three modules: an attacker, which generates adversarial queries that try to expose the weaknesses of a learner model; a learner, which refines itself by learning from scalable feedback generated by a critic model along with minimal human experts; and a critic, which generates critiques or explanations for a given query-response pair with the goal of improving the learner.
arXiv Detail & Related papers (2024-12-15T10:34:06Z) - Towards Objective and Unbiased Decision Assessments with LLM-Enhanced Hierarchical Attention Networks [6.520709313101523]
This work investigates cognitive bias identification in high-stakes decision-making processes by human experts.
We propose a bias-aware, AI-augmented workflow that surpasses human judgment.
In our experiments, both the proposed model and the agentic workflow significantly improve on human judgment and alternative models.
arXiv Detail & Related papers (2024-11-13T10:42:11Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present MR-Ben, a process-based benchmark that demands meta-reasoning skill. Our meta-reasoning paradigm is especially suited to system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - Learning To Guide Human Decision Makers With Vision-Language Models [17.957952996809716]
There is increasing interest in developing AIs for assisting human decision-making in high-stakes tasks, such as medical diagnosis. We introduce learning to guide (LTG), an alternative framework in which, rather than taking control from the human expert, the machine provides guidance. To ensure the guidance is interpretable, we develop SLOG, an approach for turning any vision-language model into a capable generator of textual guidance.
arXiv Detail & Related papers (2024-03-25T07:34:42Z) - Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations [13.60538902487872]
We present a novel post-hoc, concept-based XAI framework that conveys both instance-wise (local) and class-wise (global) decision-making strategies via prototypes.
We demonstrate the effectiveness of our approach in identifying out-of-distribution samples, spurious model behavior, and data quality issues across three datasets.
arXiv Detail & Related papers (2023-11-28T10:53:26Z) - Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism [91.52263068880484]
We study offline Reinforcement Learning with Human Feedback (RLHF).
We aim to learn the human's underlying reward and the MDP's optimal policy from a set of trajectories induced by human choices.
RLHF is challenging for multiple reasons: large state space but limited human feedback, the bounded rationality of human decisions, and the off-policy distribution shift.
arXiv Detail & Related papers (2023-05-29T01:18:39Z) - Leveraging Expert Consistency to Improve Algorithmic Decision Support [62.61153549123407]
We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap.
We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert.
Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
arXiv Detail & Related papers (2021-01-24T05:40:29Z)
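Following up on the forward reference in the DAFE entry above, the sketch below illustrates the generic two-judge-plus-arbitrator pattern that its summary describes. This is a minimal sketch under the assumption that each judge maps a (question, answer) pair to a binary verdict; the names judge_a, judge_b, and arbitrator are illustrative and not taken from the DAFE paper.

```python
from typing import Callable

# A judge maps (question, candidate_answer) to a binary correct/incorrect verdict.
Judge = Callable[[str, str], bool]

def arbitrated_verdict(
    question: str,
    candidate: str,
    judge_a: Judge,
    judge_b: Judge,
    arbitrator: Judge,
) -> bool:
    """Two primary judges score the answer; the arbitrator is consulted only
    when they disagree, which keeps evaluation cost low on easy cases."""
    verdict_a = judge_a(question, candidate)
    verdict_b = judge_b(question, candidate)
    if verdict_a == verdict_b:
        return verdict_a
    # Disagreement: escalate to the third (typically stronger) arbitrator.
    return arbitrator(question, candidate)
```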