Cascaded Language Models for Cost-effective Human-AI Decision-Making
- URL: http://arxiv.org/abs/2506.11887v3
- Date: Fri, 24 Oct 2025 14:06:15 GMT
- Title: Cascaded Language Models for Cost-effective Human-AI Decision-Making
- Authors: Claudio Fanconi, Mihaela van der Schaar
- Abstract summary: We present a cascaded LLM decision framework that adaptively delegates tasks across multiple tiers of expertise. First, a deferral policy determines whether to accept the base model's answer or regenerate it with a large model. Second, an abstention policy decides whether the cascade model response is sufficiently certain or requires human intervention.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A challenge in human-AI decision-making is to balance three factors: the correctness of predictions, the cost of knowledge and reasoning complexity, and the confidence about whether to abstain from automated answers or escalate to human experts. In this work, we present a cascaded LLM decision framework that adaptively delegates tasks across multiple tiers of expertise -- a base model for initial candidate answers, a more capable and knowledgeable (but costlier) large model, and a human expert for when the model cascade abstains. Our method proceeds in two stages. First, a deferral policy determines whether to accept the base model's answer or regenerate it with the large model, based on the confidence score. Second, an abstention policy decides whether the cascade model response is sufficiently certain or requires human intervention. Moreover, to overcome static policies and accommodate changing task difficulty, we incorporate an online learning mechanism that uses human feedback. We demonstrate this approach on general question-answering (ARC-Easy, ARC-Challenge, and MMLU) and medical question-answering (MedQA and MedMCQA). Our results show that the cascaded strategy outperforms single-model baselines in most cases, achieving higher accuracy while reducing costs and providing a principled approach to handling abstentions.
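The two-stage routing described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `(answer, confidence)` model interface and the fixed `defer_threshold`/`abstain_threshold` values are assumptions (the paper learns its deferral and abstention policies online from human feedback rather than fixing thresholds).

```python
def cascade_answer(question,
                   base_model, large_model,
                   defer_threshold=0.7, abstain_threshold=0.85):
    """Two-stage cascade: defer to the large model when the base
    model is unsure, abstain to a human when the cascade is unsure.
    Models are assumed to return an (answer, confidence) pair;
    thresholds here are illustrative, not the paper's parameters."""
    answer, confidence = base_model(question)

    # Stage 1: deferral policy -- regenerate with the costlier
    # large model if the base model's confidence is too low.
    if confidence < defer_threshold:
        answer, confidence = large_model(question)

    # Stage 2: abstention policy -- escalate to a human expert
    # if even the cascade's final answer is not certain enough.
    if confidence < abstain_threshold:
        return None, "abstain"   # route to human expert
    return answer, "auto"
```

With toy stand-in models, a low-confidence base answer is regenerated by the large model, and only a still-uncertain cascade response is escalated:

```python
base = lambda q: ("B", 0.6)   # unsure base model
large = lambda q: ("A", 0.9)  # confident large model
print(cascade_answer("Which option is correct?", base, large))  # ('A', 'auto')
```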
Related papers
- Person-AI Bidirectional Fit - A Proof-Of-Concept Case Study Of Augmented Human-AI Symbiosis In Management Decision-Making Process
This article develops the concept of Person-AI bidirectional fit, defined as the continuously evolving, context-sensitive alignment (primarily cognitive, but also emotional and behavioral) between a human decision-maker and an artificial intelligence system. The study examines the role of P-AI fit in managerial decision-making through a proof-of-concept case study involving a real hiring process for a Senior AI Lead.
arXiv Detail & Related papers (2025-11-17T18:22:30Z) - To Ask or Not to Ask: Learning to Require Human Feedback
We propose a new framework, LtA, that handles both when and how to incorporate expert input into a machine learning model. LtA is based on a two-part architecture: a standard ML model and an enriched model trained with additional expert human feedback. We provide two practical implementations of LtA: a sequential approach, which trains the models in stages, and a joint approach, which optimises them simultaneously.
arXiv Detail & Related papers (2025-10-09T15:00:06Z) - Cognitive Decision Routing in Large Language Models: When to Think Fast, When to Think Slow
Large Language Models (LLMs) face a fundamental challenge in deciding when to rely on rapid, intuitive responses versus engaging in slower, more deliberate reasoning. Inspired by Daniel Kahneman's dual-process theory and his insights on human cognitive biases, we propose a novel Cognitive Decision Routing framework.
arXiv Detail & Related papers (2025-08-17T01:07:58Z) - When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration
We introduce Knowledge Integration and Transfer Evaluation (KITE), a conceptual and experimental framework for evaluating human-AI knowledge transfer, and conduct the first large-scale human study (N=118) explicitly designed to measure it. In our two-phase setup, humans first ideate with an AI on problem-solving strategies, then independently implement solutions, isolating the influence of model explanations on human understanding.
arXiv Detail & Related papers (2025-06-05T20:48:16Z) - Bounded-Abstention Pairwise Learning to Rank
Abstention enables algorithmic decision-making systems to defer uncertain or low-confidence decisions to human experts. We introduce a novel method for abstention in pairwise learning-to-rank tasks. Our contributions are threefold: a theoretical characterization of the optimal abstention strategy, a model-agnostic plug-in algorithm for constructing abstaining ranking models, and a comprehensive empirical evaluation across multiple datasets.
arXiv Detail & Related papers (2025-05-29T13:35:39Z) - DeCoDe: Defer-and-Complement Decision-Making via Decoupled Concept Bottleneck Models
We propose a concept-driven framework for human-AI collaboration. DeCoDe makes strategy decisions based on human-interpretable concept representations. It supports three modes: autonomous AI prediction, deferral to humans, and human-AI collaborative complementarity.
arXiv Detail & Related papers (2025-05-25T16:34:45Z) - Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique
We propose a novel inference-time scaling approach, stepwise natural language self-critique (PANEL), which employs self-generated natural language critiques as feedback to guide the step-level search process. This approach bypasses the need for task-specific verifiers and the associated training overhead.
arXiv Detail & Related papers (2025-03-21T17:59:55Z) - DAFE: LLM-Based Evaluation Through Dynamic Arbitration for Free-Form Question-Answering
We propose the Dynamic Arbitration Framework for Evaluation (DAFE) to evaluate large language models. DAFE employs two primary LLM-as-judges and engages a third arbitrator only in cases of disagreement. We show DAFE's ability to provide consistent, scalable, and resource-efficient assessments.
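The arbitration protocol summarized above can be sketched as follows. This is a minimal sketch under assumed interfaces: the judges and arbitrator are stand-in callables returning a verdict string, not DAFE's actual prompting setup.

```python
def dafe_evaluate(item, judge_a, judge_b, arbitrator):
    """Dynamic arbitration: two primary LLM judges each render a
    verdict on an answer; the third (costlier) arbitrator model is
    invoked only when the two primary judges disagree.
    Judge interfaces are illustrative assumptions."""
    verdict_a = judge_a(item)
    verdict_b = judge_b(item)
    if verdict_a == verdict_b:
        return verdict_a          # consensus: no arbitration cost
    return arbitrator(item)       # disagreement: break the tie
```

Because the arbitrator runs only on disagreements, the expected evaluation cost stays close to two judge calls per item when the primary judges mostly agree.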
arXiv Detail & Related papers (2025-03-11T15:29:55Z) - The Superalignment of Superhuman Intelligence with Large Language Models
We discuss the concept of superalignment from the learning perspective to answer this question. We highlight some key research problems in superalignment, namely weak-to-strong generalization, scalable oversight, and evaluation. We present a conceptual framework for superalignment consisting of three modules: an attacker, which generates adversarial queries trying to expose the weaknesses of a learner model; a learner, which refines itself by learning from scalable feedback generated by a critic model along with minimal human expert input; and a critic, which generates critiques or explanations for a given query-response pair, with the aim of improving the learner.
arXiv Detail & Related papers (2024-12-15T10:34:06Z) - Towards Objective and Unbiased Decision Assessments with LLM-Enhanced Hierarchical Attention Networks
This work investigates cognitive bias identification in high-stakes decision-making processes by human experts.
We propose a bias-aware, AI-augmented workflow that surpasses human judgment.
In our experiments, both the proposed model and the agentic workflow significantly improve on human judgment and alternative models.
arXiv Detail & Related papers (2024-11-13T10:42:11Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present MR-Ben, a process-based benchmark that demands meta-reasoning skill. Our meta-reasoning paradigm is especially suited to system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - Learning To Guide Human Decision Makers With Vision-Language Models
There is increasing interest in developing AIs for assisting human decision-making in high-stakes tasks, such as medical diagnosis. We introduce learning to guide (LTG), an alternative framework in which, rather than taking control from the human expert, the machine provides guidance. To ensure the guidance is interpretable, we develop SLOG, an approach for turning any vision-language model into a capable generator of textual guidance.
arXiv Detail & Related papers (2024-03-25T07:34:42Z) - Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations
We present a novel post-hoc concept-based XAI framework that conveys not only instance-wise (local) but also class-wise (global) decision-making strategies via prototypes.
We demonstrate the effectiveness of our approach in identifying out-of-distribution samples, spurious model behavior and data quality issues across three datasets.
arXiv Detail & Related papers (2023-11-28T10:53:26Z) - Explaining by Imitating: Understanding Decisions by Interpretable Policy Learning
Understanding human behavior from observed data is critical for transparency and accountability in decision-making.
Consider real-world settings such as healthcare, in which modeling a decision-maker's policy is challenging.
We propose a data-driven representation of decision-making behavior that is transparent by design, accommodates partial observability, and operates completely offline.
arXiv Detail & Related papers (2023-10-28T13:06:14Z) - SALMON: Self-Alignment with Instructable Reward Models
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision.
We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z) - Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
We study offline Reinforcement Learning with Human Feedback (RLHF).
We aim to learn the human's underlying reward and the MDP's optimal policy from a set of trajectories induced by human choices.
RLHF is challenging for multiple reasons: large state space but limited human feedback, the bounded rationality of human decisions, and the off-policy distribution shift.
arXiv Detail & Related papers (2023-05-29T01:18:39Z) - Investigations of Performance and Bias in Human-AI Teamwork in Hiring
In AI-assisted decision-making, effective hybrid (human-AI) teamwork does not depend on AI performance alone.
We investigate how both a model's predictive performance and bias may transfer to humans in a recommendation-aided decision task.
arXiv Detail & Related papers (2022-02-21T17:58:07Z) - Leveraging Expert Consistency to Improve Algorithmic Decision Support
We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap.
We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert.
Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
arXiv Detail & Related papers (2021-01-24T05:40:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.