Human-AI Collaborative Uncertainty Quantification
- URL: http://arxiv.org/abs/2510.23476v1
- Date: Mon, 27 Oct 2025 16:11:23 GMT
- Title: Human-AI Collaborative Uncertainty Quantification
- Authors: Sima Noorani, Shayan Kiyani, George Pappas, Hamed Hassani
- Abstract summary: We introduce Human-AI Collaborative Uncertainty Quantification, a framework that formalizes how an AI model can refine a human expert's proposed prediction set. We show that the optimal collaborative prediction set follows an intuitive two-threshold structure over a single score function, extending a classical result in conformal prediction. Experiments across image classification, regression, and text-based medical decision making show that collaborative prediction sets consistently outperform either agent alone.
- Score: 26.38833436936642
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI predictive systems are increasingly embedded in decision-making pipelines, shaping high-stakes choices once made solely by humans. Yet robust decisions under uncertainty still rely on capabilities that current AI lacks: domain knowledge not captured by data, long-horizon context, and reasoning grounded in the physical world. This gap has motivated growing efforts to design collaborative frameworks that combine the complementary strengths of humans and AI. This work advances that vision by identifying the fundamental principles of Human-AI collaboration within uncertainty quantification, a key component of reliable decision making. We introduce Human-AI Collaborative Uncertainty Quantification, a framework that formalizes how an AI model can refine a human expert's proposed prediction set with two goals: avoiding counterfactual harm (the AI does not degrade correct human judgments) and achieving complementarity (the collaboration recovers correct outcomes the human missed). At the population level, we show that the optimal collaborative prediction set follows an intuitive two-threshold structure over a single score function, extending a classical result in conformal prediction. Building on this insight, we develop practical offline and online calibration algorithms with provable distribution-free, finite-sample guarantees. The online method adapts to distribution shifts, including human behavior evolving through interaction with the AI, a phenomenon we call Human-to-AI Adaptation. Experiments across image classification, regression, and text-based medical decision making show that collaborative prediction sets consistently outperform either agent alone, achieving higher coverage and smaller set sizes across various conditions.
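The two-threshold structure and the calibration schemes summarized above lend themselves to a compact sketch. The Python below is one plausible reading of the abstract, not the authors' algorithm: the function names, the even split of the miscoverage budget between "keep" and "add" errors, and the adaptive-conformal-style online step are all illustrative assumptions.

```python
# A minimal sketch of the two-threshold collaborative prediction set and its
# calibration, as one might infer from the abstract. Everything here is an
# illustrative assumption, not the paper's actual implementation.
import numpy as np


def collaborative_set(scores, human_set, tau_keep, tau_add):
    """Two-threshold collaborative prediction set for one example.

    scores    : (K,) nonconformity score s(x, y) per candidate label
                (lower = more plausible under the AI model)
    human_set : (K,) boolean mask of the human's proposed set H(x)
    tau_keep  : threshold for retaining labels the human proposed
    tau_add   : threshold for adding labels the human missed
    """
    keep = human_set & (scores <= tau_keep)   # avoid counterfactual harm
    add = ~human_set & (scores <= tau_add)    # enable complementarity
    return keep | add


def calibrate_offline(cal_scores, cal_human, cal_labels, alpha=0.1):
    """Split-conformal-style calibration of the two thresholds.

    Splits the miscoverage budget alpha evenly between true labels the human
    included and true labels the human missed (a crude union-bound choice;
    the paper's finite-sample procedure may allocate the budget differently).
    cal_scores : (n, K) scores, cal_human : (n, K) bool masks,
    cal_labels : (n,) integer true labels.
    """
    n = len(cal_labels)
    s_true = cal_scores[np.arange(n), cal_labels]  # score of the true label
    in_h = cal_human[np.arange(n), cal_labels]     # was it in H(x)?

    def conformal_quantile(s, a):
        # Standard finite-sample-corrected quantile; assumes s is nonempty.
        level = min(1.0, np.ceil((len(s) + 1) * (1 - a)) / len(s))
        return np.quantile(s, level, method="higher")

    tau_keep = conformal_quantile(s_true[in_h], alpha / 2)
    tau_add = conformal_quantile(s_true[~in_h], alpha / 2)
    return tau_keep, tau_add


def online_step(tau, covered, alpha=0.1, gamma=0.02):
    """Adaptive-conformal-style update (cf. Gibbs & Candes, 2021): widen the
    threshold after a miss, tighten after a cover, so long-run coverage tracks
    1 - alpha even as the human's behavior drifts (Human-to-AI Adaptation)."""
    err = 0.0 if covered else 1.0
    return tau + gamma * (err - alpha)
```

A natural design choice in this reading is tau_add <= tau_keep: the AI needs stronger evidence to add a label the human omitted (complementarity) than to retain one the human proposed (avoiding counterfactual harm), so the collaborative set stays close to the human's unless the model is confident.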
Related papers
- Align When They Want, Complement When They Need! Human-Centered Ensembles for Adaptive Human-AI Collaboration [13.041288521972563]
In human-AI decision making, designing AI that complements human expertise has been a natural strategy to enhance human-AI collaboration. An aligned AI fosters trust yet risks reinforcing suboptimal human behavior and lowering human-AI team performance. We introduce a novel human-centered adaptive AI ensemble that strategically toggles between two specialist AI models.
arXiv Detail & Related papers (2026-02-23T18:22:58Z) - Epistemology gives a Future to Complementarity in Human-AI Interactions [42.371764229953165]
Complementarity is the claim that a human supported by an AI system can outperform either alone in a decision-making process. We argue that historical instances of complementarity function as evidence that a given human-AI interaction is a reliable process.
arXiv Detail & Related papers (2026-01-14T21:04:28Z) - When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration [79.69935257008467]
We introduce Knowledge Integration and Transfer Evaluation (KITE), a conceptual and experimental framework for evaluating Human-AI knowledge transfer, and conduct the first large-scale human study (N=118) explicitly designed to measure it. In our two-phase setup, humans first ideate with an AI on problem-solving strategies, then independently implement solutions, isolating the influence of model explanations on human understanding.
arXiv Detail & Related papers (2025-06-05T20:48:16Z) - DeCoDe: Defer-and-Complement Decision-Making via Decoupled Concept Bottleneck Models [37.118479480792416]
We propose a concept-driven framework for human-AI collaboration. DeCoDe makes strategy decisions based on human-interpretable concept representations. It supports three modes: autonomous AI prediction, deferral to humans, and human-AI collaborative complementarity.
arXiv Detail & Related papers (2025-05-25T16:34:45Z) - Human aversion? Do AI Agents Judge Identity More Harshly Than Performance [0.06554326244334868]
We investigate how AI agents based on large language models assess and integrate human input. We find that the AI system systematically discounts human advice, penalizing human errors more severely than algorithmic errors.
arXiv Detail & Related papers (2025-03-31T02:05:27Z) - Position: Towards Bidirectional Human-AI Alignment [109.57781720848669]
We argue that the research community should explicitly define and critically reflect on "alignment" to account for the bidirectional and dynamic relationship between humans and AI. We introduce the Bidirectional Human-AI Alignment framework, which not only incorporates traditional efforts to align AI with human values but also introduces the critical, underexplored dimension of aligning humans with AI.
arXiv Detail & Related papers (2024-06-13T16:03:25Z) - A Decision Theoretic Framework for Measuring AI Reliance [20.669176502049066]
Humans frequently make decisions with the aid of artificially intelligent (AI) systems. Researchers have identified ensuring that a human has appropriate reliance on an AI as a critical component of achieving complementary performance. We propose a formal definition of reliance, based on statistical decision theory, which isolates reliance, defined as the probability the decision-maker follows the AI's recommendation, from other aspects of decision quality.
arXiv Detail & Related papers (2024-01-27T09:13:09Z) - Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how a lack of AI fairness can deepen biases over time.
We discuss how biased models can lead to more negative real-world outcomes for certain groups.
If these issues persist, they could be reinforced by interactions with other risks and have severe implications for society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z) - Human Uncertainty in Concept-Based AI Systems [37.82747673914624]
We study human uncertainty in the context of concept-based AI systems.
We show that training with uncertain concept labels may help mitigate weaknesses in concept-based systems.
arXiv Detail & Related papers (2023-03-22T19:17:57Z) - Learning Complementary Policies for Human-AI Teams [22.13683008398939]
We propose a framework for novel human-AI collaboration in selecting an advantageous course of action.
Our solution aims to exploit human-AI complementarity to maximize decision rewards.
arXiv Detail & Related papers (2023-02-06T17:22:18Z) - Human-Algorithm Collaboration: Achieving Complementarity and Avoiding Unfairness [92.26039686430204]
We show that even in carefully designed systems, complementary performance can be elusive.
First, we provide a theoretical framework for modeling simple human-algorithm systems.
Next, we use this model to prove conditions where complementarity is impossible.
arXiv Detail & Related papers (2022-02-17T18:44:41Z) - Deciding Fast and Slow: The Role of Cognitive Biases in AI-assisted Decision-making [46.625616262738404]
We use knowledge from the field of cognitive science to account for cognitive biases in the human-AI collaborative decision-making setting.
We focus specifically on anchoring bias, a bias commonly encountered in human-AI collaboration.
arXiv Detail & Related papers (2020-10-15T22:25:41Z) - Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making [53.62514158534574]
We study whether features that reveal case-specific model information can calibrate trust and improve the joint performance of the human and AI.
We show that confidence scores can help calibrate people's trust in an AI model, but trust calibration alone is not sufficient to improve AI-assisted decision making.
arXiv Detail & Related papers (2020-01-07T15:33:48Z)