Multi-Round Human-AI Collaboration with User-Specified Requirements
- URL: http://arxiv.org/abs/2602.17646v2
- Date: Tue, 24 Feb 2026 18:15:39 GMT
- Title: Multi-Round Human-AI Collaboration with User-Specified Requirements
- Authors: Sima Noorani, Shayan Kiyani, Hamed Hassani, George Pappas,
- Abstract summary: We adopt a human-centric view governed by two principles: counterfactual harm and complementarity. We formalize these concepts via user-defined rules, allowing users to specify exactly what harm and complementarity mean. We show that our online procedure maintains prescribed counterfactual harm and complementarity violation rates even under nonstationary interaction dynamics.
- Score: 26.38833436936642
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As humans increasingly rely on multi-round conversational AI for high-stakes decisions, principled frameworks are needed to ensure such interactions reliably improve decision quality. We adopt a human-centric view governed by two principles: counterfactual harm, ensuring the AI does not undermine human strengths, and complementarity, ensuring it adds value where the human is prone to err. We formalize these concepts via user-defined rules, allowing users to specify exactly what harm and complementarity mean for their specific task. We then introduce an online, distribution-free algorithm with finite-sample guarantees that enforces the user-specified constraints over the collaboration dynamics. We evaluate our framework across two interactive settings: LLM-simulated collaboration on a medical diagnostic task and a human crowdsourcing study on a pictorial reasoning task. We show that our online procedure maintains prescribed counterfactual harm and complementarity violation rates even under nonstationary interaction dynamics. Moreover, tightening or loosening these constraints produces predictable shifts in downstream human accuracy, confirming that the two principles serve as practical levers for steering multi-round collaboration toward better decision quality without the need to model or constrain human behavior.
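The abstract does not spell out the algorithm, but the idea of an online procedure that keeps a user-defined violation rate at a prescribed level can be illustrated with a minimal sketch. The class name, the confidence-threshold mechanism, and the step size below are all hypothetical, not taken from the paper; the sketch only shows the general shape of an online, distribution-free update that nudges a decision threshold so the long-run rate of "harm" events tracks a budget alpha:

```python
# Hypothetical sketch (not the paper's actual algorithm): an online
# controller that adapts a confidence threshold so the empirical rate
# of user-defined violation events tracks a prescribed budget alpha.

class OnlineViolationController:
    def __init__(self, alpha: float, lr: float = 0.05):
        self.alpha = alpha      # prescribed violation rate
        self.lr = lr            # step size of the online update
        self.threshold = 0.5    # AI advice is shown above this confidence

    def should_intervene(self, ai_confidence: float) -> bool:
        # Surface AI advice only when its confidence clears the threshold.
        return ai_confidence >= self.threshold

    def update(self, violated: bool) -> None:
        # Gradient-style update: raise the bar after a violation, relax it
        # otherwise, so the long-run violation frequency approaches alpha.
        self.threshold += self.lr * ((1.0 if violated else 0.0) - self.alpha)
        self.threshold = min(max(self.threshold, 0.0), 1.0)
```

Because each round moves the threshold by a bounded step in the direction that corrects the running violation rate, this style of update needs no distributional assumptions, which is consistent with the distribution-free flavor the abstract describes.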
Related papers
- Human-AI Collaborative Uncertainty Quantification [26.38833436936642]
We introduce Human-AI Collaborative Uncertainty Quantification, a framework that formalizes how an AI model can refine a human expert's proposed prediction set. We show that the optimal collaborative prediction set follows an intuitive two-threshold structure over a single score function, extending a classical result in conformal prediction. Experiments across image classification, regression, and text-based medical decision making show that collaborative prediction sets consistently outperform either agent alone.
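The two-threshold structure mentioned in this summary can be sketched concretely. The function and threshold names below are illustrative, not from the paper: the AI refines a human-proposed label set using one score function and two thresholds, dropping proposed labels whose score is too low and force-including any label whose score is very high:

```python
# Illustrative sketch of a two-threshold refinement over a single score
# function (names and thresholds are hypothetical, not from the paper).

def refine_prediction_set(human_set, scores, t_low, t_high):
    """Drop human labels scoring below t_low; add any label scoring >= t_high."""
    refined = {y for y in human_set if scores[y] >= t_low}
    refined |= {y for y, s in scores.items() if s >= t_high}
    return refined
```

Intuitively, `t_low` controls how much the AI trusts the human's proposals, while `t_high` controls when the AI's own confidence overrides a human omission.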
arXiv Detail & Related papers (2025-10-27T16:11:23Z) - Cascaded Language Models for Cost-effective Human-AI Decision-Making [52.81324217423194]
We present a cascaded LLM decision framework that adaptively delegates tasks across multiple tiers of expertise. First, a deferral policy determines whether to accept the base model's answer or regenerate it with a large model. Second, an abstention policy decides whether the cascade model response is sufficiently certain or requires human intervention.
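The two-stage structure described in this summary can be sketched in a few lines. The thresholds and confidence inputs below are hypothetical stand-ins, not the paper's actual policies: a deferral step escalates from the base to the large model, then an abstention step routes still-uncertain cases to a human:

```python
# Hypothetical sketch of a two-stage cascade: a deferral policy escalates
# from a base to a large model, then an abstention policy decides whether
# a human must take over. Thresholds are illustrative only.

def cascade_decision(base_conf, large_conf, defer_t=0.6, abstain_t=0.8):
    # Stage 1: accept the base model's answer only if it is confident enough;
    # otherwise regenerate with the large model.
    conf, source = (base_conf, "base") if base_conf >= defer_t else (large_conf, "large")
    # Stage 2: abstain and route to a human if the chosen model is still unsure.
    return ("human", None) if conf < abstain_t else (source, conf)
```

The cost saving comes from the ordering: the expensive model runs only when the cheap one defers, and the human is consulted only when both models abstain.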
arXiv Detail & Related papers (2025-06-13T15:36:22Z) - Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism [48.41735416075536]
Interactive Imitation Learning (IIL) allows agents to acquire desired behaviors through human interventions. We propose the Adaptive Intervention Mechanism (AIM), a novel robot-gated IIL algorithm that learns an adaptive criterion for requesting human demonstrations.
arXiv Detail & Related papers (2025-06-10T18:43:26Z) - DeCoDe: Defer-and-Complement Decision-Making via Decoupled Concept Bottleneck Models [37.118479480792416]
We propose a concept-driven framework for human-AI collaboration. DeCoDe makes strategy decisions based on human-interpretable concept representations. It supports three modes: autonomous AI prediction, deferral to humans, and human-AI collaborative complementarity.
arXiv Detail & Related papers (2025-05-25T16:34:45Z) - Human aversion? Do AI Agents Judge Identity More Harshly Than Performance? [0.06554326244334868]
We investigate how AI agents based on large language models assess and integrate human input. We find that the AI system systematically discounts human advice, penalizing human errors more severely than algorithmic errors.
arXiv Detail & Related papers (2025-03-31T02:05:27Z) - MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention [76.83428371942735]
We introduce MEReQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning), designed for sample-efficient alignment from human intervention. MEReQ infers a residual reward function that captures the discrepancy between the human expert's and the prior policy's underlying reward functions. It then employs Residual Q-Learning (RQL) to align the policy with human preferences using this residual reward function.
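The residual idea in this summary can be illustrated with a minimal sketch. The function and variable names are hypothetical and this is not MEReQ's actual update: once a residual value function encoding the human-prior discrepancy has been learned, the aligned policy simply acts greedily on the sum of the prior and residual estimates:

```python
# Rough illustration (all names hypothetical, not MEReQ's implementation):
# the aligned policy acts greedily on prior Q-values plus a residual Q
# that encodes the inferred gap between human and prior preferences.

def residual_policy_action(q_prior, q_residual):
    """Pick the action maximizing the combined value estimates."""
    combined = [p + r for p, r in zip(q_prior, q_residual)]
    return combined.index(max(combined))
```

The point of the decomposition is sample efficiency: only the (presumably small) residual must be learned from intervention data, while the prior policy's values are reused as-is.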
arXiv Detail & Related papers (2024-06-24T01:51:09Z) - Position: Towards Bidirectional Human-AI Alignment [109.57781720848669]
We argue that the research community should explicitly define and critically reflect on "alignment" to account for the bidirectional and dynamic relationship between humans and AI. We introduce the Bidirectional Human-AI Alignment framework, which not only incorporates traditional efforts to align AI with human values but also introduces the critical, underexplored dimension of aligning humans with AI.
arXiv Detail & Related papers (2024-06-13T16:03:25Z) - AI, Pluralism, and (Social) Compensation [1.5442389863546546]
A strategy in response to pluralistic values in a user population is to personalize an AI system.
If the AI can adapt to the specific values of each individual, then we can potentially avoid many of the challenges of pluralism.
However, if there is an external measure of success for the human-AI team, then the adaptive AI system may develop strategies to compensate for its human teammate.
arXiv Detail & Related papers (2024-04-30T04:41:47Z) - Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z) - Learning Complementary Policies for Human-AI Teams [13.371050441794651]
This paper tackles the challenge of human-AI complementarity in decision-making. We develop a robust solution for human-AI collaboration when outcomes are only observed under assigned actions. We show that substantial performance improvements are achievable by routing only a small fraction of instances to human decision-makers.
arXiv Detail & Related papers (2023-02-06T17:22:18Z) - PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination [52.991211077362586]
We propose a policy ensemble method to increase the diversity of partners in the population.
We then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives.
In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners.
arXiv Detail & Related papers (2023-01-16T12:14:58Z) - Human-AI Collaboration in Decision-Making: Beyond Learning to Defer [4.874780144224057]
Human-AI collaboration (HAIC) in decision-making aims to create synergistic teaming between humans and AI systems.
Learning to Defer (L2D) has been presented as a promising framework to determine who among humans and AI should take which decisions.
L2D entails several often unfeasible requirements, such as availability of predictions from humans for every instance or ground-truth labels independent from said decision-makers.
arXiv Detail & Related papers (2022-06-27T11:40:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.