Learning to Defer in Congested Systems: The AI-Human Interplay
- URL: http://arxiv.org/abs/2402.12237v4
- Date: Tue, 12 Aug 2025 21:03:31 GMT
- Title: Learning to Defer in Congested Systems: The AI-Human Interplay
- Authors: Thodoris Lykouris, Wentao Weng
- Abstract summary: High-stakes applications rely on combining Artificial Intelligence (AI) and humans for responsive and reliable decision making. In this paper, we introduce a model to capture such an AI-human interplay. We propose a near-optimal learning algorithm that carefully balances the classification loss from a selectively sampled dataset.
- Score: 4.324474867341765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-stakes applications rely on combining Artificial Intelligence (AI) and humans for responsive and reliable decision making. For example, content moderation in social media platforms often employs an AI-human pipeline to promptly remove policy violations without jeopardizing legitimate content. A typical heuristic estimates the risk of incoming content and uses fixed thresholds to decide whether to auto-delete the content (classification) and whether to send it for human review (admission). This approach can be inefficient as it disregards the uncertainty in AI's estimation, the time-varying element of content arrivals and human review capacity, and the selective sampling in the online dataset (humans only review content filtered by the AI). In this paper, we introduce a model to capture such an AI-human interplay. In this model, the AI observes contextual information for incoming jobs, makes classification and admission decisions, and schedules admitted jobs for human review. During these reviews, humans observe a job's true cost and may overturn an erroneous AI classification decision. These reviews also serve as new data to train the AI but are delayed due to congestion in the human review system. The objective is to minimize the costs of eventually misclassified jobs. We propose a near-optimal learning algorithm that carefully balances the classification loss from a selectively sampled dataset, the idiosyncratic loss of non-reviewed jobs, and the delay loss of having congestion in the human review system. To the best of our knowledge, this is the first result for online learning in contextual queueing systems. Moreover, numerical experiments based on online comment datasets show that our algorithm can substantially reduce the number of misclassifications compared to existing content moderation practice.
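The fixed-threshold heuristic described in the abstract can be sketched as follows. This is an illustrative sketch of the baseline practice the paper critiques, not the paper's algorithm; the function name and threshold values are assumptions.

```python
def moderate(risk_score: float,
             delete_threshold: float = 0.9,
             review_threshold: float = 0.5) -> tuple[bool, bool]:
    """Return (auto_delete, send_to_human) for one piece of content.

    The two fixed thresholds correspond to the abstract's two decisions:
    classification (auto-delete) and admission (send for human review).
    Threshold values here are hypothetical.
    """
    auto_delete = risk_score >= delete_threshold    # classification decision
    send_to_human = risk_score >= review_threshold  # admission decision
    return auto_delete, send_to_human
```

Content with low estimated risk is published without review, mid-range risk is queued for human review, and high risk is removed pending review. The abstract's point is that such static thresholds ignore estimation uncertainty, time-varying arrivals and review capacity, and the selective sampling they induce.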
Related papers
- CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection [60.52240468810558]
We introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews. We also develop CoCoDet, an AI review detector via a multi-task learning framework, to achieve more accurate and robust detection of AI involvement in review content.
arXiv Detail & Related papers (2025-08-28T06:03:11Z) - Modeling Human Responses to Multimodal AI Content [10.65875439980452]
The MhAIM dataset contains 154,552 online posts (111,153 of them AI-generated). Our human study reveals that people are better at identifying AI content when posts include both text and visuals. We present T-Lens, an agent system designed to answer user queries by incorporating predicted human responses to multimodal information.
arXiv Detail & Related papers (2025-08-14T15:55:19Z) - AI vs. Human Judgment of Content Moderation: LLM-as-a-Judge and Ethics-Based Response Refusals [0.0]
This paper examines whether model-based evaluators assess refusal responses differently than human users. We find that LLM-as-a-Judge systems evaluate ethical refusals significantly more favorably than human users.
arXiv Detail & Related papers (2025-05-21T10:56:16Z) - Human aversion? Do AI Agents Judge Identity More Harshly Than Performance [0.06554326244334868]
We investigate how AI agents based on large language models assess and integrate human input.
We find that the AI system systematically discounts human advice, penalizing human errors more severely than algorithmic errors.
arXiv Detail & Related papers (2025-03-31T02:05:27Z) - Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing [55.2480439325792]
This study systematically evaluates twelve state-of-the-art AI-text detectors using our AI-Polished-Text Evaluation dataset. Our findings reveal that detectors frequently flag even minimally polished text as AI-generated, struggle to differentiate between degrees of AI involvement, and exhibit biases against older and smaller models.
arXiv Detail & Related papers (2025-02-21T18:45:37Z) - AI-Assisted Decision Making with Human Learning [8.598431584462944]
In many cases, despite the algorithm's superior performance, the final decision remains in human hands.
This paper studies such AI-assisted decision-making settings, where the human learns through repeated interactions with the algorithm.
We observe that the discrepancy between the algorithm's model and the human's model creates a fundamental tradeoff.
arXiv Detail & Related papers (2025-02-18T17:08:21Z) - What should an AI assessor optimise for? [57.96463917842822]
An AI assessor is an external, ideally independent system that predicts an indicator, e.g., a loss value, of another AI system. Here we address the question: is it always optimal to train the assessor for the target metric? We experimentally explore this question for, respectively, regression losses and classification scores with monotonic and non-monotonic mappings.
arXiv Detail & Related papers (2025-02-01T08:41:57Z) - Online Bandit Learning with Offline Preference Data [15.799929216215672]
We propose a posterior sampling algorithm for online learning that can be warm-started with an offline dataset with noisy preference feedback.
We show that by modeling the 'competence' of the expert that generated it, we are able to use such a dataset most effectively.
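As a rough illustration of warm-starting posterior sampling with offline data weighted by expert competence, consider Thompson sampling for a Bernoulli bandit. This is a hedged sketch under assumed Beta-Bernoulli rewards with a simple discounting scheme; all names and the discounting form are assumptions, not the paper's algorithm.

```python
import random

def thompson_arm(offline_wins, offline_losses,
                 online_wins, online_losses, competence=0.8):
    """Draw one Beta posterior sample per arm; return the argmax arm.

    Offline (expert-generated) counts enter the posterior discounted by an
    assumed competence weight in [0, 1]; online counts enter at full weight.
    """
    draws = []
    for w0, l0, w1, l1 in zip(offline_wins, offline_losses,
                              online_wins, online_losses):
        alpha = 1 + competence * w0 + w1  # pseudo-successes
        beta = 1 + competence * l0 + l1   # pseudo-failures
        draws.append(random.betavariate(alpha, beta))
    return max(range(len(draws)), key=draws.__getitem__)
```

With competence near 0 the offline dataset is effectively ignored; with competence near 1 it is trusted as much as fresh online feedback.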
arXiv Detail & Related papers (2024-06-13T20:25:52Z) - Does AI help humans make better decisions? A statistical evaluation framework for experimental and observational studies [0.43981305860983716]
We show how to compare the performance of three alternative decision-making systems--human-alone, human-with-AI, and AI-alone.
We find that the risk assessment recommendations do not improve the classification accuracy of a judge's decision to impose cash bail.
arXiv Detail & Related papers (2024-03-18T01:04:52Z) - Leveraging AI Predicted and Expert Revised Annotations in Interactive Segmentation: Continual Tuning or Full Training? [7.742968966681627]
Human experts revise the annotations predicted by AI, and in turn, AI improves its predictions by learning from these revised annotations.
This raises the risk of catastrophic forgetting: the AI tends to forget previously learned classes if it is retrained using only the expert-revised classes.
This paper proposes Continual Tuning to address the problems from two perspectives: network design and data reuse.
arXiv Detail & Related papers (2024-02-29T18:22:12Z) - Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z) - UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z) - Detecting The Corruption Of Online Questionnaires By Artificial Intelligence [1.9458156037869137]
This study tested if text generated by an AI for the purpose of an online study can be detected by both humans and automatic AI detection systems.
Humans were able to correctly identify authorship of text above chance level.
But their performance was still below what would be required to ensure satisfactory data quality.
arXiv Detail & Related papers (2023-08-14T23:47:56Z) - Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring [13.817385516193445]
Speech fluency/disfluency can be evaluated by analyzing a range of phonetic and prosodic features.
Deep neural networks are commonly trained to map fluency-related features into the human scores.
We introduce a self-supervised learning (SSL) approach that takes into account phonetic and prosody awareness for fluency scoring.
arXiv Detail & Related papers (2023-05-19T05:39:41Z) - Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach.
Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes.
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z) - Behave-XAI: Deep Explainable Learning of Behavioral Representational Data [0.0]
We use explainable or human understandable AI for a behavioral mining scenario.
We first formulate the behavioral mining problem in deep convolutional neural network architecture.
Once the model is developed, explanations are presented in front of users.
arXiv Detail & Related papers (2022-12-30T18:08:48Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - Tribrid: Stance Classification with Neural Inconsistency Detection [9.150728831518459]
We study the problem of performing automatic stance classification on social media with neural architectures such as BERT.
We present a new neural architecture where the input also includes automatically generated negated perspectives over a given claim.
The model is jointly trained to make multiple predictions simultaneously, which can be used either to improve the classification of the original perspective or to filter out doubtful predictions.
arXiv Detail & Related papers (2021-09-14T08:13:03Z) - Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
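A runtime-oriented loss under censoring can be illustrated with the PAR-style penalty common in algorithm selection. The specific penalty form below is an assumption for illustration, not necessarily the loss adapted in the paper.

```python
def censored_runtime_loss(runtime, cutoff, penalty_factor=10.0):
    """Loss for one algorithm run under a time cutoff.

    If the run finished within the cutoff, the loss is the observed runtime.
    Otherwise the true runtime is censored (only "exceeded the cutoff" is
    observed), and we charge a penalized cutoff; penalty_factor == 10 gives
    the common PAR10 convention.
    """
    if runtime is None or runtime > cutoff:  # timeout: censored observation
        return penalty_factor * cutoff
    return runtime
```

The key feature for the bandit setting above is that a censored run still yields usable (partial) feedback: the learner knows the loss is at least the cutoff, even though the exact runtime is unobserved.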
arXiv Detail & Related papers (2021-09-13T18:10:52Z) - Estimating and Improving Fairness with Adversarial Learning [65.99330614802388]
We propose an adversarial multi-task training strategy to simultaneously mitigate and detect bias in the deep learning-based medical image analysis system.
Specifically, we propose to add a discrimination module against bias and a critical module that predicts unfairness within the base classification model.
We evaluate our framework on a large-scale public-available skin lesion dataset.
arXiv Detail & Related papers (2021-03-07T03:10:32Z) - FairCVtest Demo: Understanding Bias in Multimodal Learning with a Testbed in Fair Automatic Recruitment [79.23531577235887]
This demo shows the capacity of the Artificial Intelligence (AI) behind a recruitment tool to extract sensitive information from unstructured data.
Additionally, the demo includes a new algorithm for discrimination-aware learning which eliminates sensitive information in our multimodal AI framework.
arXiv Detail & Related papers (2020-09-12T17:45:09Z) - Bias in Multimodal AI: Testbed for Fair Automatic Recruitment [73.85525896663371]
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
We train automatic recruitment algorithms using a set of multimodal synthetic profiles consciously scored with gender and racial biases.
Our methodology and results show how to generate fairer AI-based tools in general, and in particular fairer automated recruitment systems.
arXiv Detail & Related papers (2020-04-15T15:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.