Confidence-weighted integration of human and machine judgments for superior decision-making
- URL: http://arxiv.org/abs/2408.08083v1
- Date: Thu, 15 Aug 2024 11:16:21 GMT
- Title: Confidence-weighted integration of human and machine judgments for superior decision-making
- Authors: Felipe Yáñez, Xiaoliang Luo, Omar Valerio Minero, Bradley C. Love
- Abstract summary: Recent studies have shown that large language models (LLMs) can surpass humans in certain tasks.
We show that humans, despite performing worse than LLMs, can still add value when teamed with them.
A human and machine team can surpass each individual teammate when team members' confidence is well-calibrated.
- Score: 2.4217853168915475
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have emerged as powerful tools in various domains. Recent studies have shown that LLMs can surpass humans in certain tasks, such as predicting the outcomes of neuroscience studies. What role does this leave for humans in the overall decision process? One possibility is that humans, despite performing worse than LLMs, can still add value when teamed with them. A human and machine team can surpass each individual teammate when team members' confidence is well-calibrated and team members diverge in which tasks they find difficult (i.e., calibration and diversity are needed). We simplified and extended a Bayesian approach to combining judgments using a logistic regression framework that integrates confidence-weighted judgments for any number of team members. Using this straightforward method, we demonstrated in a neuroscience forecasting task that, even when humans were inferior to LLMs, their combination with one or more LLMs consistently improved team performance. Our hope is that this simple and effective strategy for integrating the judgments of humans and machines will lead to productive collaborations.
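The abstract's core idea, a logistic regression over confidence-weighted judgments from any number of team members, can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes each judgment is encoded as a single signed-confidence feature (positive for option 1, negative for option 0). The toy trials, the helper names (`signed_confidence`, `fit_logistic`, `predict_proba`), and the hyperparameters are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def signed_confidence(choice, confidence):
    """Encode a binary choice (0 or 1) and a confidence >= 0 as one signed feature."""
    return confidence if choice == 1 else -confidence


def fit_logistic(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Fit one weight per team member plus a bias by batch gradient descent.

    The small L2 penalty (an assumption of this sketch) keeps the weights
    finite when the toy data are linearly separable.
    """
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Team probability that option 1 is the correct answer."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def accuracy(preds, labels):
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)


# Hypothetical toy trials (invented for illustration, not data from the paper):
# (human_choice, human_conf, machine_choice, machine_conf, correct_option)
trials = [
    (1, 0.9, 1, 0.6, 1),
    (0, 0.8, 0, 0.7, 0),
    (1, 0.2, 0, 0.9, 0),
    (0, 0.3, 1, 0.8, 1),
    (1, 0.9, 0, 0.3, 1),
    (0, 0.9, 1, 0.2, 0),
    (0, 0.2, 1, 0.9, 1),
    (1, 0.1, 0, 0.8, 0),
]

# One signed-confidence column per team member; more members add more columns.
X = [[signed_confidence(hc, hf), signed_confidence(mc, mf)]
     for hc, hf, mc, mf, _ in trials]
y = [t[4] for t in trials]

w, b = fit_logistic(X, y)

human_acc = accuracy([t[0] for t in trials], y)
machine_acc = accuracy([t[2] for t in trials], y)
team_probs = [predict_proba(w, b, x) for x in X]
team_acc = accuracy([1 if p >= 0.5 else 0 for p in team_probs], y)

print("human:", human_acc, "machine:", machine_acc, "team:", team_acc)
print("weights:", w, "bias:", b)
```

In the toy data the human is less accurate than the machine overall, but is confident on the trials where the machine is unsure. That is the calibration-plus-diversity condition the abstract describes, and it is what lets the fitted combination use both members. The sketch evaluates on its own training trials for brevity; a real analysis would score held-out data.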
Related papers
- Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games [87.5673042805229]
How large language models balance self-interest and collective well-being is a critical challenge for ensuring alignment, robustness, and safe deployment. We adapt a public goods game with institutional choice from behavioral economics, allowing us to observe how different LLMs navigate social dilemmas. Surprisingly, we find that reasoning LLMs, such as the o1 series, struggle significantly with cooperation.
arXiv Detail & Related papers (2025-06-29T15:02:47Z) - Confident-Knowledge Diversity Drives Human-Human and Human-AI Free Discussion Synergy and Reveals Pure-AI Discussion Shortfalls [3.335241944417891]
We study whether large language models can replicate the synergistic gains observed in human discussion. We introduce an agent-agnostic confident-knowledge framework that models each participant by performance (accuracy) and confidence. This framework quantifies confident-knowledge diversity, the degree to which one agent tends to be correct when another is uncertain, and yields a conservative upper bound on gains achievable via confidence-informed decisions.
arXiv Detail & Related papers (2025-06-15T05:09:20Z) - SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat [76.48873047003943]
We propose SPARTA ALIGNMENT, an algorithm to collectively align multiple LLMs through competition and combat. For each iteration, one instruction and two models are selected for a duel, the other models evaluate the two responses, and their evaluation scores are aggregated through an adapted Elo-ranking-based reputation system. The peer-evaluated combat results then become preference pairs where the winning response is preferred over the losing one, and all models learn from these preferences at the end of each iteration.
arXiv Detail & Related papers (2025-06-05T07:51:23Z) - Humans expect rationality and cooperation from LLM opponents in strategic games [0.0]
We present the results of the first monetarily-incentivised laboratory experiment looking at differences in human behaviour. We show that, in this environment, human subjects choose significantly lower numbers when playing against LLMs than humans. This shift is mainly driven by subjects with high strategic reasoning ability.
arXiv Detail & Related papers (2025-05-16T09:01:09Z) - Measurement of LLM's Philosophies of Human Nature [113.47929131143766]
We design a standardized psychological scale specifically targeting large language models (LLMs).
We show that current LLMs exhibit a systemic lack of trust in humans.
We propose a mental loop learning framework, which enables LLM to continuously optimize its value system.
arXiv Detail & Related papers (2025-04-03T06:22:19Z) - Aligning Black-box Language Models with Human Judgments [8.30794246257544]
Large language models (LLMs) are increasingly used as automated judges to evaluate recommendation systems, search engines, and other subjective tasks.
We propose a framework to align LLM judgments with individual human evaluators or their aggregated judgments.
Our approach achieves over 142% average improvement in agreement across 29 tasks with only a small number of calibration examples used for training.
arXiv Detail & Related papers (2025-02-07T15:19:40Z) - Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration [50.657070334404835]
Collaborative Gym is a framework enabling asynchronous, tripartite interaction among agents, humans, and task environments. We instantiate Co-Gym with three representative tasks in both simulated and real-world conditions. Our findings reveal that collaborative agents consistently outperform their fully autonomous counterparts in task performance.
arXiv Detail & Related papers (2024-12-20T09:21:15Z) - Two Heads Are Better Than One: Collaborative LLM Embodied Agents for Human-Robot Interaction [1.6574413179773757]
Large language models (LLMs) should be able to leverage their large breadth of understanding to interpret natural language commands.
However, these models suffer from hallucinations, which may cause safety issues or deviations from the task.
In this research, multiple collaborative AI systems were tested against a single independent AI agent to determine whether the success in other domains would translate into improved human-robot interaction performance.
arXiv Detail & Related papers (2024-11-23T02:47:12Z) - Judgment of Learning: A Human Ability Beyond Generative Artificial Intelligence [0.0]
Large language models (LLMs) increasingly mimic human cognition in various language-based tasks.
We introduce a cross-agent prediction model to assess whether ChatGPT-based LLMs align with human judgments of learning (JOL).
Our results revealed that while human JOL reliably predicted actual memory performance, none of the tested LLMs demonstrated comparable predictive accuracy.
arXiv Detail & Related papers (2024-10-17T09:42:30Z) - Large Language Models Overcome the Machine Penalty When Acting Fairly but Not When Acting Selfishly or Altruistically [14.576971868730709]
In social dilemmas where the collective and self-interests are at odds, people typically cooperate less with machines than with fellow humans.
In this study, we explore the possibility of closing this research question by using Large Language Models (LLMs).
Our findings reveal that, when interacting with humans, fair LLMs are able to induce cooperation levels comparable to those observed in human-human interactions.
arXiv Detail & Related papers (2024-09-29T10:11:25Z) - Human-AI collectives produce the most accurate differential diagnoses [0.0]
We show that hybrid collectives of physicians and large language models (LLMs) outperform both single physicians and physician collectives.
Our approach highlights the potential for collective human and machine intelligence to improve accuracy in complex, open-ended domains like medical diagnostics.
arXiv Detail & Related papers (2024-06-21T08:46:30Z) - Mixed-Initiative Human-Robot Teaming under Suboptimality with Online Bayesian Adaptation [0.6591036379613505]
We develop computational modeling and optimization techniques for enhancing the performance of suboptimal human-agent teams.
We adopt an online Bayesian approach that enables a robot to infer people's willingness to comply with its assistance in a sequential decision-making game.
Our user studies show that user preferences and team performance indeed vary with robot intervention styles.
arXiv Detail & Related papers (2024-03-24T14:38:18Z) - Optimizing Risk-averse Human-AI Hybrid Teams [1.433758865948252]
We propose a manager which learns, through a standard Reinforcement Learning scheme, how to best delegate.
We demonstrate the optimality of our manager's performance in several grid environments.
Our results show our manager can successfully learn desirable delegations which result in team paths near/exactly optimal.
arXiv Detail & Related papers (2024-03-13T09:49:26Z) - Large language models surpass human experts in predicting neuroscience results [60.26891446026707]
Large language models (LLMs) forecast novel results better than human experts.
BrainBench is a benchmark for predicting neuroscience results.
Our approach is not neuroscience-specific and is transferable to other knowledge-intensive endeavors.
arXiv Detail & Related papers (2024-03-04T15:27:59Z) - Large Language Model-based Human-Agent Collaboration for Complex Task Solving [94.3914058341565]
We introduce the problem of Large Language Models (LLMs)-based human-agent collaboration for complex task-solving.
We propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC.
This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process.
arXiv Detail & Related papers (2024-02-20T11:03:36Z) - Limits of Large Language Models in Debating Humans [0.0]
Large Language Models (LLMs) have shown remarkable promise in their ability to interact proficiently with humans.
This paper endeavors to test the limits of current-day LLMs with a pre-registered study integrating real people with LLM agents acting as people.
arXiv Detail & Related papers (2024-02-06T03:24:27Z) - Human-Instruction-Free LLM Self-Alignment with Limited Samples [64.69906311787055]
We propose an algorithm that can self-align large language models (LLMs) iteratively without active human involvement.
Unlike existing works, our algorithm relies on neither human-crafted instructions nor labeled rewards, significantly reducing human involvement.
We show that our method can unlock the LLMs' self-generalization ability to perform alignment with near-zero human supervision.
arXiv Detail & Related papers (2024-01-06T14:00:12Z) - MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z) - Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration [83.4031923134958]
Corex is a suite of novel general-purpose strategies that transform Large Language Models into autonomous agents.
Inspired by human behaviors, Corex is constituted by diverse collaboration paradigms including Debate, Review, and Retrieve modes.
We demonstrate that orchestrating multiple LLMs to work in concert yields substantially better performance compared to existing methods.
arXiv Detail & Related papers (2023-09-30T07:11:39Z) - Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration [116.09561564489799]
Solo Performance Prompting transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas.
A cognitive synergist is an intelligent agent that collaboratively combines multiple minds' strengths and knowledge to enhance problem-solving in complex tasks.
Our in-depth analysis shows that assigning multiple fine-grained personas in LLMs improves problem-solving abilities compared to using a single or fixed number of personas.
arXiv Detail & Related papers (2023-07-11T14:45:19Z) - Among Us: Adversarially Robust Collaborative Perception by Consensus [50.73128191202585]
Multiple robots could perceive a scene (e.g., detect objects) collaboratively better than individuals.
We propose ROBOSAC, a novel sampling-based defense strategy generalizable to unseen attackers.
We validate our method on the task of collaborative 3D object detection in autonomous driving scenarios.
arXiv Detail & Related papers (2023-03-16T17:15:25Z) - Kill Chaos with Kindness: Agreeableness Improves Team Performance Under Uncertainty [0.0]
Agreeableness has demonstrated a non-significant and highly variable relationship with team performance.
An agent-based model (ABM) is used to predict the effects of personality traits on teamwork.
A genetic algorithm is then used to explore the limits of the ABM in order to discover which traits correlate with best and worst performing teams.
arXiv Detail & Related papers (2022-08-09T16:04:32Z) - Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment [52.07473934146584]
We guide the curriculum reinforcement learning results towards a preferred performance level that is neither too hard nor too easy via learning from the human decision process.
Our system is highly parallelizable, making it possible for a human to train large-scale reinforcement learning applications.
It shows reinforcement learning performance can successfully adjust in sync with the human desired difficulty level.
arXiv Detail & Related papers (2022-08-04T23:53:51Z) - Learning to Complement Humans [67.38348247794949]
A rising vision for AI in the open world centers on the development of systems that can complement humans for perceptual, diagnostic, and reasoning tasks.
We demonstrate how an end-to-end learning strategy can be harnessed to optimize the combined performance of human-machine teams.
arXiv Detail & Related papers (2020-05-01T20:00:23Z) - Is the Most Accurate AI the Best Teammate? Optimizing AI for Teamwork [54.309495231017344]
We argue that AI systems should be trained in a human-centered manner, directly optimized for team performance.
We study this proposal for a specific type of human-AI teaming, where the human overseer chooses to either accept the AI recommendation or solve the task themselves.
Our experiments with linear and non-linear models on real-world, high-stakes datasets show that the most accurate AI may not lead to the highest team performance.
arXiv Detail & Related papers (2020-04-27T19:06:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.