Confidence-weighted integration of human and machine judgments for superior decision-making
- URL: http://arxiv.org/abs/2408.08083v1
- Date: Thu, 15 Aug 2024 11:16:21 GMT
- Title: Confidence-weighted integration of human and machine judgments for superior decision-making
- Authors: Felipe Yáñez, Xiaoliang Luo, Omar Valerio Minero, Bradley C. Love
- Abstract summary: Recent studies have shown that large language models (LLMs) can surpass humans in certain tasks.
We show that humans, despite performing worse than LLMs, can still add value when teamed with them.
A human and machine team can surpass each individual teammate when team members' confidence is well-calibrated.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have emerged as powerful tools in various domains. Recent studies have shown that LLMs can surpass humans in certain tasks, such as predicting the outcomes of neuroscience studies. What role does this leave for humans in the overall decision process? One possibility is that humans, despite performing worse than LLMs, can still add value when teamed with them. A human and machine team can surpass each individual teammate when team members' confidence is well-calibrated and team members diverge in which tasks they find difficult (i.e., calibration and diversity are needed). We simplified and extended a Bayesian approach to combining judgments using a logistic regression framework that integrates confidence-weighted judgments for any number of team members. Using this straightforward method, we demonstrated in a neuroscience forecasting task that, even when humans were inferior to LLMs, their combination with one or more LLMs consistently improved team performance. Our hope is that this simple and effective strategy for integrating the judgments of humans and machines will lead to productive collaborations.
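The abstract names the method (a logistic regression that integrates confidence-weighted judgments from any number of team members) but gives no implementation. Below is a minimal, hypothetical Python sketch of that idea on simulated data; the signed-confidence feature encoding, the member accuracies, and all variable names are assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch: confidence-weighted judgment integration via
# logistic regression, loosely following the abstract's description.
# The feature encoding (signed confidence = judgment sign * confidence)
# is an assumption, not taken from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulate a team (one human, two LLMs) judging 200 binary items.
# Each member reports a choice in {0, 1} and a confidence in [0.5, 1].
n_items, n_members = 200, 3
true_labels = rng.integers(0, 2, size=n_items)

# Members differ in accuracy (diversity); assumed accuracies are illustrative.
accuracies = np.array([0.65, 0.75, 0.72])  # human, LLM-1, LLM-2
choices = np.empty((n_items, n_members), dtype=int)
confidences = rng.uniform(0.5, 1.0, size=(n_items, n_members))
for m in range(n_members):
    correct = rng.random(n_items) < accuracies[m]
    choices[:, m] = np.where(correct, true_labels, 1 - true_labels)

# Encode each judgment as a signed confidence: +conf votes for label 1,
# -conf votes for label 0. One feature per team member.
signed_conf = np.where(choices == 1, confidences, -confidences)

# Fit the integration model on a training split; the learned weights
# act as per-member reliabilities in the combined decision.
model = LogisticRegression().fit(signed_conf[:100], true_labels[:100])
team_preds = model.predict(signed_conf[100:])
team_acc = (team_preds == true_labels[100:]).mean()
print(f"team accuracy: {team_acc:.2f}")
```

On simulated data like this, the fitted team model typically outperforms its best single member exactly when members are roughly calibrated and err on different items, which is the calibration-and-diversity condition the abstract describes.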
Related papers
- Two Heads Are Better Than One: Collaborative LLM Embodied Agents for Human-Robot Interaction [1.6574413179773757]
Large language models (LLMs) should be able to leverage their large breadth of understanding to interpret natural language commands.
However, these models suffer from hallucinations, which may cause safety issues or deviations from the task.
In this research, multiple collaborative AI systems were tested against a single independent AI agent to determine whether success in other domains would translate into improved human-robot interaction performance.
arXiv Detail & Related papers (2024-11-23T02:47:12Z) - Large Language Models Overcome the Machine Penalty When Acting Fairly but Not When Acting Selfishly or Altruistically [14.576971868730709]
In social dilemmas where the collective and self-interests are at odds, people typically cooperate less with machines than with fellow humans.
In this study, we explore the possibility of closing this gap using Large Language Models (LLMs).
Our findings reveal that, when interacting with humans, fair LLMs are able to induce cooperation levels comparable to those observed in human-human interactions.
arXiv Detail & Related papers (2024-09-29T10:11:25Z) - Human-AI collectives produce the most accurate differential diagnoses [0.0]
We show that hybrid collectives of physicians and large language models (LLMs) outperform both single physicians and physician collectives.
Our approach highlights the potential for collective human and machine intelligence to improve accuracy in complex, open-ended domains like medical diagnostics.
arXiv Detail & Related papers (2024-06-21T08:46:30Z) - Large language models surpass human experts in predicting neuroscience results [60.26891446026707]
Large language models (LLMs) forecast novel results better than human experts.
BrainBench is a benchmark for predicting neuroscience results.
Our approach is not neuroscience-specific and is transferable to other knowledge-intensive endeavors.
arXiv Detail & Related papers (2024-03-04T15:27:59Z) - Large Language Model-based Human-Agent Collaboration for Complex Task Solving [94.3914058341565]
We introduce the problem of Large Language Model (LLM)-based human-agent collaboration for complex task-solving.
We propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC.
This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process.
arXiv Detail & Related papers (2024-02-20T11:03:36Z) - Limits of Large Language Models in Debating Humans [0.0]
Large Language Models (LLMs) have shown remarkable promise in their ability to interact proficiently with humans.
This paper tests the limits of current-day LLMs in a pre-registered study that integrates real people with LLM agents acting as people.
arXiv Detail & Related papers (2024-02-06T03:24:27Z) - Human-Instruction-Free LLM Self-Alignment with Limited Samples [64.69906311787055]
We propose an algorithm that can self-align large language models (LLMs) iteratively without active human involvement.
Unlike existing works, our algorithm relies on neither human-crafted instructions nor labeled rewards, significantly reducing human involvement.
We show that our method can unlock the LLMs' self-generalization ability to perform alignment with near-zero human supervision.
arXiv Detail & Related papers (2024-01-06T14:00:12Z) - MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z) - Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration [83.4031923134958]
Corex is a suite of novel general-purpose strategies that transform Large Language Models into autonomous agents.
Inspired by human behaviors, Corex is constituted by diverse collaboration paradigms including Debate, Review, and Retrieve modes.
We demonstrate that orchestrating multiple LLMs to work in concert yields substantially better performance compared to existing methods.
arXiv Detail & Related papers (2023-09-30T07:11:39Z) - Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration [116.09561564489799]
Solo Performance Prompting transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas.
A cognitive synergist is an intelligent agent that collaboratively combines multiple minds' strengths and knowledge to enhance problem-solving in complex tasks.
Our in-depth analysis shows that assigning multiple fine-grained personas in LLMs improves problem-solving abilities compared to using a single or fixed number of personas.
arXiv Detail & Related papers (2023-07-11T14:45:19Z) - Kill Chaos with Kindness: Agreeableness Improves Team Performance Under Uncertainty [0.0]
Prior work has found a non-significant and highly variable relationship between agreeableness and team performance.
An agent-based model (ABM) is used to predict the effects of personality traits on teamwork.
A genetic algorithm is then used to explore the limits of the ABM in order to discover which traits correlate with best and worst performing teams.
arXiv Detail & Related papers (2022-08-09T16:04:32Z)