Related papers: Confidence-weighted integration of human and machine judgments for superior decision-making

Confidence-weighted integration of human and machine judgments for superior decision-making

URL: http://arxiv.org/abs/2408.08083v1
Date: Thu, 15 Aug 2024 11:16:21 GMT
Title: Confidence-weighted integration of human and machine judgments for superior decision-making
Authors: Felipe Yáñez, Xiaoliang Luo, Omar Valerio Minero, Bradley C. Love,
Abstract summary: Recent studies have shown that large language models (LLMs) can surpass humans in certain tasks. We show that humans, despite performing worse than LLMs, can still add value when teamed with them. A human and machine team can surpass each individual teammate when team members' confidence is well-calibrated.
Score: 2.4217853168915475
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have emerged as powerful tools in various domains. Recent studies have shown that LLMs can surpass humans in certain tasks, such as predicting the outcomes of neuroscience studies. What role does this leave for humans in the overall decision process? One possibility is that humans, despite performing worse than LLMs, can still add value when teamed with them. A human and machine team can surpass each individual teammate when team members' confidence is well-calibrated and team members diverge in which tasks they find difficult (i.e., calibration and diversity are needed). We simplified and extended a Bayesian approach to combining judgments using a logistic regression framework that integrates confidence-weighted judgments for any number of team members. Using this straightforward method, we demonstrated in a neuroscience forecasting task that, even when humans were inferior to LLMs, their combination with one or more LLMs consistently improved team performance. Our hope is that this simple and effective strategy for integrating the judgments of humans and machines will lead to productive collaborations.

Related papers

Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games [87.5673042805229]
How large language models balance self-interest and collective well-being is a critical challenge for ensuring alignment, robustness, and safe deployment.<n>We adapt a public goods game with institutional choice from behavioral economics, allowing us to observe how different LLMs navigate social dilemmas.<n>Surprisingly, we find that reasoning LLMs, such as the o1 series, struggle significantly with cooperation.
arXiv Detail & Related papers (2025-06-29T15:02:47Z)
Humans expect rationality and cooperation from LLM opponents in strategic games [0.0]
We present the results of the first monetarily-incentivised laboratory experiment looking at differences in human behaviour.<n>We show that, in this environment, human subjects choose significantly lower numbers when playing against LLMs than humans.<n>This shift is mainly driven by subjects with high strategic reasoning ability.
arXiv Detail & Related papers (2025-05-16T09:01:09Z)
Measurement of LLM's Philosophies of Human Nature [113.47929131143766]
We design the standardized psychological scale specifically targeting large language models (LLM) We show that current LLMs exhibit a systemic lack of trust in humans. We propose a mental loop learning framework, which enables LLM to continuously optimize its value system.
arXiv Detail & Related papers (2025-04-03T06:22:19Z)
Aligning Black-box Language Models with Human Judgments [8.30794246257544]
Large language models (LLMs) are increasingly used as automated judges to evaluate recommendation systems, search engines, and other subjective tasks. We propose a framework to align LLM judgments with individual human evaluators or their aggregated judgments. Our approach achieves over 142% average improvement in agreement across 29 tasks with only a small number of calibration examples used for training.
arXiv Detail & Related papers (2025-02-07T15:19:40Z)
Two Heads Are Better Than One: Collaborative LLM Embodied Agents for Human-Robot Interaction [1.6574413179773757]
Large language models (LLMs) should be able to leverage their large breadth of understanding to interpret natural language commands. However, these models suffer from hallucinations, which may cause safety issues or deviations from the task. In this research, multiple collaborative AI systems were tested against a single independent AI agent to determine whether the success in other domains would translate into improved human-robot interaction performance.
arXiv Detail & Related papers (2024-11-23T02:47:12Z)
Judgment of Learning: A Human Ability Beyond Generative Artificial Intelligence [0.0]
Large language models (LLMs) increasingly mimic human cognition in various language-based tasks. We introduce a cross-agent prediction model to assess whether ChatGPT-based LLMs align with human judgments of learning (JOL) Our results revealed that while human JOL reliably predicted actual memory performance, none of the tested LLMs demonstrated comparable predictive accuracy.
arXiv Detail & Related papers (2024-10-17T09:42:30Z)
Large Language Models Overcome the Machine Penalty When Acting Fairly but Not When Acting Selfishly or Altruistically [14.576971868730709]
In social dilemmas where the collective and self-interests are at odds, people typically cooperate less with machines than with fellow humans. In this study, we explore the possibility of closing this research question by using Large Language Models (LLMs) Our findings reveal that, when interacting with humans, fair LLMs are able to induce cooperation levels comparable to those observed in human-human interactions.
arXiv Detail & Related papers (2024-09-29T10:11:25Z)
Human-AI collectives produce the most accurate differential diagnoses [0.0]
We show that hybrid collectives of physicians and large language models (LLMs) outperform both single physicians and physician collectives. Our approach highlights the potential for collective human and machine intelligence to improve accuracy in complex, open-ended domains like medical diagnostics.
arXiv Detail & Related papers (2024-06-21T08:46:30Z)
Large language models surpass human experts in predicting neuroscience results [60.26891446026707]
Large language models (LLMs) forecast novel results better than human experts. BrainBench is a benchmark for predicting neuroscience results. Our approach is not neuroscience-specific and is transferable to other knowledge-intensive endeavors.
arXiv Detail & Related papers (2024-03-04T15:27:59Z)
Large Language Model-based Human-Agent Collaboration for Complex Task Solving [94.3914058341565]
We introduce the problem of Large Language Models (LLMs)-based human-agent collaboration for complex task-solving. We propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC. This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process.
arXiv Detail & Related papers (2024-02-20T11:03:36Z)
Limits of Large Language Models in Debating Humans [0.0]
Large Language Models (LLMs) have shown remarkable promise in their ability to interact proficiently with humans. This paper endeavors to test the limits of current-day LLMs with a pre-registered study integrating real people with LLM agents acting as people.
arXiv Detail & Related papers (2024-02-06T03:24:27Z)
Human-Instruction-Free LLM Self-Alignment with Limited Samples [64.69906311787055]
We propose an algorithm that can self-align large language models (LLMs) iteratively without active human involvement. Unlike existing works, our algorithm relies on neither human-crafted instructions nor labeled rewards, significantly reducing human involvement. We show that our method can unlock the LLMs' self-generalization ability to perform alignment with near-zero human supervision.
arXiv Detail & Related papers (2024-01-06T14:00:12Z)
MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing. As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework. This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z)
Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration [83.4031923134958]
Corex is a suite of novel general-purpose strategies that transform Large Language Models into autonomous agents. Inspired by human behaviors, Corex is constituted by diverse collaboration paradigms including Debate, Review, and Retrieve modes. We demonstrate that orchestrating multiple LLMs to work in concert yields substantially better performance compared to existing methods.
arXiv Detail & Related papers (2023-09-30T07:11:39Z)
Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration [116.09561564489799]
Solo Performance Prompting transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist is an intelligent agent that collaboratively combines multiple minds' strengths and knowledge to enhance problem-solving in complex tasks. Our in-depth analysis shows that assigning multiple fine-grained personas in LLMs improves problem-solving abilities compared to using a single or fixed number of personas.
arXiv Detail & Related papers (2023-07-11T14:45:19Z)
Kill Chaos with Kindness: Agreeableness Improves Team Performance Under Uncertainty [0.0]
Agreeableness has demonstrated a non-significant and highly variable relationship with team performance. An agent-based model (ABM) is used to predict the effects of personality traits on teamwork. A genetic algorithm is then used to explore the limits of the ABM in order to discover which traits correlate with best and worst performing teams.
arXiv Detail & Related papers (2022-08-09T16:04:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.