Debating with More Persuasive LLMs Leads to More Truthful Answers
- URL: http://arxiv.org/abs/2402.06782v4
- Date: Thu, 25 Jul 2024 23:32:21 GMT
- Title: Debating with More Persuasive LLMs Leads to More Truthful Answers
- Authors: Akbir Khan, John Hughes, Dan Valentine, Laura Ruis, Kshitij Sachan, Ansh Radhakrishnan, Edward Grefenstette, Samuel R. Bowman, Tim Rocktäschel, Ethan Perez
- Abstract summary: We find that debate consistently helps both non-expert models and humans answer questions, achieving 76% and 88% accuracy respectively.
Our results provide encouraging empirical evidence for the viability of aligning models with debate in the absence of ground truth.
- Score: 45.0343254517401
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Common methods for aligning large language models (LLMs) with desired behaviour heavily rely on human-labelled data. However, as models grow increasingly sophisticated, they will surpass human expertise, and the role of human evaluation will evolve into non-experts overseeing experts. In anticipation of this, we ask: can weaker models assess the correctness of stronger models? We investigate this question in an analogous setting, where stronger models (experts) possess the necessary information to answer questions and weaker models (non-experts) lack this information. The method we evaluate is debate, where two LLM experts each argue for a different answer, and a non-expert selects the answer. We find that debate consistently helps both non-expert models and humans answer questions, achieving 76% and 88% accuracy respectively (naive baselines obtain 48% and 60%). Furthermore, optimising expert debaters for persuasiveness in an unsupervised manner improves non-expert ability to identify the truth in debates. Our results provide encouraging empirical evidence for the viability of aligning models with debate in the absence of ground truth.
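The protocol described in the abstract (two informed expert debaters argue for opposing answers while a weaker, uninformed judge picks one) can be sketched as a simple loop. The following is a minimal sketch, not the authors' implementation: the `debater_a`, `debater_b`, and `judge` callables, the prompt wording, and the round structure are illustrative assumptions standing in for real LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A debater or judge is modelled here as any function mapping a prompt string
# to a text response, e.g. a thin wrapper around an LLM API (assumption).
LLM = Callable[[str], str]

@dataclass
class DebateTranscript:
    question: str
    answers: Tuple[str, str]                     # the two candidate answers being argued
    rounds: List[Tuple[str, str]] = field(default_factory=list)

    def render(self) -> str:
        lines = [f"Question: {self.question}",
                 f"Debater A argues for: {self.answers[0]}",
                 f"Debater B argues for: {self.answers[1]}"]
        for i, (arg_a, arg_b) in enumerate(self.rounds, start=1):
            lines += [f"Round {i} - A: {arg_a}", f"Round {i} - B: {arg_b}"]
        return "\n".join(lines)

def run_debate(question: str,
               answers: Tuple[str, str],
               hidden_passage: str,
               debater_a: LLM,
               debater_b: LLM,
               judge: LLM,
               n_rounds: int = 3) -> str:
    """Information-asymmetric debate: the expert debaters see the hidden
    passage, while the weaker judge only sees the question and transcript."""
    transcript = DebateTranscript(question, answers)
    for _ in range(n_rounds):
        context = transcript.render()
        # Each expert sees the passage and argues for its assigned answer.
        arg_a = debater_a(f"{hidden_passage}\n{context}\nArgue that the answer is: {answers[0]}")
        arg_b = debater_b(f"{hidden_passage}\n{context}\nArgue that the answer is: {answers[1]}")
        transcript.rounds.append((arg_a, arg_b))
    # The non-expert judge never sees the passage, only the arguments.
    return judge(f"{transcript.render()}\nWhich answer is correct, "
                 f"'{answers[0]}' or '{answers[1]}'? Reply with one of them.")
```

In the paper's setting the hidden information is a passage the judge cannot read, so the judge's accuracy depends entirely on how persuasive and truth-tracking the debaters' arguments are.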
Related papers
- Teaching Models to Balance Resisting and Accepting Persuasion [69.68379406317682]
Large language models (LLMs) are susceptible to persuasion, which can pose risks when models are faced with an adversarial interlocutor.
We show that optimizing models for only one side results in poor performance on the other.
In order to balance positive and negative persuasion, we introduce Persuasion-Balanced Training (or PBT).
arXiv Detail & Related papers (2024-10-18T16:49:36Z)
- Training Language Models to Win Debates with Self-Play Improves Judge Accuracy [8.13173791334223]
We test the robustness of debate as a method of scalable oversight by training models to debate with data generated via self-play.
We find that language model based evaluators answer questions more accurately when judging models optimized to win debates.
arXiv Detail & Related papers (2024-09-25T05:28:33Z)
- On scalable oversight with weak LLMs judging strong LLMs [67.8628575615614]
We study debate, where two AIs compete to convince a judge, and consultancy, where a single AI tries to convince a judge that asks questions.
We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models.
arXiv Detail & Related papers (2024-07-05T16:29:15Z)
- Debate Helps Supervise Unreliable Experts [33.03555781137954]
We show that debate between two unreliable experts can help a non-expert judge more reliably identify the truth.
Comparing debate to a baseline we call consultancy, where a single expert argues for only one answer which is correct half of the time, we find that debate performs significantly better.
These results show that debate is a promising approach for supervising increasingly capable but potentially unreliable AI systems.
arXiv Detail & Related papers (2023-11-15T05:05:40Z)
- The ART of LLM Refinement: Ask, Refine, and Trust [85.75059530612882]
We propose a reasoning-with-refinement objective called ART: Ask, Refine, and Trust.
It asks necessary questions to decide when an LLM should refine its output.
It achieves a performance gain of +5 points over self-refinement baselines.
arXiv Detail & Related papers (2023-11-14T07:26:32Z)
- Ask an Expert: Leveraging Language Models to Improve Strategic Reasoning in Goal-Oriented Dialogue Models [15.476899850339395]
We propose the "Ask an Expert" framework in which the model is trained with access to an "expert" which it can consult at each turn.
Advice is solicited via a structured dialogue with the expert, and the model is optimized to selectively utilize (or ignore) it given the context and dialogue history.
We evaluate this framework in a mental health support domain, where the structure of the expert conversation is outlined by pre-specified prompts which reflect a reasoning strategy taught to practitioners in the field.
arXiv Detail & Related papers (2023-05-29T04:19:35Z)
- Getting MoRE out of Mixture of Language Model Reasoning Experts [71.61176122960464]
We propose a Mixture-of-Reasoning-Experts (MoRE) framework that ensembles diverse specialized language models.
We specialize the backbone language model with prompts optimized for different reasoning categories, including factual, multihop, mathematical, and commonsense reasoning.
Our human study confirms that presenting expert predictions and the answer selection process helps annotators more accurately calibrate when to trust the system's output.
arXiv Detail & Related papers (2023-05-24T02:00:51Z)
- Are Metrics Enough? Guidelines for Communicating and Visualizing Predictive Models to Subject Matter Experts [7.768301998812552]
We describe an iterative study conducted with both subject matter experts and data scientists to understand the gaps in communication.
We derive a set of communication guidelines that use visualization as a common medium for communicating the strengths and weaknesses of a model.
arXiv Detail & Related papers (2022-05-11T19:40:24Z)
- Generative Context Pair Selection for Multi-hop Question Answering [60.74354009152721]
We propose a generative context selection model for multi-hop question answering.
Our proposed generative passage selection model performs better (4.9% higher than the baseline) on an adversarial held-out set.
arXiv Detail & Related papers (2021-04-18T07:00:48Z)