Debate Helps Supervise Unreliable Experts
- URL: http://arxiv.org/abs/2311.08702v1
- Date: Wed, 15 Nov 2023 05:05:40 GMT
- Title: Debate Helps Supervise Unreliable Experts
- Authors: Julian Michael, Salsabila Mahdi, David Rein, Jackson Petty, Julien
Dirani, Vishakh Padmakumar, Samuel R. Bowman
- Abstract summary: We show that debate between two unreliable experts can help a non-expert judge more reliably identify the truth.
Comparing debate to a baseline we call consultancy, where a single expert argues for only one answer which is correct half of the time, we find that debate performs significantly better.
These results show that debate is a promising approach for supervising increasingly capable but potentially unreliable AI systems.
- Score: 33.03555781137954
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As AI systems are used to answer more difficult questions and potentially
help create new knowledge, judging the truthfulness of their outputs becomes
more difficult and more important. How can we supervise unreliable experts,
which have access to the truth but may not accurately report it, to give
answers that are systematically true and don't just superficially seem true,
when the supervisor can't tell the difference between the two on their own? In
this work, we show that debate between two unreliable experts can help a
non-expert judge more reliably identify the truth. We collect a dataset of
human-written debates on hard reading comprehension questions where the judge
has not read the source passage, only ever seeing expert arguments and short
quotes selectively revealed by 'expert' debaters who have access to the
passage. In our debates, one expert argues for the correct answer, and the
other for an incorrect answer. Comparing debate to a baseline we call
consultancy, where a single expert argues for only one answer which is correct
half of the time, we find that debate performs significantly better, with 84%
judge accuracy compared to consultancy's 74%. Debates are also more efficient,
being 68% of the length of consultancies. By comparing human to AI debaters, we
find evidence that with more skilled (in this case, human) debaters, the
performance of debate goes up but the performance of consultancy goes down. Our
error analysis also supports this trend, with 46% of errors in human debate
attributable to mistakes by the honest debater (which should go away with
increased skill); whereas 52% of errors in human consultancy are due to
debaters obfuscating the relevant evidence from the judge (which should become
worse with increased skill). Overall, these results show that debate is a
promising approach for supervising increasingly capable but potentially
unreliable AI systems.
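
To make the two protocols concrete, the following is a minimal sketch (in Python) of the setup the abstract describes: the judge never sees the source passage, only free-text arguments plus short quotes selectively revealed by the experts; in debate one expert is assigned to each answer, while in consultancy a single expert argues for an answer that is correct half the time. The type names, function signatures, and round structure below are illustrative assumptions, not the authors' implementation.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List

# Illustrative sketch only: the names and structure below are assumptions,
# not the paper's actual code or data format.

@dataclass
class Turn:
    speaker: str        # e.g. "debater_A", "debater_B", or "consultant"
    argument: str       # free-text argument for the speaker's assigned answer
    quotes: List[str]   # short quotes selectively revealed from the hidden passage

@dataclass
class Transcript:
    question: str
    options: List[str]                              # two answer options shown to the judge
    turns: List[Turn] = field(default_factory=list)
    # The source passage itself is never part of what the judge sees.

Debater = Callable[[Transcript, str], Turn]   # (transcript so far, assigned answer) -> Turn
Judge = Callable[[Transcript], int]           # transcript -> index of the chosen option

def run_debate(question: str, options: List[str], debater_a: Debater,
               debater_b: Debater, judge: Judge, n_rounds: int = 3) -> int:
    """Debate: one expert argues for each answer; the judge picks a winner."""
    t = Transcript(question, options)
    for _ in range(n_rounds):
        t.turns.append(debater_a(t, options[0]))  # argues for the first option
        t.turns.append(debater_b(t, options[1]))  # argues for the second option
    return judge(t)

def run_consultancy(question: str, options: List[str], consultant: Debater,
                    judge: Judge, n_rounds: int = 3) -> int:
    """Consultancy: a single expert argues for one answer, correct half the time."""
    assigned = random.choice(options)  # coin flip decides whether the consultant is honest
    t = Transcript(question, options)
    for _ in range(n_rounds):
        t.turns.append(consultant(t, assigned))
        # In the human study the judge can also ask questions between turns;
        # that interaction is omitted here for brevity.
    return judge(t)
```

The structural difference behind the reported accuracy gap is visible in the sketch: in debate the judge always hears an advocate for the correct answer and can weigh conflicting quotes against each other, whereas in consultancy the judge must detect, from a single advocate, whether the relevant evidence is being obfuscated.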
Related papers
- Training Language Models to Win Debates with Self-Play Improves Judge Accuracy [8.13173791334223]
We test the robustness of debate as a method of scalable oversight by training models to debate with data generated via self-play.
We find that language model based evaluators answer questions more accurately when judging models optimized to win debates.
arXiv Detail & Related papers (2024-09-25T05:28:33Z)
- On scalable oversight with weak LLMs judging strong LLMs [67.8628575615614]
We study debate, where two AIs compete to convince a judge, and consultancy, where a single AI tries to convince a judge that asks questions.
We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models.
arXiv Detail & Related papers (2024-07-05T16:29:15Z)
- Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM [51.43102092480804]
Debatrix is an automated debate judge based on Large Language Models (LLMs).
To align with real-world debate scenarios, we introduce the PanelBench benchmark, comparing our system's performance to actual debate outcomes.
The findings indicate a notable enhancement over directly using LLMs for debate evaluation.
arXiv Detail & Related papers (2024-03-12T18:19:47Z)
- Debating with More Persuasive LLMs Leads to More Truthful Answers [45.0343254517401]
We find that debate consistently helps both non-expert models and humans answer questions, achieving 76% and 88% accuracy respectively.
Our results provide encouraging empirical evidence for the viability of aligning models with debate in the absence of ground truth.
arXiv Detail & Related papers (2024-02-09T21:05:01Z)
- Solving NLP Problems through Human-System Collaboration: A Discussion-based Approach [98.13835740351932]
This research aims to create a dataset and computational framework for systems that discuss and refine their predictions through dialogue.
We show that the proposed system can have beneficial discussions with humans, improving accuracy by up to 25 points on the natural language inference task.
arXiv Detail & Related papers (2023-05-19T16:24:50Z)
- Two-Turn Debate Doesn't Help Humans Answer Hard Reading Comprehension Questions [26.404441861051875]
We assess whether presenting humans with arguments for two competing answer options allows human judges to perform more accurately.
Previous research has shown that just a single turn of arguments in this format is not helpful to humans.
We find that, regardless of whether they have access to arguments or not, humans perform similarly on our task.
arXiv Detail & Related papers (2022-10-19T19:48:50Z)
- Persua: A Visual Interactive System to Enhance the Persuasiveness of Arguments in Online Discussion [52.49981085431061]
Enhancing people's ability to write persuasive arguments could contribute to the effectiveness and civility of online communication.
We derived four design goals for a tool that helps users improve the persuasiveness of arguments in online discussions.
Persua is an interactive visual system that provides example-based guidance on persuasive strategies to enhance the persuasiveness of arguments.
arXiv Detail & Related papers (2022-04-16T08:07:53Z)
- DebateSum: A large-scale argument mining and summarization dataset [0.0]
DebateSum consists of 187,386 unique pieces of evidence with corresponding argument and extractive summaries.
We train several transformer summarization models to benchmark summarization performance on DebateSum.
We present a search engine for this dataset, which is used extensively by members of the National Speech and Debate Association.
arXiv Detail & Related papers (2020-11-14T10:06:57Z)