Training Language Models to Win Debates with Self-Play Improves Judge Accuracy
- URL: http://arxiv.org/abs/2409.16636v1
- Date: Wed, 25 Sep 2024 05:28:33 GMT
- Title: Training Language Models to Win Debates with Self-Play Improves Judge Accuracy
- Authors: Samuel Arnesen, David Rein, Julian Michael
- Abstract summary: We test the robustness of debate as a method of scalable oversight by training models to debate with data generated via self-play.
We find that language model based evaluators answer questions more accurately when judging models optimized to win debates.
- Score: 8.13173791334223
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We test the robustness of debate as a method of scalable oversight by training models to debate with data generated via self-play. In a long-context reading comprehension task, we find that language model based evaluators answer questions more accurately when judging models optimized to win debates. By contrast, we find no such relationship for consultancy models trained to persuade a judge without an opposing debater present. In quantitative and qualitative comparisons between our debate models and novel consultancy baselines, we find evidence that debate training encourages stronger and more informative arguments, showing promise that it can help provide high-quality supervision for tasks that are difficult to directly evaluate.
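To make the evaluation setup concrete, below is a minimal sketch of the judged debate protocol the abstract describes: two copies of a model argue for opposing answers to a question, and a judge that sees only the transcript picks a winner, which also supplies the win/loss signal for self-play training. All names here (`run_debate`, `Turn`, the `debater` and `judge` callables) are hypothetical illustrations, not the authors' actual code.

```python
# Minimal sketch of a judged two-player debate, as described in the abstract.
# All names are hypothetical placeholders, not the authors' implementation.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Turn:
    side: str      # "A" or "B"
    argument: str  # the debater's argument for its assigned answer

def run_debate(
    question: str,
    answers: Tuple[str, str],  # two candidate answers (one correct, one not)
    debater: Callable[[str, str, List[Turn]], str],  # argues for one answer
    judge: Callable[[str, Tuple[str, str], List[Turn]], str],  # weaker judge
    num_rounds: int = 3,
) -> Tuple[str, List[Turn]]:
    """Each debater defends one candidate answer over several rounds; a judge
    that cannot see the underlying passage picks "A" or "B" from the
    transcript alone."""
    transcript: List[Turn] = []
    for _ in range(num_rounds):
        for side, answer in zip(("A", "B"), answers):
            transcript.append(Turn(side, debater(question, answer, transcript)))
    verdict = judge(question, answers, transcript)
    return verdict, transcript

# In self-play training, the same model plays both sides and the judge's
# verdict provides the reward (winner reinforced, loser penalized), so the
# debater is optimized to win debates rather than to match gold labels.
```

The paper's central finding is that judges become more accurate as debaters are optimized under this signal, whereas consultancy (a single persuader with no opposing debater) shows no such relationship.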
Related papers
- ACC-Debate: An Actor-Critic Approach to Multi-Agent Debate [20.040543142468344]
We propose ACC-Debate, an Actor-Critic based learning framework to produce a two-agent team specialized in debate.
We demonstrate that ACC-Debate outperforms SotA debate techniques on a wide array of benchmarks.
arXiv Detail & Related papers (2024-10-30T19:09:02Z)
- On scalable oversight with weak LLMs judging strong LLMs [67.8628575615614]
We study debate, where two AIs compete to convince a judge, and consultancy, where a single AI tries to convince a judge that asks questions.
We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models.
arXiv Detail & Related papers (2024-07-05T16:29:15Z)
- Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM [51.43102092480804]
Debatrix is an automated debate judge based on Large Language Models (LLMs).
To align with real-world debate scenarios, we introduced the PanelBench benchmark, comparing our system's performance to actual debate outcomes.
The findings indicate a notable enhancement over directly using LLMs for debate evaluation.
arXiv Detail & Related papers (2024-03-12T18:19:47Z)
- Debating with More Persuasive LLMs Leads to More Truthful Answers [45.0343254517401]
We find that debate consistently helps both non-expert models and humans answer questions, achieving 76% and 88% accuracy respectively.
Our results provide encouraging empirical evidence for the viability of aligning models with debate in the absence of ground truth.
arXiv Detail & Related papers (2024-02-09T21:05:01Z)
- SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training with Adversarial Remarks [47.609417223514605]
This work introduces the SAIE framework, which facilitates supportive and adversarial discussions between learner and partner models.
Our empirical evaluation shows that models fine-tuned with the SAIE framework outperform those trained with conventional fine-tuning approaches.
arXiv Detail & Related papers (2023-11-14T12:12:25Z)
- Explaining Image Classification with Visual Debates [26.76139301708958]
We propose a novel debate framework for understanding and explaining a continuous image classifier's reasoning for making a particular prediction.
Our framework encourages players to put forward diverse arguments during the debates, picking up the reasoning trails missed by their opponents.
We demonstrate and evaluate a practical realization of our Visual Debates on the geometric SHAPE and MNIST datasets.
arXiv Detail & Related papers (2022-10-17T12:35:52Z)
- Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue [92.01165203498299]
Embodied dialogue instruction following requires an agent to complete a complex sequence of tasks from a natural language exchange.
This paper argues that imitation learning (IL) and related low-level metrics are actually misleading and do not align with the goals of embodied dialogue research.
arXiv Detail & Related papers (2022-10-10T05:51:40Z)
- High Quality Real-Time Structured Debate Generation [0.0]
We define debate trees and paths for generating debates while enforcing a high level structure and grammar.
We leverage a large corpus of tree-structured debates that have metadata associated with each argument.
Our results demonstrate the ability to generate debates in real-time on complex topics at a quality close to that of humans.
arXiv Detail & Related papers (2020-12-01T01:39:38Z)
- Knowledge-Grounded Dialogue Generation with Pre-trained Language Models [74.09352261943911]
We study knowledge-grounded dialogue generation with pre-trained language models.
We propose equipping a pre-trained language model's response generation with a knowledge selection module.
arXiv Detail & Related papers (2020-10-17T16:49:43Z)
- Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues [88.73739515457116]
We introduce four self-supervised tasks including next session prediction, utterance restoration, incoherence detection and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experiment results indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection.
arXiv Detail & Related papers (2020-09-14T08:44:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.