Two-Turn Debate Doesn't Help Humans Answer Hard Reading Comprehension
Questions
- URL: http://arxiv.org/abs/2210.10860v1
- Date: Wed, 19 Oct 2022 19:48:50 GMT
- Title: Two-Turn Debate Doesn't Help Humans Answer Hard Reading Comprehension
Questions
- Authors: Alicia Parrish, Harsh Trivedi, Nikita Nangia, Vishakh Padmakumar,
Jason Phang, Amanpreet Singh Saimbhi, Samuel R. Bowman
- Abstract summary: We assess whether presenting humans with arguments for two competing answer options allows human judges to perform more accurately.
Previous research has shown that just a single turn of arguments in this format is not helpful to humans.
We find that, regardless of whether they have access to arguments or not, humans perform similarly on our task.
- Score: 26.404441861051875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The use of language-model-based question-answering systems to aid humans in
completing difficult tasks is limited, in part, by the unreliability of the
text these systems generate. Using hard multiple-choice reading comprehension
questions as a testbed, we assess whether presenting humans with arguments for
two competing answer options, where one is correct and the other is incorrect,
allows human judges to perform more accurately, even when one of the arguments
is unreliable and deceptive. If this is helpful, we may be able to increase our
justified trust in language-model-based systems by asking them to produce these
arguments where needed. Previous research has shown that just a single turn of
arguments in this format is not helpful to humans. However, as debate settings
are characterized by a back-and-forth dialogue, we follow up on previous
results to test whether adding a second round of counter-arguments is helpful
to humans. We find that, regardless of whether they have access to arguments or
not, humans perform similarly on our task. These findings suggest that, in the
case of answering reading comprehension questions, debate is not a helpful
format.
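As a rough sketch of the protocol described above, the snippet below shows how the judging conditions compared in the paper (no arguments, a single turn of arguments, and a second turn of counter-arguments) might be rendered for a judge and scored. The data structures and condition names are hypothetical stand-ins, not the paper's released materials.

```python
from dataclasses import dataclass

@dataclass
class Item:
    question: str
    options: tuple[str, str]    # one correct and one incorrect answer option
    correct_index: int          # index (0 or 1) of the correct option
    arguments: tuple[str, str]  # turn 1: one argument in favor of each option
    rebuttals: tuple[str, str]  # turn 2: each side's counter-argument

def render_for_judge(item: Item, condition: str) -> str:
    """Build the text a judge sees under one of the three conditions."""
    lines = [item.question,
             f"A) {item.options[0]}",
             f"B) {item.options[1]}"]
    if condition in ("single_turn", "two_turn"):
        lines += [f"Argument for A: {item.arguments[0]}",
                  f"Argument for B: {item.arguments[1]}"]
    if condition == "two_turn":
        lines += [f"Rebuttal from A's side: {item.rebuttals[0]}",
                  f"Rebuttal from B's side: {item.rebuttals[1]}"]
    return "\n".join(lines)

def accuracy(choices: list[int], items: list[Item]) -> float:
    """Fraction of items on which the judge picked the correct option."""
    hits = sum(choice == item.correct_index
               for choice, item in zip(choices, items))
    return hits / len(items)
```

The reported result is that judges perform similarly across these conditions: the second round of counter-arguments yields no measurable gain over the no-argument baseline.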
Related papers
- Overview of PerpectiveArg2024: The First Shared Task on Perspective Argument Retrieval [56.66761232081188]
We present a novel dataset covering demographic and socio-cultural (socio) variables, such as age, gender, and political attitude, representing minority and majority groups in society.
We find substantial challenges in incorporating perspectivism, especially when aiming for personalization based solely on the text of arguments without explicitly providing socio profiles.
While we bootstrap perspective argument retrieval, further research is essential to optimize retrieval systems to facilitate personalization and reduce polarization.
arXiv Detail & Related papers (2024-07-29T03:14:57Z)
- On scalable oversight with weak LLMs judging strong LLMs [67.8628575615614]
We study debate, where two AIs compete to convince a judge, and consultancy, where a single AI tries to convince a judge that asks questions.
We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models.
arXiv Detail & Related papers (2024-07-05T16:29:15Z)
- Debate Helps Supervise Unreliable Experts [33.03555781137954]
We show that debate between two unreliable experts can help a non-expert judge more reliably identify the truth.
Comparing debate to a baseline we call consultancy, where a single expert argues for only one answer which is correct half of the time, we find that debate performs significantly better.
These results show that debate is a promising approach for supervising increasingly capable but potentially unreliable AI systems.
arXiv Detail & Related papers (2023-11-15T05:05:40Z)
- Solving NLP Problems through Human-System Collaboration: A Discussion-based Approach [98.13835740351932]
This research aims to create a dataset and computational framework for systems that discuss and refine their predictions through dialogue.
We show that the proposed system can have beneficial discussions with humans, improving accuracy by up to 25 points on the natural language inference task.
arXiv Detail & Related papers (2023-05-19T16:24:50Z)
- Testing AI on language comprehension tasks reveals insensitivity to underlying meaning [3.335047764053173]
Large Language Models (LLMs) are recruited in applications that span from clinical assistance and legal support to question answering and education.
Yet reverse-engineering these abilities is bound by Moravec's Paradox, according to which skills that are easy for humans are hard for machines.
We systematically assess 7 state-of-the-art models on a novel benchmark.
arXiv Detail & Related papers (2023-02-23T20:18:52Z)
- Persua: A Visual Interactive System to Enhance the Persuasiveness of Arguments in Online Discussion [52.49981085431061]
Enhancing people's ability to write persuasive arguments could contribute to the effectiveness and civility in online communication.
We derived four design goals for a tool that helps users improve the persuasiveness of arguments in online discussions.
Persua is an interactive visual system that provides example-based guidance on persuasive strategies to enhance the persuasiveness of arguments.
arXiv Detail & Related papers (2022-04-16T08:07:53Z)
- Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions [29.932543276414602]
We build a dataset of single arguments for both a correct and incorrect answer option in a debate-style set-up.
We use long contexts -- humans familiar with the context write convincing explanations for pre-selected correct and incorrect answers.
We test if those explanations allow humans who have not read the full context to more accurately determine the correct answer.
arXiv Detail & Related papers (2022-04-11T15:56:34Z)
- Prompting Contrastive Explanations for Commonsense Reasoning Tasks [74.7346558082693]
Large pretrained language models (PLMs) can achieve near-human performance on commonsense reasoning tasks.
We show how to use these same models to generate human-interpretable evidence.
arXiv Detail & Related papers (2021-06-12T17:06:13Z)
- Extracting Implicitly Asserted Propositions in Argumentation [8.20413690846954]
We study methods for extracting propositions implicitly asserted in questions, reported speech, and imperatives in argumentation.
Our study may inform future research on argument mining and the semantics of these rhetorical devices in argumentation.
arXiv Detail & Related papers (2020-10-06T12:03:47Z)
- Aspect-Controlled Neural Argument Generation [65.91772010586605]
We train a language model for argument generation that can be controlled on a fine-grained level to generate sentence-level arguments for a given topic, stance, and aspect.
Our evaluation shows that our generation model is able to generate high-quality, aspect-specific arguments.
These arguments can be used to improve the performance of stance detection models via data augmentation and to generate counter-arguments.
arXiv Detail & Related papers (2020-04-30T20:17:22Z)
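As a concrete illustration of the fine-grained control described in the last entry, the sketch below prepends control codes for topic, stance, and aspect to a generation prompt. The bracketed control-code format is an assumption made for illustration, and a generic pretrained causal LM stands in for the paper's trained argument-generation model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative stand-in: the paper trains its own controlled generation
# model, whereas plain GPT-2 will not actually obey these control codes.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def generate_argument(topic: str, stance: str, aspect: str) -> str:
    """Condition generation on assumed [TOPIC]/[STANCE]/[ASPECT] control codes."""
    prompt = f"[TOPIC={topic}] [STANCE={stance}] [ASPECT={aspect}] Argument:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=60,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated continuation, not the prompt itself.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

print(generate_argument("nuclear energy", "pro", "safety"))
```

Arguments generated this way could then feed the downstream uses the entry mentions: augmenting stance-detection training data and producing counter-arguments.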