DS@GT at Touché: Large Language Models for Retrieval-Augmented Debate
- URL: http://arxiv.org/abs/2507.09090v1
- Date: Sat, 12 Jul 2025 00:20:00 GMT
- Title: DS@GT at Touché: Large Language Models for Retrieval-Augmented Debate
- Authors: Anthony Miyaguchi, Conor Johnston, Aaryan Potdar
- Abstract summary: We deploy six leading publicly available models for the Retrieval-Augmented Debate and Evaluation task. The evaluation is performed by measuring four key metrics: Quality, Quantity, Manner, and Relation. Although LLMs perform well in debates when given related arguments, they tend to be verbose in responses yet consistent in evaluation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) demonstrate strong conversational abilities. In this Working Paper, we study them in the context of debating in two ways: their ability to perform in a structured debate along with a dataset of arguments to use and their ability to evaluate utterances throughout the debate. We deploy six leading publicly available models from three providers for the Retrieval-Augmented Debate and Evaluation. The evaluation is performed by measuring four key metrics: Quality, Quantity, Manner, and Relation. Throughout this task, we found that although LLMs perform well in debates when given related arguments, they tend to be verbose in responses yet consistent in evaluation. The accompanying source code for this paper is located at https://github.com/dsgt-arc/touche-2025-rad.
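The abstract names four evaluation dimensions (Quality, Quantity, Manner, and Relation) scored per utterance. A minimal sketch of how such per-utterance scores might be stored and aggregated over a debate is shown below; the metric names come from the abstract, but the 1–5 scale, class names, and aggregation scheme are illustrative assumptions, not the authors' actual implementation (see the linked repository for that).

```python
from dataclasses import dataclass

# Illustrative only: metric names are from the paper's abstract; the
# 1-5 scale and unweighted averaging are assumptions for this sketch.

@dataclass
class UtteranceScores:
    quality: int   # soundness of the argument (1-5)
    quantity: int  # appropriate amount of information (1-5)
    manner: int    # clarity and tone of the utterance (1-5)
    relation: int  # relevance to the ongoing debate (1-5)

    def mean(self) -> float:
        """Unweighted average across the four dimensions."""
        return (self.quality + self.quantity + self.manner + self.relation) / 4


def aggregate(turns: list[UtteranceScores]) -> float:
    """Average the per-utterance means over a whole debate."""
    return sum(t.mean() for t in turns) / len(turns)


# Example: two scored utterances from one debate.
scores = [UtteranceScores(4, 3, 5, 4), UtteranceScores(5, 4, 4, 3)]
print(aggregate(scores))  # 4.0
```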
Related papers
- DebateBench: A Challenging Long Context Reasoning Benchmark For Large Language Models [1.8197265299982013]
We introduce DebateBench, a novel dataset consisting of an extensive collection of transcripts and metadata from some of the world's most prestigious competitive debates. The dataset consists of British Parliamentary debates from prestigious debating tournaments on diverse topics, annotated with detailed speech-level scores and house rankings sourced from official adjudication data. We curate 256 speeches across 32 debates, with each debate being over 1 hour long and each input averaging 32,000 tokens.
arXiv Detail & Related papers (2025-02-10T09:23:03Z)
- OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset [10.385189302526246]
OpenDebateEvidence is a comprehensive dataset for argument mining and summarization sourced from the American Competitive Debate community.
This dataset includes over 3.5 million documents with rich metadata, making it one of the most extensive collections of debate evidence.
arXiv Detail & Related papers (2024-06-20T18:22:59Z)
- Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM [51.43102092480804]
Debatrix is an automated debate judge based on Large Language Models (LLMs).
To align with real-world debate scenarios, we introduced the PanelBench benchmark, comparing our system's performance to actual debate outcomes.
The findings indicate a notable enhancement over directly using LLMs for debate evaluation.
arXiv Detail & Related papers (2024-03-12T18:19:47Z)
- Argue with Me Tersely: Towards Sentence-Level Counter-Argument Generation [62.069374456021016]
We present the ArgTersely benchmark for sentence-level counter-argument generation.
We also propose Arg-LlaMA for generating high-quality counter-arguments.
arXiv Detail & Related papers (2023-12-21T06:51:34Z)
- Exploring the Potential of Large Language Models in Computational Argumentation [54.85665903448207]
Large language models (LLMs) have demonstrated impressive capabilities in understanding context and generating natural language.
This work aims to embark on an assessment of LLMs, such as ChatGPT, Flan models, and LLaMA2 models, in both zero-shot and few-shot settings.
arXiv Detail & Related papers (2023-11-15T15:12:15Z)
- DebateKG: Automatic Policy Debate Case Creation with Semantic Knowledge Graphs [0.0]
We show that effective debate cases can be constructed using constrained shortest path traversals on Argumentative Semantic Knowledge Graphs.
We significantly improve upon DebateSum by introducing 53,180 new examples.
We create a unique method for evaluating which knowledge graphs are better in the context of producing policy debate cases.
arXiv Detail & Related papers (2023-07-09T04:19:19Z)
- Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue [92.01165203498299]
Embodied dialogue instruction following requires an agent to complete a complex sequence of tasks from a natural language exchange.
This paper argues that imitation learning (IL) and related low-level metrics are actually misleading and do not align with the goals of embodied dialogue research.
arXiv Detail & Related papers (2022-10-10T05:51:40Z)
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)
- High Quality Real-Time Structured Debate Generation [0.0]
We define debate trees and paths for generating debates while enforcing a high level structure and grammar.
We leverage a large corpus of tree-structured debates that have metadata associated with each argument.
Our results demonstrate the ability to generate debates in real-time on complex topics at a quality close to that of humans.
arXiv Detail & Related papers (2020-12-01T01:39:38Z)
- DebateSum: A large-scale argument mining and summarization dataset [0.0]
DebateSum consists of 187,386 unique pieces of evidence with corresponding argument and extractive summaries.
We train several transformer summarization models to benchmark summarization performance on DebateSum.
We present a search engine for this dataset which is utilized extensively by members of the National Speech and Debate Association.
arXiv Detail & Related papers (2020-11-14T10:06:57Z)
- Aspect-Controlled Neural Argument Generation [65.91772010586605]
We train a language model for argument generation that can be controlled on a fine-grained level to generate sentence-level arguments for a given topic, stance, and aspect.
Our evaluation shows that our generation model is able to generate high-quality, aspect-specific arguments.
These arguments can be used to improve the performance of stance detection models via data augmentation and to generate counter-arguments.
arXiv Detail & Related papers (2020-04-30T20:17:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.