Related papers: Joint Effects of Argumentation Theory, Audio Modality and Data Enrichment on LLM-Based Fallacy Classification

URL: http://arxiv.org/abs/2509.11127v1
Date: Sun, 14 Sep 2025 06:35:34 GMT
Title: Joint Effects of Argumentation Theory, Audio Modality and Data Enrichment on LLM-Based Fallacy Classification
Authors: Hongxu Zhou, Hylke Westerdijk, Khondoker Ittehadul Islam,
Abstract summary: This study investigates how context and emotional tone metadata influence large language model (LLM) reasoning and performance in fallacy classification tasks. Using data from U.S. presidential debates, we classify six fallacy types through various prompting strategies applied to the Qwen-3 (8B) model.
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: This study investigates how context and emotional tone metadata influence large language model (LLM) reasoning and performance in fallacy classification tasks, particularly within political debate settings. Using data from U.S. presidential debates, we classify six fallacy types through various prompting strategies applied to the Qwen-3 (8B) model. We introduce two theoretically grounded Chain-of-Thought frameworks, Pragma-Dialectics and the Periodic Table of Arguments, and evaluate their effectiveness against a baseline prompt under three input settings: text-only, text with context, and text with both context and audio-based emotional tone metadata. Results suggest that while theoretical prompting can improve interpretability and, in some cases, accuracy, the addition of context and especially of emotional tone metadata often lowers performance. Emotional tone metadata biases the model toward labeling statements as "Appeal to Emotion", which degrades its logical reasoning. Overall, basic prompts often outperformed enhanced ones, suggesting that attention dilution from added inputs may worsen rather than improve fallacy classification in LLMs.

Related papers

TAIGR: Towards Modeling Influencer Content on Social Media via Structured, Pragmatic Inference
Claim-centric verification methods struggle to capture the pragmatic meaning of influencer discourse. We propose a structured framework designed to analyze influencer discourse, which operates in three stages. We show that accurate validation requires modeling the discourse's pragmatic and argumentative structure.
arXiv (2026-01-27T20:12:57Z)
A Generalizable Rhetorical Strategy Annotation Model Using LLM-based Debate Simulation and Labelling
We propose a novel framework that leverages large language models (LLMs) to automatically generate and label synthetic debate data based on a four-part rhetorical typology (causal, empirical, emotional, moral). Our model achieves high performance and strong generalization across topical domains. We illustrate two applications with the fine-tuned model: (1) the improvement in persuasiveness prediction from incorporating rhetorical strategy labels, and (2) analyzing temporal and partisan shifts in rhetorical strategies in U.S. Presidential debates (1960-2020).
arXiv (2025-10-16T18:51:23Z)
Can LLMs Judge Debates? Evaluating Non-Linear Reasoning via Argumentation Theory Semantics
We evaluate whether Large Language Models (LLMs) can approximate structured reasoning from Computational Argumentation Theory (CAT). We use Quantitative Argumentation Debate (QuAD) semantics, which assigns acceptability scores to arguments based on their attack and support relations.
arXiv (2025-09-19T08:10:32Z)
SpeechR: A Benchmark for Speech Reasoning in Large Audio-Language Models
SpeechR is a unified benchmark for evaluating reasoning over speech in large audio-language models. It evaluates models along three key dimensions: factual retrieval, procedural inference, and normative judgment. Evaluations on eleven state-of-the-art LALMs reveal that high transcription accuracy does not translate into strong reasoning capabilities.
arXiv (2025-08-04T03:28:04Z)
More or Less Wrong: A Benchmark for Directional Bias in LLM Comparative Reasoning
We study the mechanisms by which semantic cues shape reasoning in large language models. We introduce MathComp, a benchmark of 300 comparison scenarios. We find that model errors frequently reflect linguistic steering, systematic shifts toward the comparative term present in the prompt.
arXiv (2025-06-04T13:15:01Z)
Uncovering Autoregressive LLM Knowledge of Thematic Fit in Event Representation
We assess whether pre-trained autoregressive LLMs possess consistent, expressible knowledge about thematic fit. We evaluate both closed and open state-of-the-art LLMs on several psycholinguistic datasets. Our results show that chain-of-thought reasoning is more effective on datasets with self-explanatory semantic role labels.
arXiv (2024-10-19T18:25:30Z)
Causal Micro-Narratives
We present a novel approach to classify causal micro-narratives from text. These narratives are sentence-level explanations of the cause(s) and/or effect(s) of a target subject.
arXiv (2024-10-07T17:55:10Z)
Can Language Models Take A Hint? Prompting for Controllable Contextualized Commonsense Inference
We introduce "hinting," a data augmentation technique that enhances contextualized commonsense inference. "Hinting" employs a prefix prompting strategy using both hard and soft prompts to guide the inference process. Our results show that "hinting" does not compromise the performance of contextual commonsense inference while offering improved controllability.
arXiv (2024-10-03T04:32:46Z)
Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism
This paper provides a systematic overview of prior works on the logical reasoning ability of large language models for analyzing categorical syllogisms. We first investigate all the possible variations for the categorical syllogisms from a purely logical perspective. We then examine the underlying configurations (i.e., mood and figure) tested by the existing datasets.
arXiv (2024-06-26T21:17:20Z)
Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue
The Paralinguistics-enhanced Generative Pretrained Transformer (ParalinGPT) takes the conversational context of text, speech embeddings, and paralinguistic attributes as input prompts within a serialized multitasking framework. We use the Switchboard-1 corpus, including its sentiment labels as the paralinguistic attribute, as our spoken dialogue dataset.
arXiv (2023-12-23T18:14:56Z)
LLMs and Finetuning: Benchmarking cross-domain performance for hate speech detection
This study investigates the effectiveness and adaptability of pre-trained and fine-tuned Large Language Models (LLMs) in identifying hate speech. LLMs offer a huge advantage over the state of the art even without pretraining.
arXiv (2023-10-29T10:07:32Z)
How to Handle Different Types of Out-of-Distribution Scenarios in Computational Argumentation? A Comprehensive and Fine-Grained Field Study
This work systematically assesses LMs' capabilities in out-of-distribution (OOD) scenarios. We find that the efficacy of such learning paradigms varies with the type of OOD. Specifically, while in-context learning (ICL) excels for domain shifts, prompt-based fine-tuning performs better for topic shifts.
arXiv (2023-09-15T11:15:47Z)
DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning
Pragmatic reasoning plays a pivotal role in deciphering implicit meanings that frequently arise in real-life conversations. We introduce a novel challenge, DiPlomat, aiming at benchmarking machines' capabilities on pragmatic reasoning and situated conversational understanding.
arXiv (2023-06-15T10:41:23Z)
APOLLO: A Simple Approach for Adaptive Pretraining of Language Models for Logical Reasoning
We propose APOLLO, an adaptively pretrained language model that has improved logical reasoning abilities. APOLLO performs comparably on ReClor and outperforms baselines on LogiQA.
arXiv (2022-12-19T07:40:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.