Large Language Models Are Better Logical Fallacy Reasoners with Counterargument, Explanation, and Goal-Aware Prompt Formulation
- URL: http://arxiv.org/abs/2503.23363v1
- Date: Sun, 30 Mar 2025 08:41:09 GMT
- Title: Large Language Models Are Better Logical Fallacy Reasoners with Counterargument, Explanation, and Goal-Aware Prompt Formulation
- Authors: Jiwon Jeong, Hyeju Jang, Hogun Park
- Abstract summary: This study presents a novel and effective prompt formulation approach for logical fallacy detection. Our method enriches input text by incorporating implicit contextual information, which we query for validity within the context of the argument. We evaluate our approach across multiple datasets from 5 domains, covering 29 distinct fallacy types.
- Score: 2.4073494101588273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advancement of Large Language Models (LLMs) has greatly improved our ability to process complex language. However, accurately detecting logical fallacies remains a significant challenge. This study presents a novel and effective prompt formulation approach for logical fallacy detection, applicable in both supervised (fine-tuned) and unsupervised (zero-shot) settings. Our method enriches input text by incorporating implicit contextual information -- counterarguments, explanations, and goals -- which we query for validity within the context of the argument. We then rank these queries based on confidence scores to inform classification. We evaluate our approach across multiple datasets from 5 domains, covering 29 distinct fallacy types, using models from the GPT and LLaMA series. The results show substantial improvements over state-of-the-art models, with F1 score increases of up to 0.60 in zero-shot settings and up to 0.45 in fine-tuned models. Extensive analyses further illustrate why and how our method excels.
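To make the pipeline concrete, below is a minimal Python sketch of the idea described in the abstract: enrich the argument with a counterargument, an explanation, and a goal, query each for validity, and rank the answers by confidence to inform the final classification. The `query_llm` wrapper, the prompt wording, and the stub model are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of counterargument/explanation/goal prompt formulation with
# confidence ranking. `query_llm(prompt) -> (label, confidence)` is a
# hypothetical LLM wrapper, not part of the paper's released code.

from typing import Callable, List, Tuple

CONTEXT_KINDS = ("counterargument", "explanation", "goal")

def build_queries(argument: str) -> List[Tuple[str, str]]:
    """Enrich the input argument with implicit contextual information and
    turn each piece into a validity query, as the abstract describes."""
    queries = []
    for kind in CONTEXT_KINDS:
        prompt = (
            f"Argument: {argument}\n"
            f"First, state the {kind} of this argument.\n"
            f"Then answer: is that {kind} valid in the context of the argument? "
            f"Reply 'valid' or 'invalid' with a confidence between 0 and 1."
        )
        queries.append((kind, prompt))
    return queries

def rank_validity_judgments(
    argument: str,
    query_llm: Callable[[str], Tuple[str, float]],  # hypothetical LLM wrapper
) -> List[Tuple[str, str, float]]:
    """Query each enriched prompt and rank answers by confidence score,
    so the most confident judgments inform fallacy classification."""
    results = []
    for kind, prompt in build_queries(argument):
        label, confidence = query_llm(prompt)
        results.append((kind, label, confidence))
    return sorted(results, key=lambda r: r[2], reverse=True)

if __name__ == "__main__":
    # Stub LLM for illustration only: always answers 'invalid' at 0.5 confidence.
    def fake_llm(prompt: str) -> Tuple[str, float]:
        return "invalid", 0.5

    print(rank_validity_judgments(
        "Everyone believes it, so it must be true.", fake_llm))
```

In this sketch the ranked (kind, label, confidence) tuples would then feed a downstream classifier or prompt; how that final step is implemented is left open here, since the abstract only specifies that the ranked queries inform classification.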
Related papers
- SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models [74.40683913645731]
Zero-shot multi-label recognition (MLR) with Vision-Language Models (VLMs) faces significant challenges without training data, model tuning, or architectural modifications.
Our work proposes a novel solution that treats VLMs as black boxes, leveraging prompt scores without training data or ground truth.
Analysis of these prompt scores reveals VLM biases and "AND"/"OR" signal ambiguities, notably that maximum scores are surprisingly suboptimal compared to second-highest scores.
arXiv Detail & Related papers (2025-02-24T07:15:05Z) - A Logical Fallacy-Informed Framework for Argument Generation [34.35377699079075]
We introduce FIPO, a fallacy-informed framework that steers Large Language Models toward logically sound arguments.
Our results on argumentation datasets show that our method reduces fallacy errors by up to 17.5%.
Our code is available at lucamouchel.com/lucamouchel/Logical-Fallacies.
arXiv Detail & Related papers (2024-08-07T08:19:44Z) - Flee the Flaw: Annotating the Underlying Logic of Fallacious Arguments Through Templates and Slot-filling [15.339084849719223]
We introduce four sets of explainable templates for common informal logical fallacies.
We conduct an annotation study on top of 400 fallacious arguments taken from the LOGIC dataset.
We discover that state-of-the-art language models struggle with detecting fallacy templates.
arXiv Detail & Related papers (2024-06-18T08:44:45Z) - Autoformalizing Natural Language to First-Order Logic: A Case Study in Logical Fallacy Detection [44.31755414036022]
We introduce Natural Language to First-Order Logic (NL2FOL), a framework to autoformalize natural language to FOL step by step using Large Language Models (LLMs).
Our approach addresses key challenges in this translation process, including the integration of implicit background knowledge.
Being neurosymbolic, our approach also provides interpretable insights into the reasoning process and demonstrates robustness without requiring model fine-tuning or labeled training data.
arXiv Detail & Related papers (2024-04-18T00:20:48Z) - LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs).
Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models.
We leverage these findings to construct targeted demonstration examples and fine-tuning data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z) - Large Language Models are Few-Shot Training Example Generators: A Case Study in Fallacy Recognition [49.38757847011105]
Computational fallacy recognition faces challenges due to the diverse genres, domains, and types of fallacies found in datasets.
We aim to enhance existing models for fallacy recognition by incorporating additional context and by leveraging large language models to generate synthetic data.
Our evaluation results demonstrate consistent improvements across fallacy types, datasets, and generators.
arXiv Detail & Related papers (2023-11-16T04:17:47Z) - Fine-tuning Language Models for Factuality [96.5203774943198]
Large pre-trained language models (LLMs) are now in widespread use, sometimes even as a replacement for traditional search engines.
Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'.
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z) - A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning.
Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
arXiv Detail & Related papers (2023-11-14T07:13:10Z) - DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models [18.96271708412086]
Chain-of-Thought (CoT) prompting has proven to be effective in enhancing the reasoning capabilities of Large Language Models (LLMs) with at least 100 billion parameters.
We introduce Dialogue-guided Chain-of-Thought (DialCoT), which employs a dialogue format to generate intermediate reasoning steps, guiding the model toward the final answer.
arXiv Detail & Related papers (2023-10-08T08:52:13Z) - POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models [62.23255433487586]
We propose an unsupervised fine-tuning framework to fine-tune the model or prompt on the unlabeled target data.
We demonstrate how to apply our method to both language-augmented vision and masked-language models by aligning the discrete distributions extracted from the prompts and target data.
arXiv Detail & Related papers (2023-04-29T22:05:22Z) - Case-Based Reasoning with Language Models for Classification of Logical Fallacies [3.511369967593153]
We propose a Case-Based Reasoning method that classifies new cases of logical fallacy.
Our experiments indicate that Case-Based Reasoning improves the accuracy and generalizability of language models.
arXiv Detail & Related papers (2023-01-27T17:49:16Z)