Prove Your Point!: Bringing Proof-Enhancement Principles to Argumentative Essay Generation
- URL: http://arxiv.org/abs/2410.22642v1
- Date: Wed, 30 Oct 2024 02:13:39 GMT
- Title: Prove Your Point!: Bringing Proof-Enhancement Principles to Argumentative Essay Generation
- Authors: Ruiyu Xiao, Lei Wu, Yuhang Gou, Weinan Zhang, Ting Liu
- Abstract summary: Argumentative essay generation (AEG) aims to generate complete texts on specific controversial topics or debates.
We present a unified two-stage framework: Proof-Enhancement and Self-Annotation (PESA).
PESA generates argumentative essays with better logical validity and persuasiveness than strong baseline models.
- Score: 27.117415957353245
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Argumentative essay generation (AEG) aims to generate complete texts on specific controversial topics or debates. Although current AEG methods can generate individual opinions, they often overlook the high-level connections between these opinions. This often leaves the generated results mired in logical confusion, unable to prove their own arguments effectively. The generated essays may present evidence that contradicts their claims, or may fail to assemble the claims into a logical flow. In this paper, we present a unified two-stage framework, Proof-Enhancement and Self-Annotation (PESA), for AEG with a focus on logical enhancement. Specifically, we first construct pseudo-labels for logical information, claims and grounds, using a large language model. We then propose a tree planning approach that introduces proof principles and ensures logical consistency. Extensive experimental results show that, benefiting from proof principle guidance, PESA generates argumentative essays with better logical validity and persuasiveness than strong baseline models.
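To make the two-stage idea concrete, below is a minimal sketch of the tree-planning principle the abstract describes: claims and grounds (pseudo-labeled upstream by an LLM) are arranged so that no claim is asserted before its support. The `ArgumentNode` structure and the planning rule are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ArgumentNode:
    """One node of a proof-style argument tree: a claim plus the
    grounds (evidence) and sub-claims that support it."""
    claim: str
    grounds: list[str] = field(default_factory=list)
    children: list["ArgumentNode"] = field(default_factory=list)

def is_supported(node: ArgumentNode) -> bool:
    """Proof principle: a claim is covered if it has direct grounds,
    or if every sub-claim beneath it is itself covered."""
    if node.grounds:
        return True
    return bool(node.children) and all(is_supported(c) for c in node.children)

def plan_order(node: ArgumentNode) -> list[str]:
    """Emit grounds before the claims they support, bottom-up, so the
    surface text never asserts a claim that has not been backed yet."""
    out: list[str] = []
    for child in node.children:
        out.extend(plan_order(child))
    out.extend(node.grounds)
    out.append(node.claim)
    return out

# Toy usage: a thesis supported by one grounded sub-claim.
thesis = ArgumentNode(
    claim="Cities should expand protected bike lanes.",
    children=[ArgumentNode(
        claim="Protected lanes reduce traffic injuries.",
        grounds=["Cyclist injuries fall on streets with protected lanes."])],
)
assert is_supported(thesis)
print("\n".join(plan_order(thesis)))
```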
Related papers
- ART: Adaptive Reasoning Trees for Explainable Claim Verification [11.001890567834094]
ART (Adaptive Reasoning Trees) is a hierarchical method for claim verification.
An argument's strength is determined bottom-up via a pairwise tournament of its children.
Our findings show that ART's structured reasoning outperforms strong baselines.
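The pairwise-tournament idea lends itself to a short recursive sketch. The aggregation rule below is one plausible reading of the summary, with `judge` standing in for the paper's pairwise comparator (e.g. an LLM call); it is not ART's exact formula.

```python
def strength(node, judge) -> float:
    """Bottom-up scoring: children play a round-robin of pairwise
    matchups; each child's share of wins, weighted by its own
    recursive strength, contributes to the parent's score."""
    kids = getattr(node, "children", [])
    if not kids:
        return 1.0  # a leaf (raw evidence) counts as fully credible
    if len(kids) == 1:
        return strength(kids[0], judge)
    wins = {id(k): 0 for k in kids}
    for i in range(len(kids)):
        for j in range(i + 1, len(kids)):
            winner = judge(kids[i], kids[j])  # pairwise comparison
            wins[id(winner)] += 1
    matchups = len(kids) * (len(kids) - 1) // 2
    return sum(wins[id(k)] / matchups * strength(k, judge) for k in kids)
```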
arXiv Detail & Related papers (2026-01-09T01:01:55Z)
- ARCHE: A Novel Task to Evaluate LLMs on Latent Reasoning Chain Extraction [70.53044880892196]
We introduce a novel task named Latent Reasoning Chain Extraction (ARCHE), in which models must decompose complex reasoning arguments into combinations of standard reasoning paradigms in the form of a Reasoning Logic Tree (RLT).
To facilitate this task, we release ARCHE Bench, a new benchmark derived from 70 Nature Communications articles, including more than 1,900 references and 38,000 viewpoints.
Evaluations of 10 leading LLMs on ARCHE Bench reveal that models exhibit a trade-off between REA and EC, and none are yet able to extract a complete and standard reasoning chain.
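A Reasoning Logic Tree can be pictured as nested inference steps over source viewpoints. The encoding below is a hypothetical illustration; the node fields and the three-paradigm inventory are assumptions, not ARCHE Bench's actual schema.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Literal, Union

Paradigm = Literal["deduction", "induction", "abduction"]

@dataclass
class RLTNode:
    """One step of a Reasoning Logic Tree: a reasoning paradigm applied
    to premises, each of which is either a source viewpoint (a string)
    or the conclusion of an earlier step (a nested node)."""
    paradigm: Paradigm
    conclusion: str
    premises: list[Union[str, RLTNode]] = field(default_factory=list)

# Toy instance: a deduction chained on top of an induction.
tree = RLTNode(
    paradigm="deduction",
    conclusion="This material should conduct at room temperature.",
    premises=[
        RLTNode(paradigm="induction",
                conclusion="Materials of this class tend to conduct.",
                premises=["Observed conductivity in sample A.",
                          "Observed conductivity in sample B."]),
        "The material belongs to this class.",
    ],
)
```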
arXiv Detail & Related papers (2025-11-16T07:37:09Z)
- IntelliProof: An Argumentation Network-based Conversational Helper for Organized Reflection [2.7353636376883563]
We present IntelliProof, an interactive system for analyzing argumentative essays through LLMs.
Unlike existing automated essay scoring systems, IntelliProof emphasizes the user experience.
IntelliProof provides a set of tools for a better understanding of an argumentative essay and its corresponding graph in natural language.
arXiv Detail & Related papers (2025-11-06T16:43:37Z)
- Are Language Models Efficient Reasoners? A Perspective from Logic Programming [109.47572890883248]
Modern language models (LMs) exhibit strong deductive reasoning capabilities, yet standard evaluations emphasize correctness while overlooking a key aspect of human-like reasoning: efficiency.
We propose a framework for assessing LM reasoning efficiency through the lens of logic programming.
arXiv Detail & Related papers (2025-10-29T15:30:31Z)
- Reasoning is about giving reasons [55.56111618153049]
We show that we can identify and extract the logical structure of natural language arguments in three popular reasoning datasets with high accuracy.
Our approach supports all forms of reasoning that depend on the logical structure of the natural language argument.
arXiv Detail & Related papers (2025-08-20T07:26:53Z)
- CLATTER: Comprehensive Entailment Reasoning for Hallucination Detection [60.98964268961243]
We propose that guiding models to perform a systematic and comprehensive reasoning process allows them to make much finer-grained and more accurate entailment decisions.
We define a 3-step reasoning process, consisting of (i) claim decomposition, (ii) sub-claim attribution and entailment classification, and (iii) aggregated classification, showing that such guided reasoning indeed yields improved hallucination detection.
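The three steps map directly onto a small pipeline. In the sketch below, `decompose` and `classify_entailment` are assumed stand-ins for model calls, and the label set mirrors the summary's aggregation logic.

```python
def guided_entailment_check(claim: str, source: str,
                            decompose, classify_entailment) -> str:
    """CLATTER-style guided reasoning, per the summary:
    (i) split the claim into atomic sub-claims, (ii) attribute and
    classify each sub-claim against the source, (iii) aggregate the
    verdicts into a single decision."""
    sub_claims = decompose(claim)                    # step (i)
    verdicts = [classify_entailment(sc, source)      # step (ii)
                for sc in sub_claims]
    if all(v == "entailed" for v in verdicts):       # step (iii)
        return "entailed"
    if any(v == "contradicted" for v in verdicts):
        return "hallucination"
    return "not enough info"
```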
arXiv Detail & Related papers (2025-06-05T17:02:52Z)
- Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification [56.218970738892764]
Chain-of-Thought prompting has become the de facto method to elicit reasoning capabilities from large language models (LLMs).
Hallucinations in CoT are notoriously difficult to detect, and current mitigation methods operate as opaque boxes that do not provide checkable evidence for their judgments, possibly limiting their effectiveness.
We propose a retrospective, step-aware formal verification framework, Safe. Rather than assigning arbitrary scores, we strive to articulate mathematical claims in the formal mathematical language Lean 4 at each reasoning step and provide formal proofs to identify hallucinations.
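The core move, per the summary, is to restate each step's mathematical claim in Lean 4 so the proof checker, not a learned scorer, decides its validity. A toy illustration of the principle (not taken from the paper): a correct arithmetic step compiles, while a hallucinated one is rejected.

```lean
-- Claim extracted from a hypothetical reasoning step: "3 * 4 + 2 = 14".
-- The Lean kernel verifies this by computation; a hallucinated variant
-- such as "3 * 4 + 2 = 15" would fail to type-check.
example : 3 * 4 + 2 = 14 := rfl
```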
arXiv Detail & Related papers (2025-06-05T03:16:08Z)
- CHECKWHY: Causal Fact Verification via Argument Structure [19.347690600431463]
CheckWhy is a dataset tailored to a novel causal fact verification task.
CheckWhy consists of over 19K "why" claim-evidence-argument structure triplets with supports, refutes, and not enough info labels.
arXiv Detail & Related papers (2024-08-20T15:03:35Z)
- Lean-STaR: Learning to Interleave Thinking and Proving [53.923617816215774]
We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof.
Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment.
arXiv Detail & Related papers (2024-07-14T01:43:07Z)
- LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But, can they really "reason" over the natural language?
This question has been receiving significant research attention, and many reasoning skills, such as commonsense, numerical, and qualitative reasoning, have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z)
- Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs [87.34281749422756]
Large language models (LLMs) have achieved impressive human-like performance across various reasoning tasks.
However, their mastery of underlying inferential rules still falls short of human capabilities.
We propose a logic scaffolding inferential rule generation framework to construct an inferential rule base, ULogic.
arXiv Detail & Related papers (2024-02-18T03:38:51Z)
- CASA: Causality-driven Argument Sufficiency Assessment [79.13496878681309]
We propose CASA, a zero-shot causality-driven argument sufficiency assessment framework.
CASA builds on the probability of sufficiency (PS), which measures how likely introducing the premise event would lead to the conclusion when both the premise and conclusion events are absent (see the formula below).
Experiments on two logical fallacy detection datasets demonstrate that CASA accurately identifies insufficient arguments.
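In Pearl-style causal notation, with premise event X and conclusion event Y, the probability of sufficiency referenced above is standardly written as follows; CASA's exact estimator may differ.

```latex
% PS: the probability that forcing the premise (X := 1) would bring
% about the conclusion, given that neither currently holds.
\mathrm{PS} = P\left( Y_{X=1} = 1 \mid X = 0,\ Y = 0 \right)
```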
arXiv Detail & Related papers (2024-01-10T16:21:18Z)
- A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning.
Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
arXiv Detail & Related papers (2023-11-14T07:13:10Z)
- Predicting the Quality of Revisions in Argumentative Writing [2.0572032297930503]
Chain-of-Thought prompts facilitate ChatGPT-generated ACs for AR quality predictions.
Experiments on two corpora, our annotated elementary essays and an existing college essay benchmark, demonstrate the superiority of the proposed ACs over baselines.
arXiv Detail & Related papers (2023-06-01T13:39:33Z)
- AQE: Argument Quadruplet Extraction via a Quad-Tagging Augmented Generative Approach [40.510976649949576]
We propose a challenging argument quadruplet extraction task (AQE).
AQE can provide an all-in-one extraction of four argumentative components, i.e., claims, evidence, evidence types, and stances.
We propose a novel quad-tagging augmented generative approach, which leverages a quadruplet tagging module to augment the training of the generative framework.
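The all-in-one output the summary describes is a four-field record per argument. A minimal illustrative schema follows; the field names and type inventory are assumptions, not the paper's exact label set.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class ArgumentQuadruplet:
    """One AQE prediction: a claim, a supporting evidence span, the
    type of that evidence, and the claim's stance toward the topic."""
    claim: str
    evidence: str
    evidence_type: str                      # e.g. "research", "expert"
    stance: Literal["support", "against"]

# Toy instance for illustration only.
quad = ArgumentQuadruplet(
    claim="Remote work improves productivity.",
    evidence="Surveyed employees self-reported higher output at home.",
    evidence_type="research",
    stance="support",
)
```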
arXiv Detail & Related papers (2023-05-31T14:35:53Z)
- Conclusion-based Counter-Argument Generation [26.540485804067536]
In real-world debates, the most common way to counter an argument is to reason against its main point, that is, its conclusion.
We propose a multitask approach that jointly learns to generate both the conclusion and the counter of an input argument.
arXiv Detail & Related papers (2023-01-24T10:49:01Z)
- Generating Natural Language Proofs with Verifier-Guided Search [74.9614610172561]
We present NLProofS (Natural Language Proof Search), a novel stepwise method.
NLProofS learns to generate relevant steps conditioning on the hypothesis.
It achieves state-of-the-art performance on EntailmentBank and RuleTaker.
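Verifier-guided stepwise search can be sketched as a loop in which a generator proposes candidate steps and an independent verifier scores them. The `propose_steps`, `verifier_score`, and `entails` stubs are assumptions for illustration; NLProofS itself performs a more elaborate search than this greedy variant.

```python
def prove(hypothesis: str, premises: list[str],
          propose_steps, verifier_score, entails,
          max_depth: int = 8) -> "list[str] | None":
    """Greedy verifier-guided proof search: at each depth the generator
    proposes next steps conditioned on the hypothesis, the verifier
    scores their validity, and the best step is committed. Returns the
    accumulated steps once the hypothesis is reached, else None."""
    proof: list[str] = []
    known = list(premises)
    for _ in range(max_depth):
        candidates = propose_steps(known, hypothesis)  # stepwise generation
        if not candidates:
            return None
        best = max(candidates, key=lambda s: verifier_score(known, s))
        proof.append(best)
        known.append(best)
        if entails(best, hypothesis):                  # goal reached
            return proof
    return None
```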
arXiv Detail & Related papers (2022-05-25T02:22:30Z)
- ProoFVer: Natural Logic Theorem Proving for Fact Verification [24.61301908217728]
We propose ProoFVer, a proof system for fact verification using natural logic.
The generation of proofs makes ProoFVer an explainable system.
We find that humans correctly simulate ProoFVer's decisions more often when given its proofs.
arXiv Detail & Related papers (2021-08-25T17:23:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.