DESYR: Definition and Syntactic Representation Based Claim Detection on
the Web
- URL: http://arxiv.org/abs/2108.08759v1
- Date: Thu, 19 Aug 2021 16:00:13 GMT
- Title: DESYR: Definition and Syntactic Representation Based Claim Detection on
the Web
- Authors: Megha Sundriyal, Parantak Singh, Md Shad Akhtar, Shubhashis Sengupta,
Tanmoy Chakraborty
- Abstract summary: DESYR is a framework that addresses the challenges of claim
detection in informal web-based text.
It improves upon state-of-the-art systems across four benchmark claim datasets.
We release a 100-D pre-trained version of our Poincaré variant along with the
source code.
- Score: 16.00615726292801
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The formulation of a claim rests at the core of argument mining.
Demarcating a claim from a non-claim is arduous for both humans and machines,
owing to the latent linguistic variance between the two and the inadequacy of
extensive definition-based formalization. Furthermore, the growing use of
online social media has resulted in an explosion of unsolicited information on
the web, presented as informal text. To address these challenges, in this
paper we propose DESYR, a framework that resolves the said issues for informal
web-based text by combining hierarchical representation learning (a
dependency-inspired Poincaré embedding), definition-based alignment, and
feature projection. We do away with fine-tuning compute-heavy language models
in favor of a lighter, more domain-centric approach. Experimental results
indicate that DESYR improves upon the state-of-the-art systems across four
benchmark claim datasets, most of which consist of informal texts. We observe
an increase of 3 claim-F1 points on the LESA-Twitter dataset, 1 claim-F1 point
and 9 macro-F1 points on the Online Comments (OC) dataset, 24 claim-F1 points
and 17 macro-F1 points on the Web Discourse (WD) dataset, and 8 claim-F1
points and 5 macro-F1 points on the Micro Texts (MT) dataset. We also perform
an extensive analysis of the results. We release a 100-D pre-trained version
of our Poincaré variant along with the source code.
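For intuition on the hierarchical component, the following is a minimal sketch
of the standard Poincaré-ball distance that underlies Poincaré embeddings; the
2-D vectors and the head/dependent framing are illustrative assumptions, not
the paper's released 100-D model.

    import numpy as np

    def poincare_distance(u: np.ndarray, v: np.ndarray) -> float:
        """Geodesic distance between two points strictly inside the unit
        ball (the Poincare ball model of hyperbolic space)."""
        sq_diff = np.dot(u - v, u - v)
        denom = (1.0 - np.dot(u, u)) * (1.0 - np.dot(v, v))
        # Closed-form hyperbolic distance; it blows up near the boundary.
        return float(np.arccosh(1.0 + 2.0 * sq_diff / denom))

    # Hypothetical 2-D example: a "head" word near the origin and a
    # "dependent" word near the boundary are geodesically far apart.
    head = np.array([0.05, 0.00])
    dependent = np.array([0.90, 0.10])
    print(poincare_distance(head, dependent))

Because hyperbolic distance grows rapidly toward the boundary, tree-like
structures such as dependency hierarchies embed with much lower distortion
than in Euclidean space of the same dimension, which is the usual motivation
for dependency-inspired hyperbolic embeddings.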
Related papers
- Claim Extraction for Fact-Checking: Data, Models, and Automated Metrics [0.0]
We release the FEVERFact dataset, with 17K atomic factual claims extracted from 4K contextualised Wikipedia sentences.
For each metric, we implement a scale using a reduction to an already-explored NLP task.
We validate our metrics against human grading of generic claims and find that
the model ranking on $F_{fact}$, our hardest metric, does not change.
arXiv Detail & Related papers (2025-02-07T14:20:45Z) - Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation.
We introduce novel methodologies and datasets to overcome these challenges.
We propose MhBART, an encoder-decoder model designed to emulate human writing style.
We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z) - FarFetched: Entity-centric Reasoning and Claim Validation for the Greek Language based on Textually Represented Environments [0.3874856507026475]
We address the need for automated claim validation based on the aggregated evidence derived from multiple online news sources.
We introduce an entity-centric reasoning framework in which latent connections between events, actions, or statements are revealed.
Our approach tries to fill the gap in automated claim validation for less-resourced languages.
arXiv Detail & Related papers (2024-07-13T13:30:20Z) - SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim
Verification on Scientific Tables [68.76415918462418]
We present SCITAB, a challenging evaluation dataset consisting of 1.2K expert-verified scientific claims.
Through extensive evaluations, we demonstrate that SCITAB poses a significant challenge to state-of-the-art models.
Our analysis uncovers several unique challenges posed by SCITAB, including table grounding, claim ambiguity, and compositional reasoning.
arXiv Detail & Related papers (2023-05-22T16:13:50Z) - WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia.
In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim.
We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
arXiv Detail & Related papers (2023-03-02T17:45:32Z) - Enriching Relation Extraction with OpenIE [70.52564277675056]
Relation extraction (RE) is a sub-discipline of information extraction (IE).
In this work, we explore how recent approaches for open information extraction (OpenIE) may help to improve the task of RE.
Our experiments over two annotated corpora, KnowledgeNet and FewRel, demonstrate the improved accuracy of our enriched models.
arXiv Detail & Related papers (2022-12-19T11:26:23Z) - Retrieval-based Disentangled Representation Learning with Natural
Language Supervision [61.75109410513864]
We present Vocabulary Disentangled Retrieval (VDR), a retrieval-based framework that harnesses natural language as proxies of the underlying data variation to drive disentangled representation learning.
Our approach employs a bi-encoder model to represent both data and natural
language in a vocabulary space, enabling the model to distinguish intrinsic
dimensions that capture characteristics within the data through their
natural-language counterparts, thus achieving disentanglement.
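As a rough, hypothetical sketch of the bi-encoder idea (not VDR's actual
architecture; the dimensions and vocabulary size below are assumptions), both
views can be projected into a shared vocabulary-sized space with non-negative
sparse activations and compared there:

    import torch
    import torch.nn as nn

    VOCAB_SIZE = 30522  # illustrative; VDR's real vocabulary may differ

    class VocabSpaceEncoder(nn.Module):
        """Maps a dense input embedding to non-negative weights over a
        vocabulary, so each dimension is nameable by a token."""
        def __init__(self, input_dim: int = 768):
            super().__init__()
            self.proj = nn.Linear(input_dim, VOCAB_SIZE)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # ReLU keeps the representation sparse and non-negative.
            return torch.relu(self.proj(x))

    data_encoder, text_encoder = VocabSpaceEncoder(), VocabSpaceEncoder()
    data_vec = data_encoder(torch.randn(1, 768))  # stand-in data features
    text_vec = text_encoder(torch.randn(1, 768))  # stand-in text features
    score = torch.cosine_similarity(data_vec, text_vec)  # retrieval score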
arXiv Detail & Related papers (2022-12-15T10:20:42Z) - End-to-End Multimodal Fact-Checking and Explanation Generation: A
Challenging Dataset and Models [0.0]
We propose end-to-end multimodal fact-checking and explanation generation.
The goal is to assess the truthfulness of a claim by retrieving relevant evidence and predicting a truthfulness label.
To support this research, we construct Mocheg, a large-scale dataset consisting of 15,601 claims.
arXiv Detail & Related papers (2022-05-25T04:36:46Z) - ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive
Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z) - LESA: Linguistic Encapsulation and Semantic Amalgamation Based
Generalised Claim Detection from Online Content [15.814664354258184]
LESA tackles the former issue by assembling a source-independent generalized
model.
We resolve the latter issue by annotating a large unstructured Twitter dataset
that serves as a testing ground.
Experimental results show that LESA improves upon the state-of-the-art performance across six benchmark claim datasets.
arXiv Detail & Related papers (2021-01-28T09:51:30Z)