LESA: Linguistic Encapsulation and Semantic Amalgamation Based
Generalised Claim Detection from Online Content
- URL: http://arxiv.org/abs/2101.11891v1
- Date: Thu, 28 Jan 2021 09:51:30 GMT
- Title: LESA: Linguistic Encapsulation and Semantic Amalgamation Based
Generalised Claim Detection from Online Content
- Authors: Shreya Gupta, Parantak Singh, Megha Sundriyal, Md Shad Akhtar, Tanmoy
Chakraborty
- Abstract summary: LESA addresses the divergence of syntax and context across sources by assembling a source-independent generalized claim-detection model.
We address the scarcity of labeled unstructured text by annotating a Twitter dataset that provides a testing ground on a large unstructured corpus.
Experimental results show that LESA improves upon the state-of-the-art performance across six benchmark claim datasets.
- Score: 15.814664354258184
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The conceptualization of a claim lies at the core of argument mining. The
segregation of claims is complex, owing to the divergence in textual syntax and
context across different distributions. Another pressing issue is the
unavailability of labeled unstructured text for experimentation. In this paper,
we propose LESA, a framework that addresses the former issue by assembling a
source-independent generalized model, one that captures syntactic features
through part-of-speech and dependency embeddings and contextual features
through a fine-tuned language model. We address the latter issue by annotating
a Twitter dataset that provides a testing ground on large unstructured text.
Experimental results show that
LESA improves upon the state-of-the-art performance across six benchmark claim
datasets by an average of 3 claim-F1 points for in-domain experiments and by 2
claim-F1 points for general-domain experiments. On our dataset too, LESA
outperforms existing baselines by 1 claim-F1 point on the in-domain experiments
and 2 claim-F1 points on the general-domain experiments. We also release
comprehensive data annotation guidelines compiled during the annotation phase
(which were missing from the existing literature).
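As a rough illustration of the fusion the abstract describes, the sketch below concatenates pooled part-of-speech and dependency embeddings with contextual features from a fine-tuned language model; the layer sizes, pooling scheme, and tag inventories are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch of LESA-style fusion: syntactic features (POS and
# dependency embeddings) concatenated with contextual features from a
# fine-tuned language model. Sizes and pooling are illustrative only.
import torch
import torch.nn as nn
from transformers import AutoModel

class ClaimDetector(nn.Module):
    def __init__(self, n_pos_tags=18, n_dep_labels=45, syn_dim=32):
        super().__init__()
        self.lm = AutoModel.from_pretrained("bert-base-uncased")
        self.pos_emb = nn.Embedding(n_pos_tags, syn_dim)    # part-of-speech tags
        self.dep_emb = nn.Embedding(n_dep_labels, syn_dim)  # dependency labels
        fused_dim = self.lm.config.hidden_size + 2 * syn_dim
        self.classifier = nn.Linear(fused_dim, 2)  # claim vs. non-claim

    def forward(self, input_ids, attention_mask, pos_ids, dep_ids):
        # Contextual sentence vector: mean-pooled language-model states.
        ctx = self.lm(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state.mean(dim=1)
        # Syntactic sentence vector: mean-pooled POS and dependency embeddings.
        syn = torch.cat([self.pos_emb(pos_ids).mean(dim=1),
                         self.dep_emb(dep_ids).mean(dim=1)], dim=-1)
        return self.classifier(torch.cat([ctx, syn], dim=-1))
```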
Related papers
- On Context Utilization in Summarization with Large Language Models [83.84459732796302]
Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries.
Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens.
We conduct the first comprehensive study on context utilization and position bias in summarization.
arXiv Detail & Related papers (2023-10-16T16:45:12Z)
- Inducing Causal Structure for Abstractive Text Summarization [76.1000380429553]
We introduce a Structural Causal Model (SCM) to induce the underlying causal structure of the summarization data.
We propose a Causality Inspired Sequence-to-Sequence model (CI-Seq2Seq) to learn the causal representations that can mimic the causal factors.
Experimental results on two widely used text summarization datasets demonstrate the advantages of our approach.
arXiv Detail & Related papers (2023-08-24T16:06:36Z)
- WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia.
In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim.
We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
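As a reading aid, here is one hypothetical shape such a fine-grained record could take, with a claim split into sub-sentence units that each carry their own judgment; the field and label names are illustrative, not WiCE's actual schema.

```python
# Hypothetical record layout for fine-grained claim entailment; the field
# and label names below are assumptions, not the released WiCE schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SubClaim:
    text: str    # one sub-sentence unit of the claim
    label: str   # e.g. "supported" / "partially_supported" / "not_supported"
    evidence_ids: List[int] = field(default_factory=list)  # indices into evidence

@dataclass
class ClaimExample:
    claim: str            # full natural claim extracted from Wikipedia
    evidence: List[str]   # candidate evidence sentences
    claim_label: str      # standard claim-level entailment judgment
    subclaims: List[SubClaim] = field(default_factory=list)
```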
arXiv Detail & Related papers (2023-03-02T17:45:32Z)
- Full-Text Argumentation Mining on Scientific Publications [3.8754200816873787]
We introduce a sequential pipeline model combining argumentative discourse unit recognition (ADUR) and argumentative relation extraction (ARE) for full-text scientific argumentation mining (SAM).
We provide a first analysis of the performance of pretrained language models (PLMs) on both subtasks.
Our detailed error analysis reveals that non-contiguous ADUs as well as the interpretation of discourse connectors pose major challenges.
arXiv Detail & Related papers (2022-10-24T10:05:30Z)
- R$^2$F: A General Retrieval, Reading and Fusion Framework for Document-level Natural Language Inference [29.520857954199904]
Document-level natural language inference (DocNLI) is a challenging new task in natural language processing.
We present a general solution, the Retrieval, Reading and Fusion (R2F) framework, along with a new evaluation setting.
Our experimental results show that R2F framework can obtain state-of-the-art performance and is robust for diverse evidence retrieval methods.
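A schematic of the three-stage idea, assuming generic retrieval, reading, and fusion interfaces; the concrete components and the fusion rule in R$^2$F may well differ.

```python
# Sketch of a retrieval-reading-fusion pipeline for document-level NLI.
# The stage interfaces and max-score fusion are illustrative assumptions.
from typing import Callable, List

def retrieve_read_fuse(hypothesis: str,
                       document: List[str],
                       retrieve: Callable[[str, List[str]], List[str]],
                       read: Callable[[str, str], float],
                       top_k: int = 5) -> float:
    # 1) Retrieval: pick the sentences most relevant to the hypothesis.
    evidence = retrieve(hypothesis, document)[:top_k]
    # 2) Reading: score entailment of the hypothesis against each sentence.
    scores = [read(premise, hypothesis) for premise in evidence]
    # 3) Fusion: aggregate per-evidence scores into a document-level verdict.
    return max(scores, default=0.0)
```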
arXiv Detail & Related papers (2022-10-22T02:02:35Z)
- Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation [49.916963624249355]
A UNMT model is trained on pseudo-parallel data with translated source sentences, yet at inference it translates natural source sentences.
This source discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach that simultaneously uses the pseudo-parallel data {natural source, translated target} to mimic the inference scenario.
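A sketch of what such an online self-training step could look like; the model objects and their translate/loss methods are placeholders, not an actual UNMT API.

```python
# Placeholder sketch of online self-training for UNMT: alongside the usual
# back-translated pair (synthetic source -> natural target), also train on
# (natural source -> model-translated target) so that training inputs match
# the natural sentences seen at inference. All APIs here are hypothetical.
def training_step(model, reverse_model, natural_src_batch, natural_tgt_batch):
    # Back-translation: synthetic source, natural target.
    synthetic_src = reverse_model.translate(natural_tgt_batch)
    loss_bt = model.loss(src=synthetic_src, tgt=natural_tgt_batch)

    # Online self-training: natural source, the model's own translation as
    # target, mimicking the inference scenario.
    pseudo_tgt = model.translate(natural_src_batch)
    loss_st = model.loss(src=natural_src_batch, tgt=pseudo_tgt)

    return loss_bt + loss_st
```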
arXiv Detail & Related papers (2022-03-16T04:50:27Z)
- DESYR: Definition and Syntactic Representation Based Claim Detection on the Web [16.00615726292801]
DESYR is a framework that aims to resolve these issues for informal web-based text.
It improves upon the state-of-the-art systems across four benchmark claim datasets.
We make available a 100-D pre-trained version of our Poincaré variant along with the source code.
arXiv Detail & Related papers (2021-08-19T16:00:13Z)
- WikiAsp: A Dataset for Multi-domain Aspect-based Summarization [69.13865812754058]
We propose WikiAsp, a large-scale dataset for multi-domain aspect-based summarization.
Specifically, we build the dataset using Wikipedia articles from 20 different domains, using the section titles and boundaries of each article as a proxy for aspect annotation.
Results highlight key challenges that existing summarization models face in this setting, such as proper pronoun handling of quoted sources and consistent explanation of time-sensitive events.
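A toy illustration of the section-title-as-aspect construction, assuming an article arrives as a mapping from section titles to section text; this sketches the idea only, not the dataset's actual release pipeline.

```python
# Treat each section title as an aspect label and the section body as its
# aspect-specific reference text. The input format is an assumption.
from typing import Dict, List, Tuple

def sections_to_aspects(article: Dict[str, str]) -> List[Tuple[str, str]]:
    """Map {section_title: section_text} to (aspect, reference_text) pairs."""
    skip = {"references", "external links", "see also"}
    return [(title.strip().lower(), text)
            for title, text in article.items()
            if title.strip().lower() not in skip]

pairs = sections_to_aspects({
    "History": "The company was founded in 1998 ...",
    "Products": "Its main product line includes ...",
    "References": "1. Annual report ...",
})  # -> [("history", ...), ("products", ...)]
```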
arXiv Detail & Related papers (2020-11-16T10:02:52Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization [14.47381093162237]
Contextualized embeddings use unsupervised language model pretraining to compute word representations depending on their context.
Standard English benchmarks overestimate the importance of lexical over contextual features because of an unrealistic lexical overlap between train and test mentions.
We show that contextualized embeddings are particularly beneficial for detecting unseen mentions, especially out of domain.
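One simple way to quantify the train/test lexical overlap at issue is the fraction of test mentions whose surface form also appears as a training mention; the lowercasing and the metric itself are illustrative choices, not the study's exact protocol.

```python
# Fraction of test mentions whose (lowercased) surface form was already
# seen as a training mention; an illustrative overlap metric.
from typing import List

def mention_overlap(train_mentions: List[str], test_mentions: List[str]) -> float:
    seen = {m.lower() for m in train_mentions}
    if not test_mentions:
        return 0.0
    return sum(m.lower() in seen for m in test_mentions) / len(test_mentions)

# e.g. mention_overlap(["Paris", "EU"], ["paris", "NATO"]) == 0.5
```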
arXiv Detail & Related papers (2020-01-22T15:15:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.