ReSCo-CC: Unsupervised Identification of Key Disinformation Sentences
- URL: http://arxiv.org/abs/2010.10836v1
- Date: Wed, 21 Oct 2020 08:53:36 GMT
- Title: ReSCo-CC: Unsupervised Identification of Key Disinformation Sentences
- Authors: Soumya Suvra Ghosal, Deepak P, Anna Jurek-Loughrey
- Abstract summary: We propose a novel unsupervised task of identifying sentences containing key disinformation within a document that is known to be untrustworthy.
We design a three-phase statistical NLP solution for the task which starts with embedding sentences within a bespoke feature space designed for the task.
We show that our method is able to identify core disinformation effectively.
- Score: 3.7405995078130148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Disinformation is often presented in long textual articles, especially when
it relates to domains such as health, often seen in relation to COVID-19. These
articles are typically observed to have a number of trustworthy sentences among
which core disinformation sentences are scattered. In this paper, we propose a
novel unsupervised task of identifying sentences containing key disinformation
within a document that is known to be untrustworthy. We design a three-phase
statistical NLP solution for the task which starts with embedding sentences
within a bespoke feature space designed for the task. Sentences represented
using those features are then clustered, following which the key sentences are
identified through proximity scoring. We also curate a new dataset with
sentence level disinformation scorings to aid evaluation for this task; the
dataset is being made publicly available to facilitate further research. Based
on a comprehensive empirical evaluation against techniques from related tasks
such as claim detection and summarization, as well as against simplified
variants of our proposed approach, we illustrate that our method is able to
identify core disinformation effectively.
Related papers
- Attributable and Scalable Opinion Summarization [79.87892048285819]
We generate abstractive summaries by decoding frequent encodings, and extractive summaries by selecting the sentences assigned to the same frequent encodings.
Our method is attributable, because the model identifies sentences used to generate the summary as part of the summarization process.
It scales easily to many hundreds of input reviews, because aggregation is performed in the latent space rather than over long sequences of tokens.
arXiv Detail & Related papers (2023-05-19T11:30:37Z) - Revisiting text decomposition methods for NLI-based factuality scoring
of summaries [9.044665059626958]
We show that fine-grained decomposition is not always a winning strategy for factuality scoring.
We also show that small changes to previously proposed entailment-based scoring methods can result in better performance.
arXiv Detail & Related papers (2022-11-30T09:54:37Z) - Distant finetuning with discourse relations for stance classification [55.131676584455306]
We propose a new method to extract data with silver labels from raw text to finetune a model for stance classification.
We also propose a 3-stage training framework where the noisy level in the data used for finetuning decreases over different stages.
Our approach ranks 1st among 26 competing teams in the stance classification track of the NLPCC 2021 shared task Argumentative Text Understanding for AI Debater.
arXiv Detail & Related papers (2022-04-27T04:24:35Z) - Span Classification with Structured Information for Disfluency Detection
in Spoken Utterances [47.05113261111054]
We propose a novel architecture for detecting disfluencies in transcripts from spoken utterances.
Our proposed model achieves state-of-the-art results on the widely used English Switchboard for disfluency detection.
arXiv Detail & Related papers (2022-03-30T03:22:29Z) - DEIM: An effective deep encoding and interaction model for sentence
matching [0.0]
We propose a sentence matching method based on deep encoding and interaction to extract deep semantic information.
In the encoder layer,we refer to the information of another sentence in the process of encoding a single sentence, and later use a algorithm to fuse the information.
In the interaction layer, we use a bidirectional attention mechanism and a self-attention mechanism to obtain deep semantic information.
arXiv Detail & Related papers (2022-03-20T07:59:42Z) - Learning to Detect Instance-level Salient Objects Using Complementary
Image Labels [55.049347205603304]
We present the first weakly-supervised approach to the salient instance detection problem.
We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv Detail & Related papers (2021-11-19T10:15:22Z) - Unsupervised Extractive Summarization using Pointwise Mutual Information [5.544401446569243]
We propose new metrics of relevance and redundancy using pointwise mutual information (PMI) between sentences.
We show that our method outperforms similarity-based methods on datasets in a range of domains including news, medical journal articles, and personal anecdotes.
arXiv Detail & Related papers (2021-02-11T21:05:50Z) - Unsupervised Summarization for Chat Logs with Topic-Oriented Ranking and
Context-Aware Auto-Encoders [59.038157066874255]
We propose a novel framework called RankAE to perform chat summarization without employing manually labeled data.
RankAE consists of a topic-oriented ranking strategy that selects topic utterances according to centrality and diversity simultaneously.
A denoising auto-encoder is designed to generate succinct but context-informative summaries based on the selected utterances.
arXiv Detail & Related papers (2020-12-14T07:31:17Z) - Weakly-supervised Salient Instance Detection [65.0408760733005]
We present the first weakly-supervised approach to the salient instance detection problem.
We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv Detail & Related papers (2020-09-29T09:47:23Z) - Salience Estimation with Multi-Attention Learning for Abstractive Text
Summarization [86.45110800123216]
In the task of text summarization, salience estimation for words, phrases or sentences is a critical component.
We propose a Multi-Attention Learning framework which contains two new attention learning components for salience estimation.
arXiv Detail & Related papers (2020-04-07T02:38:56Z) - Hybrid Attention-Based Transformer Block Model for Distant Supervision
Relation Extraction [20.644215991166902]
We propose a new framework using hybrid attention-based Transformer block with multi-instance learning to perform the DSRE task.
The proposed approach can outperform the state-of-the-art algorithms on the evaluation dataset.
arXiv Detail & Related papers (2020-03-10T13:05:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.