A Novel Two-stage Framework for Extracting Opinionated Sentences from News Articles
- URL: http://arxiv.org/abs/2101.09743v1
- Date: Sun, 24 Jan 2021 16:24:20 GMT
- Title: A Novel Two-stage Framework for Extracting Opinionated Sentences from News Articles
- Authors: Rajkumar Pujari and Swara Desai and Niloy Ganguly and Pawan Goyal
- Abstract summary: This paper presents a novel two-stage framework to extract opinionated sentences from a given news article.
In the first stage, a Naive Bayes classifier uses local features to assign a score to each sentence.
In the second stage, this score is used as a prior within the HITS (Hyperlink-Induced Topic Search) schema to exploit the global structure of the article.
- Score: 24.528177249269582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel two-stage framework to extract opinionated
sentences from a given news article. In the first stage, a Naive Bayes classifier
uses local features to assign a score to each sentence; the score
signifies the probability that the sentence is opinionated. In the second
stage, this score is used as a prior within the HITS (Hyperlink-Induced Topic Search)
schema to exploit the global structure of the article and the relations between its
sentences. In the HITS schema, the opinionated sentences are treated as Hubs
and the facts around these opinions are treated as the Authorities. The
algorithm is implemented and evaluated against a set of manually marked data.
We show that using HITS significantly improves the precision over the baseline
Naive Bayes classifier. We also argue that the proposed method
discovers the underlying structure of the article, extracting its various
opinions grouped with their supporting facts and with other supporting
opinions.
Related papers
- Hierarchical Indexing for Retrieval-Augmented Opinion Summarization [60.5923941324953]
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs).
Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy.
At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
arXiv Detail & Related papers (2024-03-01T10:38:07Z)
- Incremental Extractive Opinion Summarization Using Cover Trees [81.59625423421355]
In online marketplaces user reviews accumulate over time, and opinion summaries need to be updated periodically.
In this work, we study the task of extractive opinion summarization in an incremental setting.
We present an efficient algorithm for accurately computing the CentroidRank summaries in an incremental setting.
arXiv Detail & Related papers (2024-01-16T02:00:17Z)
- Content Significance Distribution of Sub-Text Blocks in Articles and Its Application to Article-Organization Assessment [3.2245324254437846]
We formulate the notion of content significance distribution (CSD) of sub-text blocks.
In particular, we leverage Hugging Face's SentenceTransformer to generate contextual sentence embeddings.
We show that the approximated CSD-1 is almost identical to the exact CSD-1.
arXiv Detail & Related papers (2023-11-03T02:43:51Z)
- RankCSE: Unsupervised Sentence Representations Learning via Learning to Rank [54.854714257687334]
We propose a novel approach, RankCSE, for unsupervised sentence representation learning.
It incorporates ranking consistency and ranking distillation with contrastive learning into a unified framework.
An extensive set of experiments is conducted on both semantic textual similarity (STS) and transfer (TR) tasks.
arXiv Detail & Related papers (2023-05-26T08:27:07Z)
- PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition [63.51569687229681]
We argue for the need to recognize the textual entailment relation of each proposition in a sentence individually.
We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters.
Our dataset structure resembles the tasks of (1) segmenting sentences within a document into a set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document.
arXiv Detail & Related papers (2022-12-21T04:03:33Z)
- Assessing Effectiveness of Using Internal Signals for Check-Worthy Claim Identification in Unlabeled Data for Automated Fact-Checking [6.193231258199234]
This paper explores a methodology to identify check-worthy claim sentences from fake news articles.
We leverage two internal supervisory signals - headline and the abstractive summary - to rank the sentences.
We show that while the headline has more gisting similarity with how a fact-checking website writes a claim, the summary-based pipeline is the most promising for an end-to-end fact-checking system.
arXiv Detail & Related papers (2021-11-02T16:17:20Z)
- Fine-Grained Opinion Summarization with Minimal Supervision [48.43506393052212]
FineSum aims to profile a target by extracting opinions from multiple documents.
FineSum automatically identifies opinion phrases from the raw corpus, classifies them into different aspects and sentiments, and constructs multiple fine-grained opinion clusters under each aspect/sentiment.
Both automatic evaluation on the benchmark and quantitative human evaluation validate the effectiveness of our approach.
arXiv Detail & Related papers (2021-10-17T15:16:34Z)
- Text Summarization of Czech News Articles Using Named Entities [0.0]
We focus on the impact of named entities on the summarization of Czech news articles.
We propose a new metric ROUGE_NE that measures the overlap of named entities between the true and generated summaries.
We show that it is still challenging for summarization systems to reach a high score in it.
arXiv Detail & Related papers (2021-04-21T10:48:14Z)
- ReSCo-CC: Unsupervised Identification of Key Disinformation Sentences [3.7405995078130148]
We propose a novel unsupervised task of identifying sentences containing key disinformation within a document that is known to be untrustworthy.
We design a three-phase statistical NLP solution for the task which starts with embedding sentences within a bespoke feature space designed for the task.
We show that our method is able to identify core disinformation effectively.
arXiv Detail & Related papers (2020-10-21T08:53:36Z)
- Context-Based Quotation Recommendation [60.93257124507105]
We propose a novel context-aware quote recommendation system.
It generates a ranked list of quotable paragraphs and spans of tokens from a given source document.
We conduct experiments on a collection of speech transcripts and associated news articles.
arXiv Detail & Related papers (2020-05-17T17:49:53Z)