Quantum Criticism: A Tagged News Corpus Analysed for Sentiment and Named
Entities
- URL: http://arxiv.org/abs/2006.05267v1
- Date: Fri, 5 Jun 2020 17:59:12 GMT
- Title: Quantum Criticism: A Tagged News Corpus Analysed for Sentiment and Named
Entities
- Authors: Ashwini Badgujar, Sheng Chen, Andrew Wang, Kai Yu, Paul Intrevado,
David Guy Brizan
- Abstract summary: We continuously collect data from the RSS feeds of traditional news sources.
We perform sentiment analysis of each news article at the document, paragraph and sentence level.
We show how the data in this corpus could be used to identify bias in news reporting.
- Score: 18.458831729497224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this research, we continuously collect data from the RSS feeds of
traditional news sources. We apply several pre-trained implementations of named
entity recognition (NER) tools, quantifying the success of each implementation.
We also perform sentiment analysis of each news article at the document,
paragraph and sentence level, with the goal of creating a corpus of tagged news
articles that is made available to the public through a web interface. Finally,
we show how the data in this corpus could be used to identify bias in news
reporting.
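The abstract mentions sentiment analysis at the document, paragraph and sentence level. The following is a minimal sketch of what scoring at those three granularities could look like. It is not the authors' pipeline: the tiny word lexicon, the regex-based splitting rules, and the names `score`, `split_paragraphs`, `split_sentences` and `analyse` are all assumptions made for illustration.

```python
import re

# Toy sentiment lexicon: an assumption for illustration, not taken from the paper.
POSITIVE = {"good", "great", "praised", "success", "win"}
NEGATIVE = {"bad", "poor", "criticized", "failure", "loss"}


def score(text):
    """Return (positive - negative) / matched words, or 0.0 if no lexicon word occurs."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


def split_paragraphs(article):
    """Split on blank lines (a simplifying assumption about the article layout)."""
    return [p.strip() for p in re.split(r"\n\s*\n", article) if p.strip()]


def split_sentences(paragraph):
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def analyse(article):
    """Score one article at document, paragraph and sentence level."""
    return {
        "document": score(article),
        "paragraphs": [
            {"score": score(p), "sentences": [score(s) for s in split_sentences(p)]}
            for p in split_paragraphs(article)
        ],
    }


article = (
    "The mayor praised the plan. It was a great success.\n\n"
    "Critics called the budget a failure."
)
result = analyse(article)
```

In this toy example the first paragraph scores positive, the second negative, and the document-level score falls in between, which shows why the three levels can disagree. A real system would replace `score` with a trained sentiment model and a proper sentence tokenizer.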
Related papers
- FineFake: A Knowledge-Enriched Dataset for Fine-Grained Multi-Domain Fake News Detection [54.37159298632628]
FineFake is a multi-domain knowledge-enhanced benchmark for fake news detection.
FineFake encompasses 16,909 data samples spanning six semantic topics and eight platforms.
The entire FineFake project is publicly accessible as an open-source repository.
arXiv Detail & Related papers (2024-03-30T14:39:09Z)
- SciNews: From Scholarly Complexities to Public Narratives -- A Dataset for Scientific News Report Generation [20.994565065595232]
We present a new corpus to facilitate the automated generation of scientific news reports.
Our dataset comprises academic publications and their corresponding scientific news reports across nine disciplines.
We benchmark our dataset employing state-of-the-art text generation models.
arXiv Detail & Related papers (2024-03-26T14:54:48Z)
- Detection and Discovery of Misinformation Sources using Attributed Webgraphs [3.659498819753633]
We introduce a novel attributed webgraph dataset with labeled news domains and their connections to outlinking and backlinking domains.
We demonstrate the success of graph neural networks in detecting news site reliability using these attributed webgraphs.
We also introduce and evaluate a novel graph-based algorithm for discovering previously unknown misinformation news sources.
arXiv Detail & Related papers (2024-01-04T17:47:36Z)
- Code Book for the Annotation of Diverse Cross-Document Coreference of Entities in News Articles [0.0]
This code book describes precisely how to set up Inception, the accompanying annotation tool, how to annotate entities in news articles, how to connect them with diverse coreferential relations, and how to link them across documents to Wikidata's global knowledge graph.
Our main contribution lies in providing a methodology for creating a diverse cross-document coreference corpus which can be applied to the analysis of media bias by word-choice and labelling.
arXiv Detail & Related papers (2023-10-18T15:53:45Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- Assessing the quality of sources in Wikidata across languages: a hybrid approach [64.05097584373979]
We run a series of microtasks experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages.
We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata.
The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z)
- MIND - Mainstream and Independent News Documents Corpus [0.7347989843033033]
This paper characterizes MIND, a new Portuguese corpus comprising different types of articles collected from online mainstream and alternative media sources.
The articles in the corpus are organized into five collections: facts, opinions, entertainment, satires, and conspiracy theories.
arXiv Detail & Related papers (2021-08-13T14:00:12Z)
- Named Entity Recognition for Social Media Texts with Semantic Augmentation [70.44281443975554]
Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts.
We propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account.
arXiv Detail & Related papers (2020-10-29T10:06:46Z)
- Global Attention for Name Tagging [56.62059996864408]
We present a new framework to improve name tagging by utilizing local, document-level, and corpus-level contextual information.
We propose a model that learns to incorporate document-level and corpus-level contextual information alongside local contextual information via global attentions.
Experiments on benchmark datasets show the effectiveness of our approach.
arXiv Detail & Related papers (2020-10-19T07:27:15Z)
- Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency [8.401772200450417]
Multimodal information, e.g., enriching text with photos, is typically used to convey the news more effectively or to attract attention.
We introduce a novel task of cross-modal consistency verification in real-world news and present a multimodal approach to quantify the entity coherence between image and text.
arXiv Detail & Related papers (2020-03-23T17:49:06Z)