Cross-context News Corpus for Protest Events related Knowledge Base Construction
- URL: http://arxiv.org/abs/2008.00351v1
- Date: Sat, 1 Aug 2020 22:20:48 GMT
- Title: Cross-context News Corpus for Protest Events related Knowledge Base Construction
- Authors: Ali Hürriyetoğlu, Erdem Yörük, Deniz Yüret, Osman Mutlu, Çağrı Yoltar, Fırat Duruşan, Burak Gürel
- Abstract summary: We describe a gold standard corpus of protest events that comprises various local and international sources in English.
This corpus facilitates creating machine learning models that automatically classify news articles and extract protest event-related information.
- Score: 0.15393457051344295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe a gold standard corpus of protest events that comprises various
local and international sources from various countries in English. The corpus
contains document, sentence, and token level annotations. This corpus
facilitates creating machine learning models that automatically classify news
articles and extract protest event-related information, constructing knowledge
bases which enable comparative social and political science studies. For each
news source, the annotation starts on random samples of news articles and
continues with samples that are drawn using active learning. Each batch of
samples was annotated by two social and political scientists, adjudicated by an
annotation supervisor, and was improved by identifying annotation errors
semi-automatically. We found that the corpus has the variety and quality to
develop and benchmark text classification and event extraction systems in a
cross-context setting, which contributes to the generalizability and robustness
of automated text processing systems. This corpus and the reported results will
set the currently lacking common ground in automated protest event collection
studies.
Related papers
- A Corpus for Sentence-level Subjectivity Detection on English News Articles [49.49218203204942]
We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics.
Our corpus paves the way for subjectivity detection in English without relying on language-specific tools, such as lexicons or machine translation.
arXiv Detail & Related papers (2023-05-29T11:54:50Z)
- Verifying the Robustness of Automatic Credibility Assessment [79.08422736721764]
Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases insignificant changes in input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- EDSA-Ensemble: an Event Detection Sentiment Analysis Ensemble Architecture [63.85863519876587]
Using Sentiment Analysis to understand the polarity of each message belonging to an event, as well as the entire event, can help to better understand the general and individual feelings of significant trends and the dynamics on online social networks.
We propose a new ensemble architecture, EDSA-Ensemble, that uses Event Detection and Sentiment Analysis to improve the detection of the polarity for current events from Social Media.
arXiv Detail & Related papers (2023-01-30T11:56:08Z)
- O-Dang! The Ontology of Dangerous Speech Messages [53.15616413153125]
We present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG).
O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community.
It provides a model for encoding both gold standard and single-annotator labels in the KG.
arXiv Detail & Related papers (2022-07-13T11:50:05Z)
- CrudeOilNews: An Annotated Crude Oil News Corpus for Event Extraction [0.665264113799989]
CrudeOilNews is a corpus of English Crude Oil news for event extraction.
It is the first of its kind for Commodity News and serves to contribute towards resource building for economic and financial text mining.
arXiv Detail & Related papers (2022-04-08T06:51:35Z)
- MIND - Mainstream and Independent News Documents Corpus [0.7347989843033033]
This paper characterizes MIND, a new Portuguese corpus comprised of different types of articles collected from online mainstream and alternative media sources.
The articles in the corpus are organized into five collections: facts, opinions, entertainment, satires, and conspiracy theories.
arXiv Detail & Related papers (2021-08-13T14:00:12Z)
- Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence [11.610715844912368]
We introduce the IndiaPoliceEvents corpus--all 21,391 sentences from 1,257 English-language Times of India articles about events in the state of Gujarat during March 2002.
Our trained annotators read and label every document for mentions of police activity events, allowing for unbiased recall evaluations.
arXiv Detail & Related papers (2021-05-27T04:15:44Z)
- Global Attention for Name Tagging [56.62059996864408]
We present a new framework to improve name tagging by utilizing local, document-level, and corpus-level contextual information.
We propose a model that learns to incorporate document-level and corpus-level contextual information alongside local contextual information via global attentions.
Experiments on benchmark datasets show the effectiveness of our approach.
arXiv Detail & Related papers (2020-10-19T07:27:15Z)
- AMALGUM -- A Free, Balanced, Multilayer English Web Corpus [14.073494095236027]
We present a genre-balanced English web corpus totaling 4M tokens.
By tapping open online data sources the corpus is meant to offer a more sizable alternative to smaller manually created annotated data sets.
arXiv Detail & Related papers (2020-06-18T17:05:45Z)
- Seeing the Forest and the Trees: Detection and Cross-Document Coreference Resolution of Militarized Interstate Disputes [3.8073142980733]
I provide a data set for evaluating methods to identify certain political events in text and to link related texts to one another based on shared events.
The data set, Headlines of War, is built on the Militarized Interstate Disputes data set and offers headlines classified by dispute status and headline pairs labeled with coreference indicators.
I introduce a model capable of accomplishing both tasks. The multi-task convolutional neural network is shown to be capable of recognizing events and event coreferences given the headlines' texts and publication dates.
arXiv Detail & Related papers (2020-05-06T17:20:14Z)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format.
arXiv Detail & Related papers (2019-10-23T17:37:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.