Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence
- URL: http://arxiv.org/abs/2105.12936v1
- Date: Thu, 27 May 2021 04:15:44 GMT
- Title: Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence
- Authors: Andrew Halterman, Katherine A. Keith, Sheikh Muhammad Sarwar, Brendan O'Connor
- Abstract summary: We introduce the IndiaPoliceEvents corpus--all 21,391 sentences from 1,257 English-language Times of India articles about events in the state of Gujarat during March 2002.
Our trained annotators read and label every document for mentions of police activity events, allowing for unbiased recall evaluations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated event extraction in social science applications often requires
corpus-level evaluations: for example, aggregating text predictions across
metadata and unbiased estimates of recall. We combine corpus-level evaluation
requirements with a real-world, social science setting and introduce the
IndiaPoliceEvents corpus--all 21,391 sentences from 1,257 English-language
Times of India articles about events in the state of Gujarat during March 2002.
Our trained annotators read and label every document for mentions of police
activity events, allowing for unbiased recall evaluations. In contrast to other
datasets with structured event representations, we gather annotations by posing
natural questions, and evaluate off-the-shelf models for three different tasks:
sentence classification, document ranking, and temporal aggregation of target
events. We present baseline results from zero-shot BERT-based models fine-tuned
on natural language inference and passage retrieval tasks. Our novel
corpus-level evaluations and annotation approach can guide creation of similar
social-science-oriented resources in the future.
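The corpus-level evaluation idea above can be made concrete: because annotators labeled every document, recall is an exact corpus-level quantity rather than an estimate from a retrieved sample, and document-level event predictions can be aggregated over time. A minimal sketch with hypothetical labels and dates (not the released corpus):

```python
from collections import Counter
from datetime import date

# -- corpus-level recall ----------------------------------------------------
def corpus_recall(gold: dict, predicted: set) -> float:
    """Exact recall: fraction of truly positive documents the system found.
    Computable only because every document in the corpus is labeled."""
    positives = {doc_id for doc_id, label in gold.items() if label}
    return len(positives & predicted) / len(positives) if positives else 0.0

# -- temporal aggregation ---------------------------------------------------
def daily_event_counts(doc_dates: dict, predicted: set) -> Counter:
    """Aggregate document-level event predictions into a per-day count."""
    return Counter(doc_dates[doc_id] for doc_id in predicted)

# Hypothetical toy data, for illustration only:
gold = {"d1": True, "d2": False, "d3": True, "d4": True}   # exhaustive labels
predicted = {"d1", "d3"}                                   # system output
doc_dates = {"d1": date(2002, 3, 1), "d3": date(2002, 3, 1),
             "d4": date(2002, 3, 2)}

print(corpus_recall(gold, predicted))            # 2 of 3 positives found
print(daily_event_counts(doc_dates, predicted))  # predicted events per day
```

With partially annotated corpora, the denominator of recall is unknown and must be estimated; exhaustive annotation removes that bias.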
Related papers
- What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation
Evaluating a story can be more challenging than other generation evaluation tasks.
We first summarize existing storytelling tasks, including text-to-text, visual-to-text, and text-to-visual.
We propose a taxonomy to organize evaluation metrics that have been developed or can be adopted for story evaluation.
arXiv Detail & Related papers (2024-08-26T20:35:42Z)
- An Evaluation Framework for Mapping News Headlines to Event Classes in a Knowledge Graph
We present a methodology for creating a benchmark dataset of news headlines mapped to event classes in Wikidata.
We use the dataset to study two classes of unsupervised methods for this task.
We present the results of our evaluation, lessons learned, and directions for future work.
arXiv Detail & Related papers (2023-12-04T20:42:26Z)
- Rethinking Evaluation Metrics of Open-Vocabulary Segmentaion
The evaluation process still heavily relies on closed-set metrics without considering the similarity between predicted and ground truth categories.
To tackle this issue, we first survey eleven similarity measurements between two categorical words.
We design novel evaluation metrics, namely Open mIoU, Open AP, and Open PQ, tailored to three open-vocabulary segmentation tasks.
arXiv Detail & Related papers (2023-11-06T18:59:01Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- An Evaluation Framework for Legal Document Summarization
A law practitioner has to read through numerous lengthy legal case proceedings across categories such as land disputes and corruption.
It is important to summarize these documents and to ensure that the summaries contain phrases whose intent matches the category of the case.
We propose an automated intent-based summarization metric, which shows better agreement with human evaluation than other automated metrics such as BLEU and ROUGE-L.
arXiv Detail & Related papers (2022-05-17T16:42:03Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- A Survey on Text Classification: From Shallow to Deep Learning
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)
- Cross-context News Corpus for Protest Events related Knowledge Base Construction
We describe a gold standard corpus of protest events that comprises various local and international sources in English.
This corpus facilitates creating machine learning models that automatically classify news articles and extract protest event-related information.
arXiv Detail & Related papers (2020-08-01T22:20:48Z)
- A computational model implementing subjectivity with the 'Room Theory'. The case of detecting Emotion from Text
This work introduces a new method to consider subjectivity and general context dependency in text analysis.
By using similarity measure between words, we are able to extract the relative relevance of the elements in the benchmark.
This method could be applied to all the cases where evaluating subjectivity is relevant to understand the relative value or meaning of a text.
arXiv Detail & Related papers (2020-05-12T21:26:04Z)
- Seeing the Forest and the Trees: Detection and Cross-Document Coreference Resolution of Militarized Interstate Disputes
I provide a data set for evaluating methods to identify certain political events in text and to link related texts to one another based on shared events.
The data set, Headlines of War, is built on the Militarized Interstate Disputes data set and offers headlines classified by dispute status and headline pairs labeled with coreference indicators.
I introduce a model capable of accomplishing both tasks. The multi-task convolutional neural network is shown to be capable of recognizing events and event coreferences given the headlines' texts and publication dates.
arXiv Detail & Related papers (2020-05-06T17:20:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.