Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence
- URL: http://arxiv.org/abs/2105.12936v1
- Date: Thu, 27 May 2021 04:15:44 GMT
- Title: Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence
- Authors: Andrew Halterman, Katherine A. Keith, Sheikh Muhammad Sarwar, Brendan O'Connor
- Abstract summary: We introduce the IndiaPoliceEvents corpus--all 21,391 sentences from 1,257 English-language Times of India articles about events in the state of Gujarat during March 2002.
Our trained annotators read and label every document for mentions of police activity events, allowing for unbiased recall evaluations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated event extraction in social science applications often requires
corpus-level evaluations: for example, aggregating text predictions across
metadata and unbiased estimates of recall. We combine corpus-level evaluation
requirements with a real-world, social science setting and introduce the
IndiaPoliceEvents corpus--all 21,391 sentences from 1,257 English-language
Times of India articles about events in the state of Gujarat during March 2002.
Our trained annotators read and label every document for mentions of police
activity events, allowing for unbiased recall evaluations. In contrast to other
datasets with structured event representations, we gather annotations by posing
natural questions, and evaluate off-the-shelf models for three different tasks:
sentence classification, document ranking, and temporal aggregation of target
events. We present baseline results from zero-shot BERT-based models fine-tuned
on natural language inference and passage retrieval tasks. Our novel
corpus-level evaluations and annotation approach can guide creation of similar
social-science-oriented resources in the future.
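The corpus-level evaluation idea above can be made concrete: because annotators labeled every document, recall is an exact corpus-level quantity rather than an estimate from a retrieved sample, and document-level event predictions can be aggregated over time. A minimal sketch with hypothetical labels and dates (not the released corpus):

```python
from collections import Counter
from datetime import date

# -- corpus-level recall ----------------------------------------------------
def corpus_recall(gold: dict, predicted: set) -> float:
    """Exact recall: fraction of truly positive documents the system found.
    Computable only because every document in the corpus is labeled."""
    positives = {doc_id for doc_id, label in gold.items() if label}
    return len(positives & predicted) / len(positives) if positives else 0.0

# -- temporal aggregation ---------------------------------------------------
def daily_event_counts(doc_dates: dict, predicted: set) -> Counter:
    """Aggregate document-level event predictions into a per-day count."""
    return Counter(doc_dates[doc_id] for doc_id in predicted)

# Hypothetical toy data, for illustration only:
gold = {"d1": True, "d2": False, "d3": True, "d4": True}   # exhaustive labels
predicted = {"d1", "d3"}                                   # system output
doc_dates = {"d1": date(2002, 3, 1), "d3": date(2002, 3, 1),
             "d4": date(2002, 3, 2)}

print(corpus_recall(gold, predicted))            # 2 of 3 positives found
print(daily_event_counts(doc_dates, predicted))  # predicted events per day
```

With partially annotated corpora, the denominator of recall is unknown and must be estimated; exhaustive annotation removes that bias.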
Related papers
- What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation
Evaluating a story can be more challenging than other generation evaluation tasks.
We first summarize existing storytelling tasks, including text-to-text, visual-to-text, and text-to-visual.
We propose a taxonomy to organize evaluation metrics that have been developed or can be adopted for story evaluation.
arXiv Detail & Related papers (2024-08-26T20:35:42Z)
- An Evaluation Framework for Mapping News Headlines to Event Classes in a Knowledge Graph
We present a methodology for creating a benchmark dataset of news headlines mapped to event classes in Wikidata.
We use the dataset to study two classes of unsupervised methods for this task.
We present the results of our evaluation, lessons learned, and directions for future work.
arXiv Detail & Related papers (2023-12-04T20:42:26Z)
- Rethinking Evaluation Metrics of Open-Vocabulary Segmentaion
The evaluation process still heavily relies on closed-set metrics without considering the similarity between predicted and ground truth categories.
To tackle this issue, we first survey eleven similarity measurements between two categorical words.
We design novel evaluation metrics, namely Open mIoU, Open AP, and Open PQ, tailored to three open-vocabulary segmentation tasks.
arXiv Detail & Related papers (2023-11-06T18:59:01Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- An Evaluation Framework for Legal Document Summarization
A law practitioner has to read through numerous lengthy legal case proceedings across categories such as land disputes and corruption.
It is important to summarize these documents and to ensure that the summaries contain phrases whose intent matches the category of the case.
We propose an automated intent-based summarization metric, which shows better agreement with human evaluation than other automated metrics such as BLEU and ROUGE-L.
arXiv Detail & Related papers (2022-05-17T16:42:03Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- A Survey on Text Classification: From Shallow to Deep Learning
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)
- Cross-context News Corpus for Protest Events related Knowledge Base Construction
We describe a gold standard corpus of protest events that comprises various local and international sources in English.
This corpus facilitates creating machine learning models that automatically classify news articles and extract protest event-related information.
arXiv Detail & Related papers (2020-08-01T22:20:48Z)
- A computational model implementing subjectivity with the 'Room Theory'. The case of detecting Emotion from Text
This work introduces a new method to consider subjectivity and general context dependency in text analysis.
By using similarity measure between words, we are able to extract the relative relevance of the elements in the benchmark.
This method could be applied to all the cases where evaluating subjectivity is relevant to understand the relative value or meaning of a text.
arXiv Detail & Related papers (2020-05-12T21:26:04Z)
- Seeing the Forest and the Trees: Detection and Cross-Document Coreference Resolution of Militarized Interstate Disputes
I provide a data set for evaluating methods to identify certain political events in text and to link related texts to one another based on shared events.
The data set, Headlines of War, is built on the Militarized Interstate Disputes data set and offers headlines classified by dispute status and headline pairs labeled with coreference indicators.
I introduce a model capable of accomplishing both tasks. The multi-task convolutional neural network is shown to be capable of recognizing events and event coreferences given the headlines' texts and publication dates.
arXiv Detail & Related papers (2020-05-06T17:20:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.