Guidelines and a Corpus for Extracting Biographical Events
- URL: http://arxiv.org/abs/2206.03547v1
- Date: Tue, 7 Jun 2022 19:36:18 GMT
- Title: Guidelines and a Corpus for Extracting Biographical Events
- Authors: Marco Antonio Stranisci, Enrico Mensa, Ousmane Diakite, Daniele Radicioni, Rossana Damiano
- Abstract summary: Our work addresses the shortage of resources for extracting biographical events by providing a set of guidelines for the semantic annotation of life events.
The guidelines are designed to be interoperable with existing ISO standards for semantic annotation: ISO-TimeML (ISO-24617-1) and SemAF (ISO-24617-4).
- Score: 1.181206257787103
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Although biographies are widespread within the Semantic Web, resources and
approaches to automatically extract biographical events are limited. This
limitation reduces the amount of structured, machine-readable biographical
information, especially about people belonging to underrepresented groups. Our
work challenges this limitation by providing a set of guidelines for the
semantic annotation of life events. The guidelines are designed to be
interoperable with existing ISO standards for semantic annotation: ISO-TimeML
(ISO-24617-1) and SemAF (ISO-24617-4). The guidelines were tested through an
annotation task of Wikipedia biographies of underrepresented writers, namely
authors born in non-Western countries, migrants, or belonging to ethnic
minorities. A total of 1,000 sentences were annotated by 4 annotators with an average
Inter-Annotator Agreement of 0.825. The resulting corpus was mapped onto
OntoNotes. This mapping allowed us to expand our corpus, showing that
existing resources may be exploited for the biographical event extraction task.
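The abstract reports an average Inter-Annotator Agreement of 0.825 across 4 annotators but does not state which agreement coefficient was used. As a minimal sketch only, assuming a chance-corrected pairwise measure (Cohen's kappa) averaged over all annotator pairs, such a figure could be computed as follows. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        count_a[label] * count_b[label] for label in count_a.keys() | count_b.keys()
    ) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def average_pairwise_kappa(annotations):
    """Mean Cohen's kappa over every pair of annotators."""
    pairs = list(combinations(annotations, 2))
    if not pairs:
        raise ValueError("at least two annotators are required")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: three annotators, four tokens, EVENT vs. O labels.
toy = [
    ["EVENT", "O", "EVENT", "O"],
    ["EVENT", "O", "EVENT", "O"],
    ["EVENT", "EVENT", "O", "O"],
]
print(average_pairwise_kappa(toy))
```

Other coefficients, such as Fleiss' kappa or Krippendorff's alpha, are also common for more than two annotators; the paper itself should be consulted for the measure actually used.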
Related papers
- FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction [85.26780391682894]
We propose Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction (FENICE).
FENICE leverages an NLI-based alignment between information in the source document and a set of atomic facts, referred to as claims, extracted from the summary.
Our metric sets a new state of the art on AGGREFACT, the de-facto benchmark for factuality evaluation.
arXiv Detail & Related papers (2024-03-04T17:57:18Z)
- Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains [60.5207173547769]
We evaluate zero-shot generated summaries across specialized domains, including biomedical articles and legal bills.
We acquire annotations from domain experts to identify inconsistencies in summaries and systematically categorize these errors.
We release all collected annotations to facilitate additional research toward measuring and realizing factually accurate summarization, beyond news articles.
arXiv Detail & Related papers (2024-02-05T20:51:11Z)
- FRACAS: A FRench Annotated Corpus of Attribution relations in newS [0.0]
We present a manually annotated corpus of 1676 newswire texts in French for quotation extraction and source attribution.
We first describe the composition of our corpus and the choices that were made in selecting the data.
We then detail our inter-annotator agreement between the 8 annotators who worked on manual labelling.
arXiv Detail & Related papers (2023-09-19T13:19:54Z)
- Wikibio: a Semantic Resource for the Intersectional Analysis of Biographical Events [3.8455936323976694]
We present a new corpus annotated for biographical event detection.
The model was able to detect all mentions of the target-entity in a biography with an F-score of 0.808.
It was also used for performing an analysis of biases about women and non-Western people in Wikipedia biographies.
arXiv Detail & Related papers (2023-06-15T20:59:37Z)
- GENTLE: A Genre-Diverse Multilayer Challenge Set for English NLP and Linguistic Evaluation [15.886585212606787]
We present GENTLE, a new mixed-genre English challenge corpus totaling 17K tokens.
GENTLE is manually annotated for a variety of popular NLP tasks.
We evaluate state-of-the-art NLP systems on GENTLE and find severe degradation for at least some genres in their performance on all tasks.
arXiv Detail & Related papers (2023-06-03T00:20:15Z)
- A Corpus for Sentence-level Subjectivity Detection on English News Articles [49.49218203204942]
We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics.
Our corpus paves the way for subjectivity detection in English without relying on language-specific tools, such as lexicons or machine translation.
arXiv Detail & Related papers (2023-05-29T11:54:50Z)
- Knowledge-Rich Self-Supervised Entity Linking [58.838404666183656]
Knowledge-RIch Self-Supervision (KRISSBERT) is a universal entity linker for four million UMLS entities.
Our approach subsumes zero-shot and few-shot methods, and can easily incorporate entity descriptions and gold mention labels if available.
Without using any labeled information, our method produces KRISSBERT, a universal entity linker for four million UMLS entities.
arXiv Detail & Related papers (2021-12-15T05:05:12Z)
- Annotation Curricula to Implicitly Train Non-Expert Annotators [56.67768938052715]
Voluntary studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain.
This can be overwhelming in the beginning, mentally taxing, and can introduce errors into the resulting annotations.
We propose annotation curricula, a novel approach to implicitly train annotators.
arXiv Detail & Related papers (2021-06-04T09:48:28Z)
- Long Document Summarization in a Low Resource Setting using Pretrained Language Models [28.042826329840437]
We study a challenging low-resource setting of summarizing long legal briefs with an average source document length of 4268 words.
We use BART, a modern pretrained abstractive summarizer, which achieves only 17.9 ROUGE-L as it struggles with long documents.
On feeding the compressed documents to BART, we observe a 6.0 ROUGE-L improvement.
arXiv Detail & Related papers (2021-03-01T04:43:55Z)
- Topic-Centric Unsupervised Multi-Document Summarization of Scientific and News Articles [3.0504782036247438]
We propose a topic-centric unsupervised multi-document summarization framework to generate abstractive summaries.
The proposed algorithm generates an abstractive summary by developing salient language unit selection and text generation techniques.
Our approach matches the state-of-the-art when evaluated on automated extractive evaluation metrics and performs better for abstractive summarization on five human evaluation metrics.
arXiv Detail & Related papers (2020-11-03T04:04:21Z)
- Constrained Abstractive Summarization: Preserving Factual Consistency with Constrained Generation [93.87095877617968]
We propose Constrained Abstractive Summarization (CAS), a general setup that preserves the factual consistency of abstractive summarization.
We adopt lexically constrained decoding, a technique generally applicable to autoregressive generative models, to fulfill CAS.
We observe up to 13.8 ROUGE-2 gains when only one manual constraint is used in interactive summarization.
arXiv Detail & Related papers (2020-10-24T00:27:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.