tieval: An Evaluation Framework for Temporal Information Extraction Systems
- URL: http://arxiv.org/abs/2301.04643v3
- Date: Fri, 24 Nov 2023 16:13:18 GMT
- Title: tieval: An Evaluation Framework for Temporal Information Extraction Systems
- Authors: Hugo Sousa, Alípio Jorge, Ricardo Campos
- Abstract summary: Temporal information extraction has attracted a great deal of interest over the last two decades.
The large volume of available corpora, however, makes it difficult to benchmark TIE systems.
tieval is a Python library that provides a concise interface for importing different corpora and facilitates system evaluation.
- Score: 2.3035364984111495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Temporal information extraction (TIE) has attracted a great deal of interest
over the last two decades, leading to the development of a significant number
of datasets. Despite its benefits, this large volume of corpora makes it
difficult to benchmark TIE systems. On the one hand, different datasets follow
different annotation schemes, hindering the comparison of competitors across
corpora. On the other hand, each corpus is commonly disseminated in a different
format, so a researcher or practitioner must invest considerable engineering
effort to develop parsers for all of them. This constraint forces researchers
to evaluate their systems on a limited number of datasets, which in turn limits
the comparability of the systems. Yet another obstacle to the comparability of
TIE systems is the evaluation metric employed. While most research works adopt
traditional metrics such as precision, recall, and $F_1$, a few others prefer
temporal awareness -- a metric tailored to be more comprehensive in the
evaluation of temporal systems. Although the reason for the absence of temporal
awareness from most evaluations is not clear, one factor that certainly weighs
on this decision is the need to implement the temporal closure algorithm in
order to compute temporal awareness, which is neither straightforward to
implement nor readily available. All in all, these problems have limited the
fair comparison between approaches and, consequently, the development of
temporal extraction systems. To mitigate these problems, we have developed
tieval, a Python library that provides a concise interface for importing
different corpora and facilitates system evaluation. In this paper, we present
the first public release of tieval and highlight its most relevant features.
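As an aside on the temporal awareness metric discussed in the abstract: it scores a system against the temporal closure of the annotations rather than against the literal relation sets. The following is a minimal, self-contained sketch of that idea, restricted to transitive BEFORE relations; a full closure reasons over all interval relation types, and the original metric (UzZaman and Allen) normalizes by reduced relation sets rather than the raw ones used here. All function names are ours for illustration; this is not tieval's implementation.

```python
from itertools import product

def temporal_closure(relations: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Transitive closure of a set of BEFORE relations.

    Each pair (a, b) means "a BEFORE b". The full temporal closure behind
    temporal awareness reasons over all interval relations; this sketch
    covers only the transitive BEFORE case to keep the idea visible.
    """
    closure = set(relations)
    while True:
        derived = {(a, d)
                   for (a, b), (c, d) in product(closure, repeat=2)
                   if b == c and (a, d) not in closure}
        if not derived:
            return closure
        closure |= derived

def temporal_awareness_f1(system: set, reference: set) -> float:
    """Awareness-style F1: a relation counts as correct if it is
    verifiable in the closure of the other annotation set.

    Simplification: the original metric divides by *reduced* relation
    sets; for brevity this toy version divides by the raw sets.
    """
    sys_closure = temporal_closure(system)
    ref_closure = temporal_closure(reference)
    precision = len(system & ref_closure) / len(system) if system else 0.0
    recall = len(reference & sys_closure) / len(reference) if reference else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

# The system never asserts A BEFORE C explicitly, yet its closure entails
# it (A<B and B<C), so it is credited for the reference relation A<C.
system = {("A", "B"), ("B", "C")}
reference = {("A", "B"), ("A", "C")}
print(round(temporal_awareness_f1(system, reference), 3))  # 0.667
```

In this example, plain precision/recall over the literal relation sets would penalize the system for "missing" A BEFORE C even though that relation is logically entailed by its output; that leniency is exactly what temporal awareness adds, at the cost of implementing the closure.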
Related papers
- TSI-Bench: Benchmarking Time Series Imputation [52.27004336123575] (arXiv 2024-06-18T16:07:33Z)
TSI-Bench is a comprehensive benchmark suite for time series imputation utilizing deep learning techniques.
The TSI-Bench pipeline standardizes experimental settings to enable fair evaluation of imputation algorithms.
TSI-Bench also provides a systematic paradigm for tailoring time series forecasting algorithms to the imputation task.
- A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636] (arXiv 2024-06-05T13:40:07Z)
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
- Deconstructing Self-Supervised Monocular Reconstruction: The Design Decisions that Matter [63.5550818034739] (arXiv 2022-08-02T14:38:53Z)
This paper presents a framework to evaluate state-of-the-art contributions to self-supervised monocular depth estimation.
It includes pretraining, backbone, architectural design choices and loss functions.
We re-implement, validate and re-evaluate 16 state-of-the-art contributions and introduce a new dataset.
- A novel evaluation methodology for supervised Feature Ranking algorithms [0.0] (arXiv 2022-07-09T12:00:36Z)
This paper proposes a new evaluation methodology for Feature Rankers.
By making use of synthetic datasets, feature importance scores can be known beforehand, allowing more systematic evaluation.
To facilitate large-scale experimentation using the new methodology, a benchmarking framework was built in Python, called fseval.
- Are We There Yet? A Decision Framework for Replacing Term Based Retrieval with Dense Retrieval Systems [35.77217529138364] (arXiv 2022-06-26T23:16:05Z)
Several dense retrieval (DR) models have demonstrated competitive performance to term-based retrieval.
DR projects queries and documents into a dense vector space and retrieves results via (approximate) nearest neighbor search.
It is impossible to predict whether DR will become ubiquitous, but one way this could happen is through repeated application of decision processes like the one proposed.
- SAITS: Self-Attention-based Imputation for Time Series [6.321652307514677] (arXiv 2022-02-17T08:40:42Z)
SAITS is a novel method based on the self-attention mechanism for missing value imputation in time series.
It learns missing values from a weighted combination of two diagonally-masked self-attention blocks.
Tests show that SAITS efficiently outperforms state-of-the-art methods on the time-series imputation task.
- What are the best systems? New perspectives on NLP Benchmarking [10.27421161397197] (arXiv 2022-02-08T11:44:20Z)
We propose a new procedure to rank systems based on their performance across different tasks.
Motivated by the social choice theory, the final system ordering is obtained through aggregating the rankings induced by each task.
We show that our method yields different conclusions about state-of-the-art systems than the mean-aggregation procedure (a toy rank-aggregation sketch follows this list).
- Benchmarking Quality-Dependent and Cost-Sensitive Score-Level Multimodal Biometric Fusion Algorithms [58.156733807470395] (arXiv 2021-11-17T13:39:48Z)
This paper reports a benchmarking study carried out within the framework of the BioSecure DS2 (Access Control) evaluation campaign.
The campaign targeted the application of physical access control in a medium-size establishment with some 500 persons.
To the best of our knowledge, this is the first attempt to benchmark quality-based multimodal fusion algorithms.
- Generalizing Cross-Document Event Coreference Resolution Across Multiple Corpora [63.429307282665704] (arXiv 2020-11-24T17:45:03Z)
Cross-document event coreference resolution (CDCR) is an NLP task in which mentions of events need to be identified and clustered throughout a collection of documents.
CDCR aims to benefit downstream multi-document applications, but improvements from applying CDCR have not been shown yet.
We make the observation that every CDCR system to date was developed, trained, and tested only on a single respective corpus.
- Measuring the Complexity of Domains Used to Evaluate AI Systems [0.48951183832371004] (arXiv 2020-09-18T21:53:07Z)
We propose a theory for measuring the complexity of varied domains.
An application of this measure is then demonstrated to show its effectiveness in varied situations.
We propose that such a complexity metric be used in the future to compute an AI system's intelligence.
- Bias in Multimodal AI: Testbed for Fair Automatic Recruitment [73.85525896663371] (arXiv 2020-04-15T15:58:05Z)
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
We train automatic recruitment algorithms using a set of multimodal synthetic profiles consciously scored with gender and racial biases.
Our methodology and results show how to generate fairer AI-based tools in general, and in particular fairer automated recruitment systems.
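As a toy companion to the rank-aggregation entry above ("What are the best systems?"): one classical social-choice aggregator is the Borda count, sketched below. This illustrative example is ours; it conveys the flavor of aggregating per-task rankings into a single ordering but is not claimed to be the exact procedure that paper proposes.

```python
from collections import defaultdict

def borda_aggregate(task_rankings: dict[str, list[str]]) -> list[str]:
    """Aggregate per-task rankings into one ordering via Borda count.

    `task_rankings` maps each task to a list of systems, best first.
    Each system earns (n - position) points per task; ties in total
    points are broken alphabetically for determinism.
    """
    points = defaultdict(int)
    for ranking in task_rankings.values():
        n = len(ranking)
        for position, system in enumerate(ranking):
            points[system] += n - position
    return sorted(points, key=lambda s: (-points[s], s))

# Hypothetical per-task rankings of three systems on two tasks.
rankings = {
    "task_a": ["sys1", "sys2", "sys3"],
    "task_b": ["sys2", "sys3", "sys1"],
}
print(borda_aggregate(rankings))  # ['sys2', 'sys1', 'sys3']
```

Note how sys2 wins overall despite not topping task_a: rank aggregation rewards consistent placement across tasks, which is exactly where it can diverge from averaging raw scores.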
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.