Knowledge Graph Question Answering Leaderboard: A Community Resource to
Prevent a Replication Crisis
- URL: http://arxiv.org/abs/2201.08174v1
- Date: Thu, 20 Jan 2022 13:46:01 GMT
- Title: Knowledge Graph Question Answering Leaderboard: A Community Resource to
Prevent a Replication Crisis
- Authors: Aleksandr Perevalov, Xi Yan, Liubov Kovriguina, Longquan Jiang,
Andreas Both, Ricardo Usbeck
- Abstract summary: We provide a new central and open leaderboard for any KGQA benchmark dataset as a focal point for the community.
Our analysis highlights existing problems during the evaluation of KGQA systems.
- Score: 61.740077541531726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data-driven systems need to be evaluated to establish trust in the scientific
approach and its applicability. In particular, this is true for Knowledge Graph
(KG) Question Answering (QA), where complex data structures are made accessible
via natural-language interfaces. Evaluating the capabilities of these systems
has been a driver for the community for more than ten years while establishing
different KGQA benchmark datasets. However, comparing different approaches is
cumbersome. The lack of existing and curated leaderboards leads to a missing
global view over the research field and could inject mistrust into the results.
In particular, the latest and most-used datasets in the KGQA community, LC-QuAD
and QALD, lack central and up-to-date points of trust. In this paper,
we survey and analyze a wide range of evaluation results with significant
coverage of 100 publications and 98 systems from the last decade. We provide a
new central and open leaderboard for any KGQA benchmark dataset as a focal
point for the community - https://kgqa.github.io/leaderboard. Our analysis
highlights existing problems during the evaluation of KGQA systems. Thus, we
will point to possible improvements for future evaluations.
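To make the evaluation problems discussed in the abstract concrete, below is a minimal sketch of set-based answer scoring with macro averaging, the style of precision/recall/F1 commonly reported for KGQA benchmarks such as QALD. The exact QALD macro-F1 definition (in particular its handling of empty gold or system answer sets) may differ from this simplified version, and the function names are illustrative only.

```python
from typing import List, Set, Tuple


def prf1(gold: Set[str], pred: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision/recall/F1 for a single question."""
    if not gold and not pred:
        return 1.0, 1.0, 1.0  # both empty: treated here as a perfect match (assumption)
    if not gold or not pred:
        return 0.0, 0.0, 0.0
    tp = len(gold & pred)
    precision = tp / len(pred)
    recall = tp / len(gold)
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


def macro_scores(gold_answers: List[Set[str]], pred_answers: List[Set[str]]) -> dict:
    """Macro-average the per-question scores over the whole benchmark."""
    scores = [prf1(g, p) for g, p in zip(gold_answers, pred_answers)]
    n = len(scores)
    return {
        "macro_precision": sum(s[0] for s in scores) / n,
        "macro_recall": sum(s[1] for s in scores) / n,
        "macro_f1": sum(s[2] for s in scores) / n,
    }


# Example: one question answered exactly, one answered partially.
print(macro_scores(
    [{"dbr:Berlin"}, {"dbr:Kant", "dbr:Hegel"}],
    [{"dbr:Berlin"}, {"dbr:Kant"}],
))
```

Even small differences in such scoring choices (e.g., how empty answers are counted) can shift reported numbers, which is one reason a central leaderboard with a fixed evaluation protocol matters.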
Related papers
- Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models [0.0]
Open Domain Question Answering (ODQA) within natural language processing involves building systems that answer factual questions using large-scale knowledge corpora.
High-quality datasets are used to train models on realistic scenarios.
Standardized metrics facilitate comparisons between different ODQA systems.
arXiv Detail & Related papers (2024-06-19T05:43:02Z)
- Question Answering Over Spatio-Temporal Knowledge Graph [13.422936134074629]
We present a dataset comprising 10,000 natural language questions for spatio-temporal knowledge graph question answering (STKGQA).
By extracting temporal and spatial information from a question, our QA model can better comprehend the question and retrieve accurate answers from the STKG.
arXiv Detail & Related papers (2024-02-18T10:44:48Z)
- KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models [76.01814380927507]
KGxBoard is an interactive framework for performing fine-grained evaluation on meaningful subsets of the data.
In our experiments, we use KGxBoard to highlight findings that would have been impossible to detect with standard averaged single-score metrics.
arXiv Detail & Related papers (2022-08-23T15:11:45Z)
- Knowledge Graph Question Answering Datasets and Their Generalizability: Are They Enough for Future Research? [0.7817685358710509]
We analyze 25 well-known KGQA datasets for 5 different Knowledge Graphs (KGs).
We show that, according to their definition of generalizability, many existing and publicly available KGQA datasets are either not suited to train a generalizable KGQA system or are based on discontinued and outdated KGs.
We propose a mitigation method for re-splitting available KGQA datasets to enable their use for evaluating generalization, without any cost or manual effort (see the sketch after this list).
arXiv Detail & Related papers (2022-05-13T12:01:15Z)
- Gait Recognition in the Wild: A Large-scale Benchmark and NAS-based Baseline [95.88825497452716]
Gait benchmarks empower the research community to train and evaluate high-performance gait recognition systems.
GREW is the first large-scale dataset for gait recognition in the wild.
SPOSGait is the first NAS-based gait recognition model.
arXiv Detail & Related papers (2022-05-05T14:57:39Z)
- HeteroQA: Learning towards Question-and-Answering through Multiple Information Sources via Heterogeneous Graph Modeling [50.39787601462344]
Community Question Answering (CQA) is a well-defined task that can be used in many scenarios, such as E-Commerce and online special-interest user communities.
Most CQA methods incorporate only articles or Wikipedia to extract knowledge and answer the user's question.
We propose a question-aware heterogeneous graph transformer to incorporate the multiple information sources (MIS) in the user community to automatically generate the answer.
arXiv Detail & Related papers (2021-12-27T10:16:43Z)
- Question Answering Over Temporal Knowledge Graphs [20.479222151497495]
Temporal Knowledge Graphs (Temporal KGs) extend regular Knowledge Graphs by providing temporal scopes (start and end times) on each edge in the KG.
While Question Answering over KG (KGQA) has received some attention from the research community, QA over Temporal KGs (Temporal KGQA) is a relatively unexplored area.
We present CRONQUESTIONS, the largest known Temporal KGQA dataset, clearly stratified into buckets of structural complexity.
arXiv Detail & Related papers (2021-06-03T00:45:07Z)
- QD-GCN: Query-Driven Graph Convolutional Networks for Attributed Community Search [54.42038098426504]
QD-GCN is an end-to-end framework that unifies the community structure as well as node attributes to solve the ACS problem.
We show that QD-GCN outperforms existing attributed community search algorithms in terms of both efficiency and effectiveness.
arXiv Detail & Related papers (2021-04-08T07:52:48Z)
- Benchmarking Graph Neural Networks [75.42159546060509]
Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs.
For any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress.
The GitHub repository has reached 1,800 stars and 339 forks, which demonstrates the utility of the proposed open-source framework.
arXiv Detail & Related papers (2020-03-02T15:58:46Z)
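Picking up the re-splitting idea from the generalizability survey above (the sketch referenced in that entry): the snippet below shows one hypothetical way to re-split an existing KGQA dataset so that the KG relations used in test questions never occur in training, one common notion of zero-shot generalization. The `relation` field, the group-wise hold-out strategy, and the ratio handling are assumptions for illustration, not the paper's actual procedure.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def resplit_by_relation(questions: List[Dict], test_ratio: float = 0.2,
                        seed: int = 42) -> Tuple[List[Dict], List[Dict]]:
    """Hold out whole relation groups for the test split, so every relation
    seen at test time is unseen during training (hypothetical zero-shot-style
    re-split; the 'relation' field is an assumption)."""
    by_relation = defaultdict(list)
    for q in questions:
        by_relation[q["relation"]].append(q)

    relations = list(by_relation)
    random.Random(seed).shuffle(relations)

    train, test = [], []
    target = test_ratio * len(questions)
    for rel in relations:
        # Fill the test split with complete relation groups until the target size is reached.
        bucket = test if len(test) < target else train
        bucket.extend(by_relation[rel])
    return train, test


# Toy usage with hypothetical question records.
toy = [
    {"question": "Where was Kant born?", "relation": "dbo:birthPlace"},
    {"question": "Where was Hegel born?", "relation": "dbo:birthPlace"},
    {"question": "Who wrote Faust?", "relation": "dbo:author"},
    {"question": "Who directed Metropolis?", "relation": "dbo:director"},
]
train_split, test_split = resplit_by_relation(toy, test_ratio=0.25)
print(len(train_split), len(test_split))
```

Because whole relation groups are moved at once, the resulting test split can slightly overshoot the target ratio; a real re-splitting procedure would likely balance split sizes and question types more carefully.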