KG20C & KG20C-QA: Scholarly Knowledge Graph Benchmarks for Link Prediction and Question Answering
- URL: http://arxiv.org/abs/2512.21799v2
- Date: Tue, 30 Dec 2025 07:09:41 GMT
- Title: KG20C & KG20C-QA: Scholarly Knowledge Graph Benchmarks for Link Prediction and Question Answering
- Authors: Hung-Nghiep Tran, Atsuhiro Takasu
- Abstract summary: KG20C is a high-quality scholarly knowledge graph constructed from the Microsoft Academic Graph. KG20C-QA is built upon KG20C to support QA tasks on scholarly data.
- Score: 3.8315541579168353
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present KG20C and KG20C-QA, two curated datasets for advancing question answering (QA) research on scholarly data. KG20C is a high-quality scholarly knowledge graph constructed from the Microsoft Academic Graph through targeted selection of venues, quality-based filtering, and schema definition. Although KG20C has been available online in non-peer-reviewed sources such as a GitHub repository, this paper provides the first formal, peer-reviewed description of the dataset, including clear documentation of its construction and specifications. KG20C-QA is built upon KG20C to support QA tasks on scholarly data. We define a set of QA templates that convert graph triples into natural language question-answer pairs, producing a benchmark that can be used both with graph-based models such as knowledge graph embeddings and with text-based models such as large language models. We benchmark standard knowledge graph embedding methods on KG20C-QA, analyze performance across relation types, and provide reproducible evaluation protocols. By officially releasing these datasets with thorough documentation, we aim to contribute a reusable, extensible resource for the research community, enabling future work in QA, reasoning, and knowledge-driven applications in the scholarly domain. The full datasets will be released at https://github.com/tranhungnghiep/KG20C/ upon paper publication.
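The template idea in the abstract (graph triples turned into natural language question-answer pairs) can be illustrated with a minimal sketch. The relation names, template wording, entity names, and the `triples_to_qa` helper below are all hypothetical and chosen for illustration; they are not taken from the KG20C-QA release, whose actual schema and templates may differ.

```python
from typing import Dict, List, Tuple

# A triple is (head entity, relation, tail entity).
Triple = Tuple[str, str, str]

# Hypothetical relation names and question templates (assumptions,
# not the templates defined by the KG20C-QA authors).
TEMPLATES: Dict[str, str] = {
    "author_write_paper": "Which papers were written by {head}?",
    "paper_in_venue": "In which venue was {head} published?",
}


def triples_to_qa(
    triples: List[Triple], templates: Dict[str, str]
) -> Dict[str, List[str]]:
    """Map each templated question to the list of its answer entities.

    Triples that share a head and relation produce the same question,
    so their tails are collected as multiple valid answers.
    Triples whose relation has no template are skipped.
    """
    qa: Dict[str, List[str]] = {}
    for head, relation, tail in triples:
        template = templates.get(relation)
        if template is None:
            continue  # no template defined for this relation
        question = template.format(head=head)
        qa.setdefault(question, []).append(tail)
    return qa


# Toy triples with made-up entity names.
example_triples: List[Triple] = [
    ("Author A", "author_write_paper", "Paper X"),
    ("Author A", "author_write_paper", "Paper Y"),
    ("Paper X", "paper_in_venue", "Venue V"),
    ("Paper X", "untemplated_relation", "Something"),
]

qa_pairs = triples_to_qa(example_triples, TEMPLATES)
for question, answers in qa_pairs.items():
    print(question, "->", answers)
```

Grouping the tails under one question keeps the same data usable in both settings the abstract mentions: a graph-based model can rank candidate tail entities for the (head, relation) query, while a text-based model can be given the question string directly.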
Related papers
- Can Knowledge-Graph-based Retrieval Augmented Generation Really Retrieve What You Need? [57.28763506780752]
GraphFlow is a framework that efficiently retrieves accurate and diverse knowledge required for real-world queries from text-rich KGs. It outperforms strong KG-RAG baselines, including GPT-4o, by 10% on average in hit rate and recall. It also shows strong generalization to unseen KGs, demonstrating its effectiveness and robustness.
arXiv Detail & Related papers (2025-10-18T17:06:49Z)
- Structural Alignment in Link Prediction [0.0]
This thesis proposes an alternative perspective on the field's approach to link prediction and KG data modelling. This work re-analyses KGs and state-of-the-art link predictors from a graph-structure-first perspective.
arXiv Detail & Related papers (2025-05-08T04:27:15Z)
- PeerQA: A Scientific Question Answering Dataset from Peer Reviews [51.95579001315713]
We present PeerQA, a real-world, scientific, document-level Question Answering dataset. The dataset contains 579 QA pairs from 208 academic articles, with a majority from ML and NLP. We provide a detailed analysis of the collected dataset and conduct experiments establishing baseline systems for all three tasks.
arXiv Detail & Related papers (2025-02-19T12:24:46Z)
- Ontology-grounded Automatic Knowledge Graph Construction by LLM under Wikidata schema [60.42231674887294]
We propose an ontology-grounded approach to Knowledge Graph (KG) construction using Large Language Models (LLMs) on a knowledge base. We ground the generation of the KG in the authored ontology, based on extracted relations, to ensure consistency and interpretability. Our work presents a promising direction for a scalable KG construction pipeline with minimal human intervention that yields high-quality, human-interpretable KGs.
arXiv Detail & Related papers (2024-12-30T13:36:05Z)
- Knowledge Graph for NLG in the context of conversational agents [0.0]
We provide a review of different architectures used for knowledge graph-to-text generation including: Graph Neural Networks, the Graph Transformer, and linearization with seq2seq models.
In future work, we aim to refine benchmark datasets for KG-to-text generation with PLMs and to explore its emotional and multilingual dimensions.
arXiv Detail & Related papers (2023-07-04T08:03:33Z)
- KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models [76.01814380927507]
KGxBoard is an interactive framework for performing fine-grained evaluation on meaningful subsets of the data.
In our experiments, we highlight findings obtained with KGxBoard that would have been impossible to detect with standard averaged single-score metrics.
arXiv Detail & Related papers (2022-08-23T15:11:45Z)
- Knowledge Graph Question Answering Datasets and Their Generalizability: Are They Enough for Future Research? [0.7817685358710509]
We analyze 25 well-known KGQA datasets for 5 different Knowledge Graphs (KGs).
We show that, according to this definition, many existing, publicly available KGQA datasets are either not suited to training a generalizable KGQA system or are based on discontinued and outdated KGs.
We propose a mitigation method that re-splits available KGQA datasets so they can be used to evaluate generalization, at no cost and without manual effort.
arXiv Detail & Related papers (2022-05-13T12:01:15Z)
- Knowledge Graph Question Answering Leaderboard: A Community Resource to Prevent a Replication Crisis [61.740077541531726]
We provide a new central and open leaderboard for any KGQA benchmark dataset as a focal point for the community.
Our analysis highlights existing problems during the evaluation of KGQA systems.
arXiv Detail & Related papers (2022-01-20T13:46:01Z)
- Toward Subgraph-Guided Knowledge Graph Question Generation with Graph Neural Networks [53.58077686470096]
Knowledge graph (KG) question generation (QG) aims to generate natural language questions from KGs and target answers.
In this work, we focus on a more realistic setting where we aim to generate questions from a KG subgraph and target answers.
arXiv Detail & Related papers (2020-04-13T15:43:22Z)