CKBP v2: An Expert-Annotated Evaluation Set for Commonsense Knowledge Base Population
- URL: http://arxiv.org/abs/2304.10392v1
- Date: Thu, 20 Apr 2023 15:27:29 GMT
- Title: CKBP v2: An Expert-Annotated Evaluation Set for Commonsense Knowledge Base Population
- Authors: Tianqing Fang, Quyet V. Do, Sehyun Choi, Weiqi Wang, Yangqiu Song
- Abstract summary: We introduce CKBP v2, a new high-quality CSKB Population benchmark.
We conduct experiments comparing state-of-the-art methods for CSKB Population on the new evaluation set.
- Score: 27.48660712102029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Populating Commonsense Knowledge Bases (CSKB) is an important yet hard task
in NLP, as it tackles knowledge from external sources with unseen events and
entities. Fang et al. (2021a) proposed a CSKB Population benchmark with an
evaluation set CKBP v1. However, CKBP v1 adopts crowdsourced annotations that
suffer from a substantial fraction of incorrect answers, and the evaluation set
is not well-aligned with the external knowledge source as a result of random
sampling. In this paper, we introduce CKBP v2, a new high-quality CSKB
Population benchmark, which addresses the two aforementioned problems by using
expert annotation instead of crowdsourcing and by adding diversified
adversarial samples to make the evaluation set more representative. We conduct
extensive experiments comparing state-of-the-art methods for CSKB Population on
the new evaluation set for future research comparisons. Empirical results show
that the population task is still challenging, even for large language models
(LLMs) such as ChatGPT. Code and data are available at
https://github.com/HKUST-KnowComp/CSKB-Population.
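For orientation, CSKB population is commonly framed as triple classification: a model scores whether a (head, relation, tail) candidate drawn from an external source is plausible commonsense. Below is a minimal sketch of a KG-BERT-style scoring baseline of the kind compared in such experiments; the checkpoint name, input verbalization, and the xEffect example are illustrative assumptions, and the classification head would need fine-tuning on labeled triples before the scores mean anything.

```python
# Minimal sketch of KG-BERT-style triple scoring for CSKB population.
# Assumptions: a generic BERT checkpoint and a naive sentence-pair
# verbalization; the benchmark's actual models and input format differ.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# num_labels=2: label 1 = plausible commonsense triple. The classification
# head is randomly initialized here and must be fine-tuned on labeled triples.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
model.eval()

def score_triple(head: str, relation: str, tail: str) -> float:
    """Return the model's probability that (head, relation, tail) holds."""
    inputs = tokenizer(head, f"{relation} {tail}", return_tensors="pt",
                       truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

# A hypothetical candidate harvested from an external event source:
print(score_triple("PersonX goes jogging", "xEffect", "PersonX gets tired"))
```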
Related papers
- KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [73.34893326181046]
We present KBAlign, a self-supervised framework that enhances RAG systems through efficient model adaptation.
Our key insight is to leverage the model's intrinsic capabilities for knowledge alignment through two innovative mechanisms.
Experiments demonstrate that KBAlign can achieve 90% of the performance gain obtained through GPT-4-supervised adaptation.
arXiv Detail & Related papers (2024-11-22T08:21:03Z)
- Linguistic Fuzzy Information Evolution with Random Leader Election Mechanism for Decision-Making Systems [58.67035332062508]
Linguistic fuzzy information evolution is crucial in understanding information exchange among agents.
Different agent weights may lead to different convergence results in the classic DeGroot model.
This paper proposes three new models of linguistic fuzzy information dynamics.
arXiv Detail & Related papers (2024-10-19T18:15:24Z)
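To make the DeGroot point above concrete: in the classic model each agent repeatedly averages the others' opinions using a row-stochastic weight matrix, so the weights alone determine the consensus value. A minimal numerical sketch with real-valued opinions follows (the paper's linguistic fuzzy variants build on this scheme); the weight matrix here is an arbitrary illustrative choice.

```python
# Classic DeGroot dynamics: x(t+1) = W @ x(t) with W row-stochastic.
# Changing the agent weights in W changes the consensus the opinions reach.
import numpy as np

W = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
])  # row i = how much agent i trusts each agent; rows sum to 1
assert np.allclose(W.sum(axis=1), 1.0)

x = np.array([0.0, 0.5, 1.0])  # initial opinions
for _ in range(200):           # iterate until (numerical) convergence
    x = W @ x

print(x)  # all agents end at the same consensus value (~0.57 here)
```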
- A Learn-Then-Reason Model Towards Generalization in Knowledge Base Question Answering [17.281005999581865]
Large-scale knowledge bases (KBs) like Freebase and Wikidata house millions of structured facts.
Knowledge Base Question Answering (KBQA) provides a user-friendly way to access these valuable KBs by asking natural language questions.
This paper develops KBLLaMA, which follows a learn-then-reason framework to inject new KB knowledge into a large language model for flexible end-to-end KBQA.
arXiv Detail & Related papers (2024-06-20T22:22:41Z)
- CAR: Conceptualization-Augmented Reasoner for Zero-Shot Commonsense Question Answering [56.592385613002584]
We propose Conceptualization-Augmented Reasoner (CAR) to tackle the task of zero-shot commonsense question answering.
CAR abstracts a commonsense knowledge triple into many higher-level instances, which increases the coverage of commonsense knowledge bases.
CAR generalizes to zero-shot commonsense question answering more robustly than existing methods.
arXiv Detail & Related papers (2023-05-24T08:21:31Z)
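The conceptualization step above can be pictured as expanding a triple's head event into higher-level concepts, so that one labeled triple covers many unseen instances. A toy sketch follows; the hand-written concept map and relation name are purely hypothetical, since CAR derives its conceptualizations from external resources rather than a fixed dictionary.

```python
# Toy sketch of conceptualization-augmented triples, the idea behind CAR.
# The concept map below is hypothetical; CAR uses external conceptualization
# resources, not a hand-written dictionary.
CONCEPTS = {
    "PersonX drinks coffee": ["PersonX consumes a beverage",
                              "PersonX has a drink"],
}

def conceptualize(head, relation, tail):
    """Yield the original triple plus higher-level abstractions of its head."""
    yield (head, relation, tail)
    for concept in CONCEPTS.get(head, []):
        yield (concept, relation, tail)

for triple in conceptualize("PersonX drinks coffee", "xEffect",
                            "PersonX feels awake"):
    print(triple)
```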
- GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show that GREAT Score correlates highly with attack-based model rankings on RobustBench while being significantly cheaper to compute.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z)
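The core idea can be sketched as follows: instead of attacking a fixed test set, draw inputs from a generative model and average a per-sample robustness proxy, which is also what enables remote auditing of a black-box classifier without sharing real data. The margin-based proxy and the generator/classifier interfaces below are illustrative assumptions, not the paper's exact certified statistic.

```python
# Hedged sketch of the GREAT-Score idea: estimate global robustness by
# averaging a per-sample robustness proxy over synthetic inputs from a
# generative model. The margin proxy here is an illustrative stand-in for
# the paper's certified per-sample statistic.
import torch

def great_score_estimate(generator, classifier, n_samples=256, latent_dim=128):
    margins = []
    for _ in range(n_samples):
        z = torch.randn(1, latent_dim)        # sample the generator's prior
        x = generator(z)                      # synthetic input; no real data needed
        probs = torch.softmax(classifier(x), dim=-1)  # classifier returns logits
        top2 = torch.topk(probs, 2, dim=-1).values[0]
        margins.append((top2[0] - top2[1]).item())    # larger margin ~ harder to perturb
    return sum(margins) / len(margins)        # average = global robustness estimate
```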
- PseudoReasoner: Leveraging Pseudo Labels for Commonsense Knowledge Base Population [40.526736652672916]
We propose PseudoReasoner, a semi-supervised learning framework for CSKB population.
It uses a teacher model pre-trained on CSKBs to provide pseudo labels on the unlabeled candidate dataset for a student model to learn from.
The framework improves the backbone model KG-BERT by 3.3 points in overall performance and, notably, by 5.3 points in out-of-domain performance.
arXiv Detail & Related papers (2022-10-14T17:37:30Z)
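The teacher-student loop above can be sketched in a few lines: a teacher trained on labeled CSKB triples scores unlabeled candidates, only confident predictions become pseudo labels, and the student trains on both. The interfaces and the confidence threshold below are illustrative assumptions, not the paper's exact filtering rule.

```python
# Minimal sketch of the pseudo-labeling loop behind PseudoReasoner.
# `teacher` and `student` are assumed to expose scikit-learn-style
# predict_proba/fit interfaces; the 0.9 threshold is an arbitrary choice.
def pseudo_label_training(teacher, student, labeled, unlabeled, threshold=0.9):
    pseudo = []
    for triple in unlabeled:
        p = teacher.predict_proba(triple)     # teacher's plausibility estimate
        if p >= threshold:
            pseudo.append((triple, 1))        # confident positive pseudo label
        elif p <= 1 - threshold:
            pseudo.append((triple, 0))        # confident negative pseudo label
    student.fit(labeled + pseudo)             # labeled: list of (triple, label)
    return student
```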
- Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
arXiv Detail & Related papers (2022-01-31T20:58:47Z)
- Benchmarking Commonsense Knowledge Base Population with an Effective Evaluation Dataset [37.02104430195374]
Reasoning over commonsense knowledge bases (CSKB) whose elements are in the form of free-text is an important yet hard task in NLP.
We benchmark the CSKB population task with a new large-scale dataset.
We also propose a novel inductive commonsense reasoning model that reasons over graphs.
arXiv Detail & Related papers (2021-09-16T02:50:01Z)
- Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases [63.43418760818188]
We release GrailQA, a new large-scale, high-quality dataset with 64,331 questions.
We propose a novel BERT-based KBQA model.
The combination of our dataset and model enables us to thoroughly examine and demonstrate, for the first time, the key role of pre-trained contextual embeddings like BERT in the generalization of KBQA.
arXiv Detail & Related papers (2020-11-16T06:36:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.