ESBM: An Entity Summarization BenchMark
- URL: http://arxiv.org/abs/2003.03734v1
- Date: Sun, 8 Mar 2020 07:12:20 GMT
- Title: ESBM: An Entity Summarization BenchMark
- Authors: Qingxia Liu, Gong Cheng, Kalpa Gunaratna, Yuzhong Qu
- Abstract summary: We create an Entity Summarization BenchMark (ESBM) which overcomes the limitations of existing benchmarks and meets standard desiderata for a benchmark.
Considering that all of the evaluated systems are unsupervised, we also implement and evaluate a supervised learning-based system for reference.
- Score: 20.293900908253544
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Entity summarization is the problem of computing an optimal compact summary
for an entity by selecting a size-constrained subset of triples from RDF data.
Entity summarization supports a multiplicity of applications and has led to
fruitful research. However, there is a lack of evaluation efforts that cover
the broad spectrum of existing systems. One reason is a lack of benchmarks for
evaluation. Some benchmarks are no longer available, while others are small and
have limitations. In this paper, we create an Entity Summarization BenchMark
(ESBM) which overcomes the limitations of existing benchmarks and meets
standard desiderata for a benchmark. Using this largest available benchmark for
evaluating general-purpose entity summarizers, we perform the most extensive
experiment to date, in which 9 existing systems are compared. Considering that all
of these systems are unsupervised, we also implement and evaluate a supervised
learning based system for reference.
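To make the task concrete, here is a minimal sketch (Python, illustrative only; it is not one of the evaluated systems and is not the ESBM authors' code). It selects a size-constrained subset of RDF triples for an entity using a simple predicate-frequency heuristic and compares the selection against a hand-written gold summary with an F1-style score. The entity, triples, and gold summary below are invented for the example.

```python
# Illustrative sketch of entity summarization: pick at most k triples for an
# entity and score the selection against a gold-standard summary.
# The heuristic and all data are assumptions for demonstration purposes.
from collections import Counter

Triple = tuple[str, str, str]  # (subject, predicate, object)

def summarize(triples: list[Triple], k: int) -> list[Triple]:
    """Select at most k triples, preferring triples with rare predicates.

    A predicate repeated many times for one entity (e.g. dozens of generic
    link statements) is assumed to be less informative than one that occurs
    once, so triples are ranked by ascending predicate frequency.
    """
    freq = Counter(p for _, p, _ in triples)
    ranked = sorted(triples, key=lambda t: freq[t[1]])
    return ranked[:k]

def f1(selected: list[Triple], gold: list[Triple]) -> float:
    """F1 between a machine-selected summary and one gold summary."""
    overlap = len(set(selected) & set(gold))
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Hypothetical triples describing one entity.
    triples = [
        ("dbr:Tim_Berners-Lee", "dbo:birthPlace", "dbr:London"),
        ("dbr:Tim_Berners-Lee", "dbo:knownFor", "dbr:World_Wide_Web"),
        ("dbr:Tim_Berners-Lee", "dbo:wikiPageWikiLink", "dbr:CERN"),
        ("dbr:Tim_Berners-Lee", "dbo:wikiPageWikiLink", "dbr:HTML"),
        ("dbr:Tim_Berners-Lee", "dbo:wikiPageWikiLink", "dbr:HTTP"),
        ("dbr:Tim_Berners-Lee", "dbo:award", "dbr:Turing_Award"),
    ]
    # Hypothetical gold summary of size 3.
    gold = [
        ("dbr:Tim_Berners-Lee", "dbo:knownFor", "dbr:World_Wide_Web"),
        ("dbr:Tim_Berners-Lee", "dbo:award", "dbr:Turing_Award"),
        ("dbr:Tim_Berners-Lee", "dbo:birthPlace", "dbr:London"),
    ]
    summary = summarize(triples, k=3)
    print(summary)
    print(f"F1 vs. gold summary: {f1(summary, gold):.2f}")
```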
Related papers
- OCTrack: Benchmarking the Open-Corpus Multi-Object Tracking [63.53176412315835]
We study a novel yet practical problem: open-corpus multi-object tracking (OCMOT).
We build OCTrackB, a large-scale and comprehensive benchmark, to provide a standard evaluation platform for the OCMOT problem.
arXiv Detail & Related papers (2024-07-19T05:58:01Z)
- The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models [94.31327813151208]
BiGGen Bench is a principled generation benchmark designed to thoroughly evaluate nine distinct capabilities of LMs across 77 diverse tasks.
A key feature of the BiGGen Bench is its use of instance-specific evaluation criteria, closely mirroring the nuanced discernment of human evaluation.
arXiv Detail & Related papers (2024-06-09T12:30:30Z)
- How to Evaluate Entity Resolution Systems: An Entity-Centric Framework with Application to Inventor Name Disambiguation [1.7812428873698403]
We propose an entity-centric data labeling methodology that integrates with a unified framework for monitoring summary statistics.
These benchmark data sets can then be used for model training and a variety of evaluation tasks.
arXiv Detail & Related papers (2024-04-08T15:53:29Z)
- CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models [49.16989035566899]
Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources.
This paper constructs a large-scale and more comprehensive benchmark, and evaluates all the components of RAG systems in various RAG application scenarios.
arXiv Detail & Related papers (2024-01-30T14:25:32Z)
- Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References [123.39034752499076]
Div-Ref is a method to enhance evaluation benchmarks by enriching the number of references.
We conduct experiments to empirically demonstrate that diversifying the expression of reference can significantly enhance the correlation between automatic evaluation and human evaluation.
arXiv Detail & Related papers (2023-05-24T11:53:29Z)
- A Fair and In-Depth Evaluation of Existing End-to-End Entity Linking Systems [4.4351901934764975]
Evaluations of entity linking systems often say little about how a system will perform in a particular application.
We provide a more meaningful and fair in-depth evaluation of a variety of existing end-to-end entity linkers.
Our evaluation is based on several widely used benchmarks, which exhibit the problems mentioned above to various degrees, as well as on two new benchmarks.
arXiv Detail & Related papers (2023-05-24T09:20:15Z)
- Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks [9.404931130084803]
This paper formalizes an existing problem in NLP research: benchmarking when some systems' scores are missing on the task.
We introduce an extended benchmark, which contains over 131 million scores, an order of magnitude larger than existing benchmarks.
arXiv Detail & Related papers (2023-05-17T15:20:31Z)
- Entity Disambiguation with Entity Definitions [50.01142092276296]
Local models have recently attained astounding performance in Entity Disambiguation (ED).
Previous works limited their studies to using, as the textual representation of each candidate, only its Wikipedia title.
In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it.
We report a new state of the art on 2 out of 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns.
arXiv Detail & Related papers (2022-10-11T17:46:28Z)
- Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z)
- What Will it Take to Fix Benchmarking in Natural Language Understanding? [30.888416756627155]
We lay out four criteria that we argue NLU benchmarks should meet.
Restoring a healthy evaluation ecosystem will require significant progress in the design of benchmark datasets.
arXiv Detail & Related papers (2021-04-05T20:36:11Z)
- Exploring and Analyzing Machine Commonsense Benchmarks [0.13999481573773073]
We argue that the lack of a common vocabulary for aligning these approaches' metadata limits researchers in their efforts to understand systems' deficiencies.
We describe our initial MCS Benchmark Ontology, a common vocabulary that formalizes benchmark metadata.
arXiv Detail & Related papers (2020-12-21T19:01:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.