DESCGEN: A Distantly Supervised Dataset for Generating Abstractive Entity Descriptions
- URL: http://arxiv.org/abs/2106.05365v1
- Date: Wed, 9 Jun 2021 20:10:48 GMT
- Title: DESCGEN: A Distantly Supervised Dataset for Generating Abstractive Entity Descriptions
- Authors: Weijia Shi, Mandar Joshi, Luke Zettlemoyer
- Abstract summary: We introduce DESCGEN: given mentions spread over multiple documents, the goal is to generate an entity summary description.
DESCGEN consists of 37K entity descriptions from Wikipedia and Fandom, each paired with nine evidence documents on average.
The resulting summaries are more abstractive than those found in existing datasets and provide a better proxy for the challenge of describing new and emerging entities.
- Score: 41.80938919728834
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Short textual descriptions of entities provide summaries of their key
attributes and have been shown to be useful sources of background knowledge for
tasks such as entity linking and question answering. However, generating entity
descriptions, especially for new and long-tail entities, can be challenging
since relevant information is often scattered across multiple sources with
varied content and style. We introduce DESCGEN: given mentions spread over
multiple documents, the goal is to generate an entity summary description.
DESCGEN consists of 37K entity descriptions from Wikipedia and Fandom, each
paired with nine evidence documents on average. The documents were collected
using a combination of entity linking and hyperlinks to the Wikipedia and
Fandom entity pages, which together provide high-quality distant supervision.
The resulting summaries are more abstractive than those found in existing
datasets and provide a better proxy for the challenge of describing new and
emerging entities. We also propose a two-stage extract-then-generate baseline
and show that there exists a large gap (19.9% in ROUGE-L) between
state-of-the-art models and human performance, suggesting that the data will
support significant future work.
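The abstract reports the model-versus-human gap in ROUGE-L and says the DESCGEN summaries are more abstractive than those in existing datasets. The Python below is a minimal sketch of what those two quantities measure. It is not the authors' evaluation code. It makes the following assumptions: whitespace tokenization without stemming; ROUGE-L as the LCS-based F-measure with beta = 1; and abstractiveness approximated as the share of distinct summary bigrams that never appear in the evidence documents, which may differ from the measure used in the paper. All function names are hypothetical.

```python
from typing import List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase whitespace tokenization; standard ROUGE tools
    # may additionally stem tokens and strip punctuation.
    return text.lower().split()


def lcs_length(a: List[str], b: List[str]) -> int:
    # Dynamic program for the longest common subsequence, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    # ROUGE-L F-measure (beta = 1): harmonic mean of LCS precision and recall.
    cand, ref = tokenize(candidate), tokenize(reference)
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def ngram_set(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    # Distinct n-grams of a token sequence.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_ratio(summary: str, documents: List[str], n: int = 2) -> float:
    # Hypothetical abstractiveness proxy: fraction of distinct summary n-grams
    # absent from every evidence document (higher = more abstractive).
    summary_ngrams = ngram_set(tokenize(summary), n)
    if not summary_ngrams:
        return 0.0
    source_ngrams: Set[Tuple[str, ...]] = set()
    for doc in documents:
        source_ngrams |= ngram_set(tokenize(doc), n)
    return len(summary_ngrams - source_ngrams) / len(summary_ngrams)


if __name__ == "__main__":
    print(round(rouge_l_f1("the cat lay on the mat", "the cat sat on the mat"), 3))
    print(round(novel_ngram_ratio("the cat lay down", ["the cat sat on the mat"]), 3))
```

Running the script prints 0.833 (an LCS of five tokens out of six on each side) and 0.667 (two of the three summary bigrams are absent from the document). Scores here lie in [0, 1], so a gap such as the reported 19.9% corresponds to a difference of about 0.199 on this scale.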
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
(2024-10-31T06:55:24Z)
- Generative Retrieval Meets Multi-Graded Relevance [104.75244721442756]
We introduce a framework called GRaded Generative Retrieval (GR$^2$).
GR$^2$ focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training.
Experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR$^2$.
(2024-09-27T02:55:53Z)
- Wiki Entity Summarization Benchmark [9.25319552487389]
Entity summarization aims to compute concise summaries for entities in knowledge graphs.
Existing datasets and benchmarks are often limited to a few hundred entities.
We propose WikES, a comprehensive benchmark comprising entities, their summaries, and their connections.
(2024-06-12T17:22:00Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
(2023-03-23T08:21:16Z)
- The Fellowship of the Authors: Disambiguating Names from Social Network Context [2.3605348648054454]
Authority lists with extensive textual descriptions for each entity are often lacking, and named entities are frequently ambiguous.
We combine BERT-based mention representations with a variety of graph induction strategies and experiment with supervised and unsupervised cluster inference methods.
We find that in-domain language model pretraining can significantly improve mention representations, especially for larger corpora.
(2022-08-31T21:51:55Z)
- ReFinED: An Efficient Zero-shot-capable Approach to End-to-End Entity Linking [5.382800665115746]
ReFinED is an efficient end-to-end entity linking model.
It performs mention detection, fine-grained entity typing, and entity disambiguation for all mentions within a document in a single forward pass.
It surpasses state-of-the-art performance on standard entity linking datasets by an average of 3.7 F1.
(2022-07-08T19:20:42Z)
- On Generating Extended Summaries of Long Documents [16.149617108647707]
We present a new method for generating extended summaries of long papers.
Our method exploits the hierarchical structure of the documents and incorporates it into an extractive summarization model.
Our analysis shows that our multi-tasking approach can shift the extraction probability distribution in favor of summary-worthy sentences.
(2020-12-28T08:10:28Z)
- Leveraging Graph to Improve Abstractive Multi-Document Summarization [50.62418656177642]
We develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents.
Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents.
Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries.
(2020-05-20T13:39:47Z)
- ENT-DESC: Entity Description Generation by Exploring Knowledge Graph [53.03778194567752]
In practice, the input knowledge may be more than sufficient, since the output description may cover only the most significant knowledge.
We introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text.
We propose a multi-graph structure that is able to represent the original graph information more comprehensively.
(2020-04-30T14:16:19Z)