Related papers: DESCGEN: A Distantly Supervised Dataset for Generating Abstractive Entity Descriptions

URL: http://arxiv.org/abs/2106.05365v1
Date: Wed, 9 Jun 2021 20:10:48 GMT
Title: DESCGEN: A Distantly Supervised Dataset for Generating Abstractive Entity Descriptions
Authors: Weijia Shi, Mandar Joshi, Luke Zettlemoyer
Abstract summary: We introduce DESCGEN: given mentions spread over multiple documents, the goal is to generate an entity summary description. DESCGEN consists of 37K entity descriptions from Wikipedia and Fandom, each paired with nine evidence documents on average. The resulting summaries are more abstractive than those found in existing datasets and provide a better proxy for the challenge of describing new and emerging entities.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Short textual descriptions of entities provide summaries of their key attributes and have been shown to be useful sources of background knowledge for tasks such as entity linking and question answering. However, generating entity descriptions, especially for new and long-tail entities, can be challenging since relevant information is often scattered across multiple sources with varied content and style. We introduce DESCGEN: given mentions spread over multiple documents, the goal is to generate an entity summary description. DESCGEN consists of 37K entity descriptions from Wikipedia and Fandom, each paired with nine evidence documents on average. The documents were collected using a combination of entity linking and hyperlinks to the Wikipedia and Fandom entity pages, which together provide high-quality distant supervision. The resulting summaries are more abstractive than those found in existing datasets and provide a better proxy for the challenge of describing new and emerging entities. We also propose a two-stage extract-then-generate baseline and show that there exists a large gap (19.9% in ROUGE-L) between state-of-the-art models and human performance, suggesting that the data will support significant future work.
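ROUGE-L, the metric in which the 19.9% model-versus-human gap is reported, scores a generated description by the longest common subsequence (LCS) of tokens it shares with the reference description. The following is a minimal Python sketch of sentence-level ROUGE-L F1. It assumes lowercased whitespace tokenization, no stemming, and equal weighting of precision and recall. The abstract does not state the exact ROUGE configuration used, and the helper names `lcs_length` and `rouge_l_f1` are hypothetical.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j]
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1  # extend the subsequence with a matching token
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a token from either side
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Sentence-level ROUGE-L F1 (assumption: lowercased whitespace tokens, no stemming)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens covered by the LCS
    recall = lcs / len(ref)      # share of reference tokens covered by the LCS
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l_f1("the cat sat on the mat", "the cat is on the mat")` finds the five-token LCS "the cat on the mat" and returns 5/6 ≈ 0.833. Established packages such as Google's `rouge-score` apply their own tokenization and optional stemming, so their scores can differ slightly from this sketch.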

Related papers

Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data. We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation. Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv (2024-10-31T06:55:24Z)
Generative Retrieval Meets Multi-Graded Relevance
We introduce a framework called GRaded Generative Retrieval (GR²). GR² focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training. Experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR².
arXiv (2024-09-27T02:55:53Z)
Wiki Entity Summarization Benchmark
Entity summarization aims to compute concise summaries for entities in knowledge graphs. Existing datasets and benchmarks are often limited to a few hundred entities. We propose WikES, a comprehensive benchmark comprising entities, their summaries, and their connections.
arXiv (2024-06-12T17:22:00Z)
Hypertext Entity Extraction in Webpage
We introduce a MoE-based Entity Extraction Framework (MoEEF), which integrates multiple features to enhance model performance. We also analyze the effectiveness of hypertext features in HEED and several model components in MoEEF.
arXiv (2024-03-04T03:21:40Z)
AMELI: Enhancing Multimodal Entity Linking with Fine-Grained Attributes
We propose attribute-aware multimodal entity linking. The input consists of a mention described with a text paragraph and images. The goal is to predict the corresponding target entity from a multimodal knowledge base.
arXiv (2023-05-24T05:01:48Z)
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild
We propose an alternative approach to precisely and robustly extract key information from document images. We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities. The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv (2023-03-23T08:21:16Z)
The Fellowship of the Authors: Disambiguating Names from Social Network Context
In many domains, authority lists with extensive textual descriptions for each entity are lacking, and ambiguous named entities mostly occur in the context of other named entities. We combine BERT-based mention representations with a variety of graph induction strategies and experiment with supervised and unsupervised cluster inference methods. We find that in-domain language model pretraining can significantly improve mention representations, especially for larger corpora.
arXiv (2022-08-31T21:51:55Z)
ReFinED: An Efficient Zero-shot-capable Approach to End-to-End Entity Linking
ReFinED is an efficient end-to-end entity linking model. It performs mention detection, fine-grained entity typing, and entity disambiguation for all mentions within a document in a single forward pass. It surpasses state-of-the-art performance on standard entity linking datasets by an average of 3.7 F1.
arXiv (2022-07-08T19:20:42Z)
On Generating Extended Summaries of Long Documents
We present a new method for generating extended summaries of long papers. Our method exploits the hierarchical structure of the documents and incorporates it into an extractive summarization model. Our analysis shows that our multi-tasking approach can shift the extraction probability distribution in favor of summary-worthy sentences.
arXiv (2020-12-28T08:10:28Z)
Leveraging Graph to Improve Abstractive Multi-Document Summarization
We develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents. Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents. Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries.
arXiv (2020-05-20T13:39:47Z)
ENT-DESC: Entity Description Generation by Exploring Knowledge Graph
In practice, the input knowledge may exceed what is needed, since the output description may cover only the most significant knowledge. We introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text. We propose a multi-graph structure that is able to represent the original graph information more comprehensively.
arXiv (2020-04-30T14:16:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.