From Retrieval to Generation: Efficient and Effective Entity Set Expansion
- URL: http://arxiv.org/abs/2304.03531v4
- Date: Sun, 4 Aug 2024 05:45:05 GMT
- Title: From Retrieval to Generation: Efficient and Effective Entity Set Expansion
- Authors: Shulin Huang, Shirong Ma, Yangning Li, Yinghui Li, Hai-Tao Zheng,
- Abstract summary: Entity Set Expansion (ESE) is a critical task aiming at expanding entities of the target semantic class described by seed entities.
Most existing ESE methods are retrieval-based frameworks that need to extract contextual features of entities and calculate the similarity between seed entities and candidate entities.
We propose Generative Entity Set Expansion (GenExpan) framework, which utilizes a generative pre-trained auto-regressive language model to accomplish ESE task.
- Score: 23.535181796796678
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Entity Set Expansion (ESE) is a critical task aiming at expanding entities of the target semantic class described by seed entities. Most existing ESE methods are retrieval-based frameworks that need to extract contextual features of entities and calculate the similarity between seed entities and candidate entities. To achieve the two purposes, they iteratively traverse the corpus and the entity vocabulary, resulting in poor efficiency and scalability. Experimental results indicate that the time consumed by the retrieval-based ESE methods increases linearly with entity vocabulary and corpus size. In this paper, we firstly propose Generative Entity Set Expansion (GenExpan) framework, which utilizes a generative pre-trained auto-regressive language model to accomplish ESE task. Specifically, a prefix tree is employed to guarantee the validity of entity generation, and automatically generated class names are adopted to guide the model to generate target entities. Moreover, we propose Knowledge Calibration and Generative Ranking to further bridge the gap between generic knowledge of the language model and the goal of ESE task. For efficiency, expansion time consumed by GenExpan is independent of entity vocabulary and corpus size, and GenExpan achieves an average 600% speedup compared to strong baselines. For expansion effectiveness, our framework outperforms previous state-of-the-art ESE methods.
Related papers
- Entity Disambiguation via Fusion Entity Decoding [68.77265315142296]
We propose an encoder-decoder model to disambiguate entities with more detailed entity descriptions.
We observe +1.5% improvements in end-to-end entity linking in the GERBIL benchmark compared with EntQA.
arXiv Detail & Related papers (2024-04-02T04:27:54Z) - Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z) - CorpusLM: Towards a Unified Language Model on Corpus for Knowledge-Intensive Tasks [20.390672895839757]
Retrieval-augmented generation (RAG) has emerged as a popular solution to enhance factual accuracy.
Traditional retrieval modules often rely on large document index and disconnect with generative tasks.
We propose textbfCorpusLM, a unified language model that integrates generative retrieval, closed-book generation, and RAG.
arXiv Detail & Related papers (2024-02-02T06:44:22Z) - Dynamic Retrieval-Augmented Generation [4.741884506444161]
We propose a novel approach for the Dynamic Retrieval-Augmented Generation (DRAG)
DRAG injects compressed embeddings of the retrieved entities into the generative model.
Our approach achieves several targets: (1) lifting the length limitations of the context window, saving on the prompt size; (2) allowing huge expansion of the number of retrieval entities available for the context; (3) alleviating the problem of misspelling or failing to find relevant entity names.
arXiv Detail & Related papers (2023-12-14T14:26:57Z) - Instructed Language Models with Retrievers Are Powerful Entity Linkers [87.16283281290053]
Instructed Generative Entity Linker (INSGENEL) is the first approach that enables casual language models to perform entity linking over knowledge bases.
INSGENEL outperforms previous generative alternatives with +6.8 F1 points gain on average.
arXiv Detail & Related papers (2023-11-06T16:38:51Z) - Active Retrieval Augmented Generation [123.68874416084499]
Augmenting large language models (LMs) by retrieving information from external knowledge resources is one promising solution.
Most existing retrieval augmented LMs employ a retrieve-and-generate setup that only retrieves information once based on the input.
We propose Forward-Looking Active REtrieval augmented generation (FLARE), a generic method which iteratively uses a prediction of the upcoming sentence to anticipate future content.
arXiv Detail & Related papers (2023-05-11T17:13:40Z) - DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models.
We propose to generate a symbolic and ordered sequence from the relation matrix which is deterministic and easier for model to learn.
Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z) - Automatic Context Pattern Generation for Entity Set Expansion [40.535332689515656]
We develop a module that automatically generates high-quality context patterns for entities.
We also propose the GAPA framework that leverages the aforementioned GenerAted PAtterns to expand target entities.
arXiv Detail & Related papers (2022-07-17T06:50:35Z) - Contrastive Learning with Hard Negative Entities for Entity Set
Expansion [29.155036098444008]
Various NLP and IR applications will benefit from ESE due to its ability to discover knowledge.
We devise an entity-level masked language model with contrastive learning to refine the representation of entities.
In addition, we propose the ProbExpan, a novel probabilistic ESE framework utilizing the entity representation obtained by the aforementioned language model to expand entities.
arXiv Detail & Related papers (2022-04-16T12:26:42Z) - UNER: Universal Named-Entity RecognitionFramework [0.0]
We create the first multilingual UNER corpus: the SETimesparallel corpus annotated for named-entities.
The English SETimescorpus will be annotated using existing tools and knowledge bases.
The resulting annotations will be propagated automatically to other languages within the SE-Times corpora.
arXiv Detail & Related papers (2020-10-23T13:53:31Z) - Empower Entity Set Expansion via Language Model Probing [58.78909391545238]
Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities.
A key challenge for entity set expansion is to avoid selecting ambiguous context features which will shift the class semantics and lead to accumulative errors in later iterations.
We propose a novel iterative set expansion framework that leverages automatically generated class names to address the semantic drift issue.
arXiv Detail & Related papers (2020-04-29T00:09:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.