Knowledge-Rich Self-Supervised Entity Linking
- URL: http://arxiv.org/abs/2112.07887v1
- Date: Wed, 15 Dec 2021 05:05:12 GMT
- Title: Knowledge-Rich Self-Supervised Entity Linking
- Authors: Sheng Zhang, Hao Cheng, Shikhar Vashishth, Cliff Wong, Jinfeng Xiao,
Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon
- Abstract summary: Knowledge-RIch Self-Supervision ($\tt KRISS$) leverages readily available domain knowledge for entity linking.
Our approach subsumes zero-shot and few-shot methods, and can easily incorporate entity descriptions and gold mention labels if available.
Without using any labeled information, our method produces $\tt KRISSBERT$, a universal entity linker for four million UMLS entities.
- Score: 58.838404666183656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Entity linking faces significant challenges, such as prolific variations and
prevalent ambiguities, especially in high-value domains with myriad entities.
Standard classification approaches suffer from the annotation bottleneck and
cannot effectively handle unseen entities. Zero-shot entity linking has emerged
as a promising direction for generalizing to new entities, but it still
requires example gold entity mentions during training and canonical
descriptions for all entities, both of which are rarely available outside of
Wikipedia. In this paper, we explore Knowledge-RIch Self-Supervision ($\tt
KRISS$) for entity linking, by leveraging readily available domain knowledge.
In training, it generates self-supervised mention examples on unlabeled text
using a domain ontology and trains a contextual encoder using contrastive
learning. For inference, it samples self-supervised mentions as prototypes for
each entity and conducts linking by mapping the test mention to the most
similar prototype. Our approach subsumes zero-shot and few-shot methods, and
can easily incorporate entity descriptions and gold mention labels if
available. Using biomedicine as a case study, we conducted extensive
experiments on seven standard datasets spanning biomedical literature and
clinical notes. Without using any labeled information, our method produces $\tt
KRISSBERT$, a universal entity linker for four million UMLS entities, which
attains new state of the art, outperforming prior self-supervised methods by as
much as over 20 absolute points in accuracy.
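
As a rough illustration of the inference step described above (not the paper's actual code), the following sketch links a test mention to the entity owning its most similar prototype. The encoder is elided: mentions are represented by hypothetical pre-computed vectors, and the entity IDs are UMLS-style CUIs used purely as placeholders.

```python
# Hypothetical sketch of prototype-based linking. In KRISS, prototype
# vectors would come from a contrastively trained contextual encoder
# applied to self-supervised mention examples; here they are made up.
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def link(mention_vec, prototypes):
    """prototypes: dict mapping entity_id -> list of prototype vectors.
    Returns the entity whose nearest prototype is most similar."""
    best_entity, best_sim = None, float("-inf")
    for entity_id, vecs in prototypes.items():
        for v in vecs:
            sim = cosine(mention_vec, v)
            if sim > best_sim:
                best_entity, best_sim = entity_id, sim
    return best_entity, best_sim

# Toy index: a few sampled prototypes per entity (illustrative values).
prototypes = {
    "C0020538": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "C0011849": [[0.1, 0.9, 0.2]],
}
entity, sim = link([0.85, 0.15, 0.05], prototypes)
print(entity, round(sim, 3))
```

In practice the prototype index for millions of entities would be searched with an approximate nearest-neighbor structure rather than the brute-force loop shown here.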
Related papers
- Entity Disambiguation via Fusion Entity Decoding [68.77265315142296]
We propose an encoder-decoder model to disambiguate entities with more detailed entity descriptions.
We observe +1.5% improvements in end-to-end entity linking in the GERBIL benchmark compared with EntQA.
arXiv Detail & Related papers (2024-04-02T04:27:54Z)
- Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains [51.02035914828596]
We study the task of seed-guided fine-grained entity typing in science and engineering domains.
We propose SEType which first enriches the weak supervision by finding more entities for each seen type from an unlabeled corpus.
It then matches the enriched entities to unlabeled text to get pseudo-labeled samples and trains a textual entailment model that can make inferences for both seen and unseen types.
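The pseudo-labeling step described above can be sketched as follows. This is an illustrative simplification, not SEType's code: entity matching is reduced to exact substring lookup, and the type names and seed sets are invented for the example.

```python
# Illustrative sketch: turn seed entities per type into pseudo-labeled
# typing examples by matching their surface forms against unlabeled text.
def pseudo_label(sentences, entities_by_type):
    """entities_by_type: dict mapping type name -> set of surface forms."""
    examples = []
    for sent in sentences:
        for etype, surfaces in entities_by_type.items():
            for surface in surfaces:
                if surface in sent:  # naive exact match for illustration
                    examples.append(
                        {"text": sent, "entity": surface, "type": etype}
                    )
    return examples

sents = ["The transistor switches fast.", "Graphene conducts heat well."]
seeds = {"Device": {"transistor"}, "Material": {"Graphene"}}
examples = pseudo_label(sents, seeds)
print(examples)
```

The resulting pseudo-labeled samples would then feed a textual entailment model so that unseen types can be handled as entailment hypotheses.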
arXiv Detail & Related papers (2024-01-23T22:36:03Z)
- Partial Annotation Learning for Biomedical Entity Recognition [0.19336815376402716]
We show that partial annotation learning methods can effectively learn from biomedical corpora with missing entity annotations.
Our proposed model outperforms alternatives and, specifically, the PubMedBERT tagger by 38% in F1-score under high missing entity rates.
arXiv Detail & Related papers (2023-05-22T15:18:38Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Improving Entity Linking through Semantic Reinforced Entity Embeddings [16.868791358905916]
We propose a method to inject fine-grained semantic information into entity embeddings to reduce their distinctiveness and facilitate the learning of contextual commonality.
Based on our entity embeddings, we achieved new state-of-the-art performance on entity linking.
arXiv Detail & Related papers (2021-06-16T00:27:56Z)
- KGSynNet: A Novel Entity Synonyms Discovery Framework with Knowledge Graph [23.053995137917994]
We propose a novel entity synonyms discovery framework, named KGSynNet.
Specifically, we pre-train subword embeddings for mentions and entities using a large-scale domain-specific corpus.
We employ a specifically designed fusion gate to adaptively absorb the entities' knowledge information into their semantic features.
arXiv Detail & Related papers (2021-03-16T07:32:33Z)
- UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large-scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision levels.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)
- Interpretable Entity Representations through Large-Scale Typing [61.4277527871572]
We present an approach to creating entity representations that are human readable and achieve high performance out of the box.
Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types.
We show that it is possible to reduce the size of our type set in a learning-based way for particular domains.
arXiv Detail & Related papers (2020-04-30T23:58:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.