MapperGPT: Large Language Models for Linking and Mapping Entities
- URL: http://arxiv.org/abs/2310.03666v1
- Date: Thu, 5 Oct 2023 16:43:04 GMT
- Title: MapperGPT: Large Language Models for Linking and Mapping Entities
- Authors: Nicolas Matentzoglu, J. Harry Caufield, Harshad B. Hegde, Justin T.
Reese, Sierra Moxon, Hyeongsik Kim, Nomi L. Harris, Melissa A. Haendel,
Christopher J. Mungall
- Abstract summary: We present MapperGPT, an approach that uses Large Language Models to review and refine mapping relationships as a post-processing step.
We show that when used in combination with high-recall methods, MapperGPT can provide a substantial improvement in accuracy.
- Score: 1.5340902251924438
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Aligning terminological resources, including ontologies, controlled
vocabularies, taxonomies, and value sets is a critical part of data integration
in many domains such as healthcare, chemistry, and biomedical research. Entity
mapping is the process of determining correspondences between entities across
these resources, such as gene identifiers, disease concepts, or chemical entity
identifiers. Many tools have been developed to compute such mappings based on
common structural features and lexical information such as labels and synonyms.
Lexical approaches in particular often provide very high recall, but low
precision, due to lexical ambiguity. As a consequence, mapping efforts often
resort to labor-intensive manual refinement of mappings by a human curator.
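To make the recall/precision trade-off concrete, below is a minimal sketch of label-based candidate generation. All identifiers and labels are hypothetical placeholders, not terms from the paper's test sets.

```python
# Minimal sketch of high-recall lexical matching between two vocabularies.
# All identifiers and labels are hypothetical placeholders, chosen only to
# show how a shared label yields ambiguous (low-precision) candidates.

source_terms = {
    "SRC:0001": {"nucleus"},              # intended sense: cell nucleus
    "SRC:0002": {"antenna", "feeler"},
}
target_terms = {
    "TGT:0001": {"nucleus"},              # cell nucleus
    "TGT:0002": {"nucleus"},              # brain nucleus (different sense)
    "TGT:0003": {"antenna"},
}

def lexical_candidates(source, target):
    """Propose a candidate mapping whenever any label or synonym is shared."""
    for s_id, s_labels in source.items():
        for t_id, t_labels in target.items():
            shared = s_labels & t_labels
            if shared:
                yield s_id, t_id, sorted(shared)

for s_id, t_id, shared in lexical_candidates(source_terms, target_terms):
    print(s_id, "->", t_id, "via", shared)
# The correct "nucleus" mapping is recalled, but so is the wrong-sense one,
# which is why a downstream refinement step (human or LLM) is needed.
```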
Large Language Models (LLMs), such as the ones employed by ChatGPT, have
generalizable abilities to perform a wide range of tasks, including
question-answering and information extraction. Here we present MapperGPT, an
approach that uses LLMs to review and refine mapping relationships as a
post-processing step, in concert with existing high-recall methods that are
based on lexical and structural heuristics.
We evaluated MapperGPT on a series of alignment tasks from different domains,
including anatomy, developmental biology, and renal diseases. We devised a
collection of tasks that are designed to be particularly challenging for
lexical methods. We show that when used in combination with high-recall
methods, MapperGPT can provide a substantial improvement in accuracy, beating
state-of-the-art (SOTA) methods such as LogMap.
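The abstract describes LLM review as a post-processing filter over candidates produced by a high-recall matcher. The sketch below illustrates that shape only; the prompt wording, relationship categories, model name, and OpenAI client usage are assumptions for illustration, not MapperGPT's actual implementation.

```python
# Illustrative sketch of LLM-based review of candidate mappings.
# Prompt, categories, and model choice are assumptions, not MapperGPT's code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

REVIEW_PROMPT = """You are reviewing a candidate mapping between two terms.
Term A: {a_label} ({a_id}). Definition: {a_def}
Term B: {b_label} ({b_id}). Definition: {b_def}
Reply with exactly one word: EXACT, BROADER, NARROWER, RELATED, or DIFFERENT."""

def review_mapping(term_a: dict, term_b: dict, model: str = "gpt-4") -> str:
    """Ask the LLM to classify the relationship between two candidate terms."""
    prompt = REVIEW_PROMPT.format(
        a_label=term_a["label"], a_id=term_a["id"], a_def=term_a["definition"],
        b_label=term_b["label"], b_id=term_b["id"], b_def=term_b["definition"],
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip().upper()

def refine(candidates, terms):
    """Keep only candidate pairs that the LLM judges to be exact matches."""
    for a_id, b_id in candidates:
        if review_mapping(terms[a_id], terms[b_id]).startswith("EXACT"):
            yield a_id, b_id
```

A real pipeline would likely also record the predicted relationship type (broader, narrower, related) rather than discarding it, since non-exact correspondences are often useful mappings in their own right.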
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- Local Large Language Models for Complex Structured Medical Tasks [0.0]
This paper introduces an approach that combines the language reasoning capabilities of large language models with the benefits of local training to tackle complex, domain-specific tasks.
Specifically, the authors demonstrate their approach by extracting structured condition codes from pathology reports.
arXiv Detail & Related papers (2023-08-03T12:36:13Z)
- PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
However, LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z)
- An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians.
Recent studies have achieved promising results in automatic impression generation using large-scale medical text data.
These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv Detail & Related papers (2023-04-17T17:13:42Z)
- Nested Named Entity Recognition from Medical Texts: An Adaptive Shared Network Architecture with Attentive CRF [53.55504611255664]
We propose a novel method, referred to as ASAC, to address the difficulties caused by nested entities.
The proposed method contains two key modules: the adaptive shared (AS) part and the attentive conditional random field (ACRF) module.
Our model could learn better entity representations by capturing the implicit distinctions and relationships between different categories of entities.
arXiv Detail & Related papers (2022-11-09T09:23:56Z)
- Knowledge-Rich Self-Supervised Entity Linking [58.838404666183656]
Knowledge-RIch Self-Supervision (KRISSBERT) is a universal entity linker for four million UMLS entities, produced without using any labeled information.
Our approach subsumes zero-shot and few-shot methods, and can easily incorporate entity descriptions and gold mention labels if available.
arXiv Detail & Related papers (2021-12-15T05:05:12Z)
- PharmKE: Knowledge Extraction Platform for Pharmaceutical Texts using Transfer Learning [0.0]
PharmKE is a text analysis platform that applies deep learning through several stages for thorough semantic analysis of pharmaceutical articles.
The methodology is used to create accurately labeled training and test datasets, which are then used to train models for custom entity labeling tasks.
The obtained results are compared to the fine-tuned BERT and BioBERT models trained on the same dataset.
arXiv Detail & Related papers (2021-02-25T19:36:35Z)
- A multi-perspective combined recall and rank framework for Chinese procedure terminology normalization [11.371582109211815]
In this paper, we focus on Chinese procedure terminology normalization.
The expressions of terminologies vary, and one medical mention may be linked to multiple terminologies.
We propose a combined recall and rank framework to solve the above problems.
arXiv Detail & Related papers (2021-01-22T13:37:10Z)
- A Lightweight Neural Model for Biomedical Entity Linking [1.8047694351309205]
We propose a lightweight neural method for biomedical entity linking.
Our method uses a simple alignment layer with attention mechanisms to capture the variations between mention and entity names.
Our model is competitive with previous work on standard evaluation benchmarks.
arXiv Detail & Related papers (2020-12-16T10:34:37Z)
- Mapping Patterns for Virtual Knowledge Graphs [71.61234136161742]
Virtual Knowledge Graphs (VKG) constitute one of the most promising paradigms for integrating and accessing legacy data sources.
We build on well-established methodologies and patterns studied in data management, data analysis, and conceptual modeling.
We validate our catalog on the considered VKG scenarios, showing it covers the vast majority of patterns present therein.
arXiv Detail & Related papers (2020-12-03T13:54:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.