Automatic Glossary of Clinical Terminology: a Large-Scale Dictionary of
Biomedical Definitions Generated from Ontological Knowledge
- URL: http://arxiv.org/abs/2306.00665v1
- Date: Thu, 1 Jun 2023 13:37:55 GMT
- Authors: François Remy, Thomas Demeester
- Abstract summary: More than 400,000 biomedical concepts and some of their relationships are contained in SnomedCT.
Clear definitions or descriptions in understandable language are often not available.
AGCT contains 422,070 computer-generated definitions for SnomedCT concepts.
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Background: More than 400,000 biomedical concepts and some of their
relationships are contained in SnomedCT, a comprehensive biomedical ontology.
However, their concept names are not always readily interpretable by
non-experts, or patients looking at their own electronic health records (EHR).
Clear definitions or descriptions in understandable language are often not
available. Therefore, generating human-readable definitions for biomedical
concepts might help make the information they encode more accessible and
understandable to a wider public.
Objective: In this article, we introduce the Automatic Glossary of Clinical
Terminology (AGCT), a large-scale biomedical dictionary of clinical concepts
generated using high-quality information extracted from the biomedical
knowledge contained in SnomedCT.
Methods: We generate a novel definition for every SnomedCT concept by
prompting the OpenAI Turbo model, a variant of GPT-3.5, with a high-quality
verbalization of the SnomedCT relationships of the to-be-defined concept. A
significant subset of the generated definitions was subsequently judged by NLP
researchers with biomedical expertise on 5-point scales along the following
three axes: factuality, insight, and fluency.
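The verbalize-then-prompt step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the relationship names, prompt wording, and example concept below are assumptions made for the sake of the example.

```python
# Hypothetical sketch of the AGCT generation step: verbalize a concept's
# SnomedCT relationships into plain text, then prompt a GPT-3.5-class model
# for a definition. Relation names and prompt wording are illustrative
# assumptions, not the paper's exact implementation.

def verbalize(concept: str, relations: list[tuple[str, str]]) -> str:
    """Turn (relation, target) pairs into a readable fact list."""
    facts = "; ".join(
        f"{rel.replace('_', ' ')}: {target}" for rel, target in relations
    )
    return f"Concept: {concept}. Known facts: {facts}."

def build_prompt(concept: str, relations: list[tuple[str, str]]) -> str:
    """Wrap the verbalization in an instruction asking for a definition."""
    return (
        "Write a short, factual definition in plain language for the "
        "following biomedical concept.\n" + verbalize(concept, relations)
    )

prompt = build_prompt(
    "Myocardial infarction",
    [("is_a", "Ischemic heart disease"), ("finding_site", "Myocardium")],
)
# The prompt would then be sent to a chat model (requires an API key), e.g.:
# from openai import OpenAI
# definition = OpenAI().chat.completions.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": prompt}],
# ).choices[0].message.content
print(prompt)
```

Keeping the verbalization as a pure function makes the generated prompts deterministic and easy to audit before any model call is made.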
Results: AGCT contains 422,070 computer-generated definitions for SnomedCT
concepts, covering various domains such as diseases, procedures, drugs, and
anatomy. The average length of the definitions is 49 words. The definitions
were assigned average scores of over 4.5 out of 5 on all three axes, indicating
a majority of factual, insightful, and fluent definitions.
Conclusion: AGCT is a novel and valuable resource for biomedical tasks that
require human-readable definitions for SnomedCT concepts. It can also serve as
a base for developing robust biomedical retrieval models or other applications
that leverage natural language understanding of biomedical knowledge.
Related papers
- Unified Representation of Genomic and Biomedical Concepts through Multi-Task, Multi-Source Contrastive Learning [45.6771125432388]
We introduce GENomic REpresentation with Language Model (GENEREL), a framework designed to bridge genetic and biomedical knowledge bases.
Our experiments demonstrate GENEREL's ability to effectively capture the nuanced relationships between SNPs and clinical concepts.
arXiv Detail & Related papers (2024-10-14T04:19:52Z) - Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z) - Biomedical Language Models are Robust to Sub-optimal Tokenization [30.175714262031253]
Most modern biomedical language models (LMs) are pre-trained using standard domain-specific tokenizers.
We find that pre-training a biomedical LM using a more accurate biomedical tokenizer does not improve the entity representation quality of a language model.
arXiv Detail & Related papers (2023-06-30T13:35:24Z) - LLaVA-Med: Training a Large Language-and-Vision Assistant for
Biomedicine in One Day [85.19963303642427]
We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images.
The model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics.
This enables us to train a Large Language and Vision Assistant for BioMedicine in less than 15 hours (with eight A100s).
arXiv Detail & Related papers (2023-06-01T16:50:07Z) - Biomedical Named Entity Recognition via Dictionary-based Synonym
Generalization [51.89486520806639]
We propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions.
We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins.
arXiv Detail & Related papers (2023-05-22T14:36:32Z) - BioLORD: Learning Ontological Representations from Definitions (for
Biomedical Concepts and their Textual Descriptions) [17.981285086380147]
Because biomedical names are not always self-explanatory, existing approaches sometimes produce non-semantic representations.
BioLORD is a new pre-training strategy for producing meaningful representations for clinical sentences and biomedical concepts.
It overcomes this issue by grounding its concept representations in definitions, as well as short descriptions derived from a multi-relational knowledge graph.
arXiv Detail & Related papers (2022-10-21T11:43:59Z) - Generative Biomedical Entity Linking via Knowledge Base-Guided
Pre-training and Synonyms-Aware Fine-tuning [0.8154691566915505]
We propose a generative approach to biomedical entity linking (EL).
We introduce KB-guided pre-training, constructing synthetic samples from synonyms and definitions in the KB.
We also propose synonyms-aware fine-tuning to select concept names for training, along with a decoder prompt and a multi-synonym constrained prefix tree for inference.
arXiv Detail & Related papers (2022-04-11T14:50:51Z) - Clinical Named Entity Recognition using Contextualized Token
Representations [49.036805795072645]
This paper introduces the technique of contextualized word embedding to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Explicit experiments show that our models gain dramatic improvements compared to both static word embeddings and domain-generic language models.
arXiv Detail & Related papers (2021-06-23T18:12:58Z) - CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification.
We report empirical results with 11 current pre-trained Chinese models, and the experiments show that state-of-the-art neural models perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z) - A Lightweight Neural Model for Biomedical Entity Linking [1.8047694351309205]
We propose a lightweight neural method for biomedical entity linking.
Our method uses a simple alignment layer with attention mechanisms to capture the variations between mention and entity names.
Our model is competitive with previous work on standard evaluation benchmarks.
arXiv Detail & Related papers (2020-12-16T10:34:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.