BioLORD: Learning Ontological Representations from Definitions (for
Biomedical Concepts and their Textual Descriptions)
- URL: http://arxiv.org/abs/2210.11892v1
- Date: Fri, 21 Oct 2022 11:43:59 GMT
- Title: BioLORD: Learning Ontological Representations from Definitions (for
Biomedical Concepts and their Textual Descriptions)
- Authors: François Remy, Kris Demuynck and Thomas Demeester
- Abstract summary: BioLORD is a new pre-training strategy for producing meaningful representations for clinical sentences and biomedical concepts.
Because biomedical names are not always self-explanatory, contrastive training on names alone sometimes results in non-semantic representations.
BioLORD overcomes this issue by grounding its concept representations using definitions, as well as short descriptions derived from a multi-relational knowledge graph.
- Score: 17.981285086380147
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work introduces BioLORD, a new pre-training strategy for producing
meaningful representations for clinical sentences and biomedical concepts.
State-of-the-art methodologies operate by maximizing the similarity in
representation of names referring to the same concept, and preventing collapse
through contrastive learning. However, because biomedical names are not always
self-explanatory, this approach sometimes results in non-semantic representations. BioLORD
overcomes this issue by grounding its concept representations using
definitions, as well as short descriptions derived from a multi-relational
knowledge graph consisting of biomedical ontologies. Thanks to this grounding,
our model produces more semantic concept representations that match more
closely the hierarchical structure of ontologies. BioLORD establishes a new
state of the art for text similarity on both clinical sentences (MedSTS) and
biomedical concepts (MayoSRS).
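
As a rough illustration of the definition-grounded contrastive training described in the abstract, the sketch below pairs concept names with definitions and fine-tunes a sentence encoder with an in-batch contrastive loss. The base checkpoint, the toy pairs, and the hyperparameters are placeholders, not the authors' actual setup.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Toy (concept name, definition) pairs standing in for the ontology-derived
# training data described in the paper; replace with real grounded descriptions.
pairs = [
    ("myocardial infarction",
     "necrosis of heart muscle caused by an interruption of its blood supply"),
    ("hypertension",
     "a disorder characterized by persistently elevated arterial blood pressure"),
    ("nephrolithiasis",
     "the presence of calculi (stones) in the kidney"),
]

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # stand-in base encoder
examples = [InputExample(texts=[name, definition]) for name, definition in pairs]
loader = DataLoader(examples, shuffle=True, batch_size=2)

# In-batch contrastive objective: each name is pulled toward its own definition
# and pushed away from the other definitions in the same batch.
loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```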
Related papers
- Unified Representation of Genomic and Biomedical Concepts through Multi-Task, Multi-Source Contrastive Learning [45.6771125432388]
We introduce GENomic REpresentation with Language Model (GENEREL).
GENEREL is a framework designed to bridge genetic and biomedical knowledge bases.
Our experiments demonstrate GENEREL's ability to effectively capture the nuanced relationships between SNPs and clinical concepts.
arXiv Detail & Related papers (2024-10-14T04:19:52Z)
- Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping computing-power requirements low.
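
The entry above gives no implementation details; purely as a generic illustration, a Houlsby-style bottleneck adapter (the kind of lightweight module typically used for such knowledge injection) can be sketched in PyTorch as follows. The sizes and the idea of training only the adapter on verbalized knowledge-graph triples are assumptions, not the authors' exact recipe.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Houlsby-style bottleneck adapter: down-project, non-linearity, up-project,
    residual connection. A generic stand-in, not the authors' exact module."""

    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# During knowledge injection, only the adapter weights would be trained (e.g., on
# verbalized UMLS/OntoChem triples) while the pre-trained language model stays frozen.
adapter = BottleneckAdapter()
x = torch.randn(2, 16, 768)   # (batch, sequence length, hidden size)
print(adapter(x).shape)       # torch.Size([2, 16, 768])
```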
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- BiomedJourney: Counterfactual Biomedical Image Generation by
Instruction-Learning from Multimodal Patient Journeys [99.7082441544384]
We present BiomedJourney, a novel method for counterfactual biomedical image generation by instruction-learning.
Given a patient with two images taken at different time points, we use GPT-4 to process the corresponding imaging reports and generate a natural language description of disease progression.
The resulting triples are then used to train a latent diffusion model for counterfactual biomedical image generation.
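
BiomedJourney's own weights and training code are not described here; solely as a hedged sketch of the instruction-conditioned image-editing setup it builds on, the snippet below uses the public general-domain instruct-pix2pix checkpoint from diffusers as a stand-in (not a medical model and not the authors' release), with the text prompt playing the role of the generated progression description.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

# General-domain stand-in checkpoint; BiomedJourney's actual model is not used here.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

prior_image = Image.open("cxr_prior.png").convert("RGB")  # hypothetical prior study

# The instruction stands in for the GPT-4-generated disease-progression description.
edited = pipe(
    prompt="new small right pleural effusion, otherwise unchanged",
    image=prior_image,
    num_inference_steps=50,
).images[0]
edited.save("cxr_counterfactual.png")
```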
arXiv Detail & Related papers (2023-10-16T18:59:31Z)
- ReOnto: A Neuro-Symbolic Approach for Biomedical Relation Extraction [3.263873198567265]
Relation Extraction (RE) is the task of extracting semantic relationships between entities in a sentence and aligning them to relations defined in a vocabulary.
We present ReOnto, a novel technique that makes use of neuro-symbolic knowledge for the RE task.
Experimental results on two public biomedical datasets, BioRel and ADE, show that our method outperforms all the baselines.
arXiv Detail & Related papers (2023-09-04T05:36:58Z)
- Automatic Glossary of Clinical Terminology: a Large-Scale Dictionary of
Biomedical Definitions Generated from Ontological Knowledge [14.531480317300856]
More than 400,000 biomedical concepts and some of their relationships are contained in SnomedCT.
However, clear definitions or descriptions in understandable language are often not available.
AGCT contains 422,070 computer-generated definitions for SnomedCT concepts.
arXiv Detail & Related papers (2023-06-01T13:37:55Z)
- Biomedical Named Entity Recognition via Dictionary-based Synonym
Generalization [51.89486520806639]
We propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions.
We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins.
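
Only as a simplified sketch of span-based, dictionary-driven recognition (not the trained SynGen model itself, and with an arbitrary encoder, dictionary, and threshold), one could enumerate candidate spans and match them against dictionary embeddings like this:

```python
from sentence_transformers import SentenceTransformer, util

# Hypothetical synonym dictionary; SynGen's actual dictionary and trained scorer differ.
dictionary = ["myocardial infarction", "heart attack", "type 2 diabetes mellitus", "aspirin"]

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # stand-in encoder
dict_emb = encoder.encode(dictionary, convert_to_tensor=True, normalize_embeddings=True)

text = "the patient was admitted after a heart attack and started on aspirin"
tokens = text.split()

# Enumerate candidate spans (up to 4 tokens) and keep those close to a dictionary entry.
for i in range(len(tokens)):
    for j in range(i + 1, min(i + 5, len(tokens) + 1)):
        span = " ".join(tokens[i:j])
        span_emb = encoder.encode(span, convert_to_tensor=True, normalize_embeddings=True)
        score, idx = util.cos_sim(span_emb, dict_emb).squeeze(0).max(dim=0)
        if score.item() > 0.85:  # arbitrary threshold for illustration
            print(f"{span!r} -> {dictionary[int(idx)]} ({score.item():.2f})")
```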
arXiv Detail & Related papers (2023-05-22T14:36:32Z)
- Detecting Idiomatic Multiword Expressions in Clinical Terminology using
Definition-Based Representation Learning [12.30055843580139]
We develop an effective tool for scoring the idiomaticity of biomedical MWEs based on the degree of similarity between the semantic representations of those MWEs and a weighted average of the representation of their constituents.
Our results show that the BioLORD model has a strong ability to identify idiomatic MWEs, not replicated in other models.
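
The scoring idea above can be written down directly: compare the encoder's representation of the whole MWE to a weighted average of its constituents' representations. A minimal sketch follows, assuming uniform weights and a BioLORD checkpoint name on the Hugging Face hub; both are assumptions for illustration.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed BioLORD checkpoint name; any biomedical sentence encoder can be substituted.
encoder = SentenceTransformer("FremyCompany/BioLORD-STAMB2-v1")

def idiomaticity(mwe: str, weights=None) -> float:
    """Higher score = the MWE's meaning is further from its parts (more idiomatic)."""
    tokens = mwe.split()
    w = np.full(len(tokens), 1.0 / len(tokens)) if weights is None else np.asarray(weights, dtype=float)
    mwe_vec = encoder.encode(mwe, normalize_embeddings=True)
    tok_vecs = encoder.encode(tokens, normalize_embeddings=True)
    avg_vec = w @ tok_vecs                      # weighted average of constituent embeddings
    avg_vec = avg_vec / np.linalg.norm(avg_vec)
    return 1.0 - float(mwe_vec @ avg_vec)       # 1 - cosine similarity

print(idiomaticity("heart attack"))  # semantically opaque compound -> expected higher score
print(idiomaticity("chest pain"))    # fairly literal combination  -> expected lower score
```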
arXiv Detail & Related papers (2023-05-11T13:42:58Z)
- Automatic Biomedical Term Clustering by Learning Fine-grained Term
Representations [0.8154691566915505]
State-of-the-art term embeddings leverage pretrained language models to encode terms and use synonyms and relation knowledge from knowledge graphs to guide contrastive learning.
However, these embeddings are not sensitive to minor textual differences, which leads to failures in biomedical term clustering.
To alleviate this problem, we adjust the sampling strategy in pretraining term embeddings by providing dynamic hard positive and negative samples.
We name our proposed method CODER++; it has been applied to clustering biomedical concepts in the newly released Biomedical Knowledge Graph BIOS.
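
As a toy illustration of the dynamic hard-example idea (with a stand-in encoder and invented synonym groups), one can mine, for each term, its most similar term belonging to a different concept; in CODER++ such pairs are refreshed as the encoder is updated during training.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Toy terms with synonym-group ids, standing in for UMLS synonym sets.
terms = ["heart attack", "myocardial infarction", "cardiac infarction",
         "kidney stone", "renal calculus", "renal colic"]
concept_ids = [0, 0, 0, 1, 1, 2]

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # stand-in encoder
emb = encoder.encode(terms, normalize_embeddings=True)
sims = emb @ emb.T

# For each term, the hardest negative is its most similar term from a different concept.
for i, term in enumerate(terms):
    order = np.argsort(-sims[i])
    hard_neg = next(j for j in order if concept_ids[j] != concept_ids[i])
    print(f"{term}: hardest negative = {terms[hard_neg]} (sim={sims[i, hard_neg]:.2f})")
```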
arXiv Detail & Related papers (2022-04-01T12:30:58Z)
- Clinical Named Entity Recognition using Contextualized Token
Representations [49.036805795072645]
This paper introduces contextualized word embeddings to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Explicit experiments show that our models gain dramatic improvements compared to both static word embeddings and domain-generic language models.
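
The clinical checkpoints themselves (C-ELMo, C-Flair) are not reproduced here; as a generic example of the contextual string embeddings the entry refers to, the flair library's public general-domain models can be stacked as below. The model names are the news-domain ones, used only as stand-ins for the clinical variants.

```python
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, StackedEmbeddings

# General-domain forward/backward contextual string embeddings as stand-ins
# for the clinical C-Flair models described in the paper.
stacked = StackedEmbeddings([
    FlairEmbeddings("news-forward"),
    FlairEmbeddings("news-backward"),
])

sentence = Sentence("Patient denies chest pain or shortness of breath.")
stacked.embed(sentence)

# Each token now carries a context-dependent vector.
for token in sentence:
    print(token.text, token.embedding.shape)
```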
arXiv Detail & Related papers (2021-06-23T18:12:58Z)
- UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual
Embeddings Using the Unified Medical Language System Metathesaurus [73.86656026386038]
We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process.
Through this knowledge augmentation, UmlsBERT encodes clinical domain knowledge into word embeddings and outperforms existing domain-specific models.
arXiv Detail & Related papers (2020-10-20T15:56:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.