Combining Contrastive Learning and Knowledge Graph Embeddings to develop
medical word embeddings for the Italian language
- URL: http://arxiv.org/abs/2211.05035v1
- Date: Wed, 9 Nov 2022 17:12:28 GMT
- Title: Combining Contrastive Learning and Knowledge Graph Embeddings to develop
medical word embeddings for the Italian language
- Authors: Denys Amore Bondarenko, Roger Ferrod, Luigi Di Caro
- Abstract summary: This paper attempts to improve available embeddings in the uncovered niche of the Italian medical domain.
The main objective is to improve the accuracy of semantic similarity between medical terms.
Since the Italian language lacks medical texts and controlled vocabularies, we have developed a specific solution.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Word embeddings play a significant role in today's Natural Language
Processing tasks and applications. While pre-trained models may be directly
employed and integrated into existing pipelines, they are often fine-tuned to
better fit with specific languages or domains. In this paper, we attempt to
improve available embeddings in the uncovered niche of the Italian medical
domain through the combination of Contrastive Learning (CL) and Knowledge Graph
Embedding (KGE). The main objective is to improve the accuracy of semantic
similarity between medical terms, which is also used as an evaluation task.
Since the Italian language lacks medical texts and controlled vocabularies, we
have developed a specific solution by combining preexisting CL methods
(multi-similarity loss, contextualization, dynamic sampling) and the
integration of KGEs, creating a new variant of the loss. Although it does not
outperform the state of the art, represented by multilingual models, the
approach yields encouraging results, providing a significant leap in
performance over the starting model while using a significantly smaller
amount of data.
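To make the described combination more concrete, below is a minimal, hypothetical sketch (in PyTorch) of a multi-similarity contrastive loss augmented with a knowledge-graph-embedding alignment term. This is not the paper's published formulation: the `kge_alignment_loss` form, the `kge_weight` coefficient, and the assumption that pre-trained KG entity vectors share the text encoder's dimension are illustrative assumptions only.

```python
# Hypothetical sketch: multi-similarity (MS) contrastive loss on term embeddings,
# extended with an assumed KGE-alignment term. Hyperparameters and the alignment
# form are guesses for illustration, not the authors' implementation.
import torch
import torch.nn.functional as F


def multi_similarity_loss(embeddings, labels, alpha=2.0, beta=50.0, lam=0.5):
    """Simplified multi-similarity loss over a batch, without hard-pair mining."""
    x = F.normalize(embeddings, dim=1)
    sim = x @ x.t()                                   # pairwise cosine similarities
    same = labels.unsqueeze(0) == labels.unsqueeze(1) # True where concepts match
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos_mask = same & ~eye                            # other mentions of the same concept
    neg_mask = ~same                                  # mentions of different concepts

    pos_term = torch.log1p((torch.exp(-alpha * (sim - lam)) * pos_mask).sum(dim=1)) / alpha
    neg_term = torch.log1p((torch.exp(beta * (sim - lam)) * neg_mask).sum(dim=1)) / beta
    return (pos_term + neg_term).mean()


def kge_alignment_loss(embeddings, kg_vectors):
    """Assumed KGE term: pull each term embedding towards its pre-trained
    knowledge-graph entity vector (cosine distance)."""
    return (1.0 - F.cosine_similarity(embeddings, kg_vectors, dim=1)).mean()


def combined_loss(embeddings, labels, kg_vectors, kge_weight=0.1):
    """Contrastive loss plus a weighted KGE alignment term (weight is a guess)."""
    return multi_similarity_loss(embeddings, labels) + \
        kge_weight * kge_alignment_loss(embeddings, kg_vectors)


if __name__ == "__main__":
    # Toy batch: 8 term embeddings (e.g. from a BERT-style encoder), 4 synonym
    # groups, and matching KGE vectors projected to the same dimension.
    emb = torch.randn(8, 128, requires_grad=True)
    labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
    kg_vecs = torch.randn(8, 128)
    loss = combined_loss(emb, labels, kg_vecs)
    loss.backward()
    print(float(loss))
```

Under this reading, the contrastive term pulls together embeddings of terms that denote the same concept while pushing apart unrelated ones, and the KGE term anchors the text embeddings to the structure of the knowledge graph; the relative weighting would need to be tuned, since it is assumed here.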
Related papers
- CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning [4.004641316826348]
We introduce a novel language-image Contrastive Learning method with an Efficient large language model and prompt Fine-Tuning (CLEFT)
Our method demonstrates state-of-the-art performance on multiple chest X-ray and mammography datasets.
The proposed parameter efficient framework can reduce the total trainable model size by 39% and reduce the trainable language model to only 4% compared with the current BERT encoder.
arXiv Detail & Related papers (2024-07-30T17:57:32Z)
- ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training [21.315060059765894]
We propose a novel framework for entity-centered medical vision-language pre-training.
We distill entity-centered context from medical reports to gain more effective supervision from the text modality.
Our proposed multi-scale context fusion design also improves the semantic integration of both coarse and fine-level image representations.
arXiv Detail & Related papers (2023-12-20T11:00:54Z)
- Improving Language Models Meaning Understanding and Consistency by
Learning Conceptual Roles from Dictionary [65.268245109828]
Non-human-like behaviour of contemporary pre-trained language models (PLMs) is a major factor undermining their trustworthiness.
A striking phenomenon is the generation of inconsistent predictions, which produces contradictory results.
We propose a practical approach that alleviates the inconsistent behaviour issue by improving PLM awareness.
arXiv Detail & Related papers (2023-10-24T06:15:15Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- To Augment or Not to Augment? A Comparative Study on Text Augmentation
Techniques for Low-Resource NLP [0.0]
We investigate three categories of text augmentation methodologies which perform changes on the syntax.
We compare them on part-of-speech tagging, dependency parsing and semantic role labeling for a diverse set of language families.
Our results suggest that the augmentation techniques can further improve over strong baselines based on mBERT.
arXiv Detail & Related papers (2021-11-18T10:52:48Z)
- Cross-lingual Text Classification with Heterogeneous Graph Neural
Network [2.6936806968297913]
Cross-lingual text classification aims at training a classifier on the source language and transferring the knowledge to target languages.
Recent multilingual pretrained language models (mPLM) achieve impressive results in cross-lingual classification tasks.
We propose a simple yet effective method to incorporate heterogeneous information within and across languages for cross-lingual text classification.
arXiv Detail & Related papers (2021-05-24T12:45:42Z)
- Integration of Domain Knowledge using Medical Knowledge Graph Deep
Learning for Cancer Phenotyping [6.077023952306772]
We propose a method to integrate external knowledge from medical terminology into the context captured by word embeddings.
We evaluate the proposed approach using a Multitask Convolutional Neural Network (MT-CNN) to extract six cancer characteristics from a dataset of 900K cancer pathology reports.
arXiv Detail & Related papers (2021-01-05T03:59:43Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for
Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
- UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual
Embeddings Using the Unified Medical Language System Metathesaurus [73.86656026386038]
We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process.
By applying these two strategies, UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models.
arXiv Detail & Related papers (2020-10-20T15:56:31Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- A Comparative Study of Lexical Substitution Approaches based on Neural
Language Models [117.96628873753123]
We present a large-scale comparative study of popular neural language and masked language models.
We show that the already competitive results achieved by SOTA LMs/MLMs can be further improved if information about the target word is injected properly.
arXiv Detail & Related papers (2020-05-29T18:43:22Z)