Cross-Lingual Knowledge Transfer for Clinical Phenotyping
- URL: http://arxiv.org/abs/2208.01912v1
- Date: Wed, 3 Aug 2022 08:33:21 GMT
- Title: Cross-Lingual Knowledge Transfer for Clinical Phenotyping
- Authors: Jens-Michalis Papaioannou, Paul Grundmann, Betty van Aken, Athanasios
Samaras, Ilias Kyparissidis, George Giannakoulas, Felix Gers, Alexander
Löser
- Abstract summary: We investigate cross-lingual knowledge transfer strategies to perform clinical phenotyping for clinics that do not use the English language.
We evaluate these strategies for a Greek and a Spanish clinic leveraging clinical notes from different clinical domains.
Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness.
- Score: 55.92262310716537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clinical phenotyping enables the automatic extraction of clinical conditions
from patient records, which can be beneficial to doctors and clinics worldwide.
However, current state-of-the-art models are mostly applicable to clinical
notes written in English. We therefore investigate cross-lingual knowledge
transfer strategies to execute this task for clinics that do not use the
English language and have a small amount of in-domain data available. We
evaluate these strategies for a Greek and a Spanish clinic leveraging clinical
notes from different clinical domains such as cardiology, oncology and the ICU.
Our results reveal two strategies that outperform the state-of-the-art:
Translation-based methods in combination with domain-specific encoders and
cross-lingual encoders plus adapters. We find that these strategies perform
especially well for classifying rare phenotypes and we advise on which method
to prefer in which situation. Our results show that using multilingual data
overall improves clinical phenotyping models and can compensate for data
sparseness.
Related papers
- Investigating Alternative Feature Extraction Pipelines For Clinical Note
Phenotyping [0.0]
Using computational systems for the extraction of medical attributes offers many applications.
BERT-based models can be used to transform clinical notes into a series of representations.
We propose an alternative pipeline utilizing ScispaCy (Neumann et al.) for extraction of common diseases.
arXiv Detail & Related papers (2023-10-05T02:51:51Z)
- Knowledge Graph Embeddings for Multi-Lingual Structured Representations
of Radiology Reports [40.606143019674654]
We introduce a novel light-weight graph-based embedding method specifically catering to radiology reports.
It takes into account the structure and composition of the report, while also connecting medical terms in the report.
We show the use of this embedding on two tasks, namely disease classification of X-ray reports and image classification.
arXiv Detail & Related papers (2023-09-02T11:46:41Z)
- sEHR-CE: Language modelling of structured EHR data for efficient and
generalizable patient cohort expansion [0.0]
sEHR-CE is a novel framework based on transformers to enable integrated phenotyping and analyses of heterogeneous clinical datasets.
We validate our approach using primary and secondary care data from the UK Biobank, a large-scale research study.
arXiv Detail & Related papers (2022-11-30T16:00:43Z)
- This Patient Looks Like That Patient: Prototypical Networks for
Interpretable Diagnosis Prediction from Clinical Text [56.32427751440426]
In clinical practice such models must not only be accurate, but provide doctors with interpretable and helpful results.
We introduce ProtoPatient, a novel method based on prototypical networks and label-wise attention.
We evaluate the model on two publicly available clinical datasets and show that it outperforms existing baselines.
arXiv Detail & Related papers (2022-10-16T10:12:07Z)
- Developing a general-purpose clinical language inference model from a
large corpus of clinical notes [0.30586855806896046]
We trained a Bidirectional Encoder Representations from Transformers (BERT) model using a diverse corpus of 75 million deidentified clinical notes authored at the University of California, San Francisco (UCSF).
Our model performs on par with the best publicly available biomedical language models of comparable size on the public benchmark tasks, and is significantly better than these models in a within-system evaluation on the two tasks using UCSF data.
arXiv Detail & Related papers (2022-10-12T20:08:45Z)
- Classifying Cyber-Risky Clinical Notes by Employing Natural Language
Processing [9.77063694539068]
Recently, some states within the United States of America have begun requiring that patients have open access to their clinical notes.
This research investigates methods for identifying security/privacy risks within clinical notes.
arXiv Detail & Related papers (2022-03-24T00:36:59Z)
- Towards more patient friendly clinical notes through language models and
ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
- Clinical Named Entity Recognition using Contextualized Token
Representations [49.036805795072645]
This paper introduces the technique of contextualized word embedding to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Explicit experiments show that our models gain dramatic improvements compared to both static word embeddings and domain-generic language models.
arXiv Detail & Related papers (2021-06-23T18:12:58Z)
- A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) codes are the de facto codes used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.