Quantification of BERT Diagnosis Generalizability Across Medical
Specialties Using Semantic Dataset Distance
- URL: http://arxiv.org/abs/2008.06606v3
- Date: Fri, 19 Feb 2021 06:58:18 GMT
- Authors: Mihir P. Khambete, William Su, Juan Garcia, Marcus A. Badgeley
- Abstract summary: Deep learning models in healthcare may fail to generalize to data from unseen corpora.
No quantitative metric exists to predict how existing models will perform on new data.
Model performance on new corpora correlates directly with the similarity between train and test sentence content.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models in healthcare may fail to generalize on data from unseen
corpora. Additionally, no quantitative metric exists to tell how existing
models will perform on new data. Previous studies demonstrated that NLP models
of medical notes generalize variably between institutions, but ignored other
levels of healthcare organization. We measured SciBERT diagnosis sentiment
classifier generalizability between medical specialties using EHR sentences
from MIMIC-III. Models trained on one specialty performed better on internal
test sets than mixed or external test sets (mean AUCs 0.92, 0.87, and 0.83,
respectively; p = 0.016). Models trained on more specialties achieved better
test performance (p < 1e-4). Model performance on new corpora was directly
correlated with the similarity between train and test sentence content
(p < 1e-4). Future studies should assess additional axes of generalization to
ensure deep learning models fulfil their intended purpose across institutions,
specialties, and practices.
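The central finding above — that cross-corpus performance tracks the similarity between train and test sentence content — can be illustrated with a minimal sketch. The study computes semantic distance from SciBERT sentence embeddings; the bag-of-words cosine similarity below is a simplified stand-in for that, and the sample clinical sentences are invented for illustration only.

```python
from collections import Counter
import math

def corpus_vector(sentences):
    # Mean bag-of-words frequency vector over a corpus
    # (a stand-in for averaging SciBERT sentence embeddings).
    counts = Counter()
    for s in sentences:
        counts.update(s.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def cosine_similarity(a, b):
    # Cosine similarity between two sparse frequency vectors.
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented example sentences: a cardiology-style "train" corpus, an
# in-specialty test set, and an out-of-specialty (obstetrics) test set.
train = ["patient denies chest pain", "no acute distress noted"]
test_internal = ["patient reports chest pain", "acute distress observed"]
test_external = ["fetal heart tones normal", "cervix dilated two centimeters"]

sim_internal = cosine_similarity(corpus_vector(train), corpus_vector(test_internal))
sim_external = cosine_similarity(corpus_vector(train), corpus_vector(test_external))
assert sim_internal > sim_external  # closer specialty -> higher similarity
```

Under the paper's result, the higher train/test similarity of the internal set would predict higher test AUC there than on the external set.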
Related papers
- Reducing Biases towards Minoritized Populations in Medical Curricular Content via Artificial Intelligence for Fairer Health Outcomes [8.976475688579221]
We introduce BRICC, a first-in-class initiative that seeks to mitigate bias in medical curricular content using machine learning.
A gold-standard BRICC dataset was developed over several years and contains over 12K pages of instructional materials.
Medical experts meticulously annotated these documents for bias according to comprehensive coding guidelines.
arXiv Detail & Related papers (2024-05-21T04:11:18Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - Federated Learning of Medical Concepts Embedding using BEHRT [0.0]
We propose a federated learning approach for learning medical concepts embedding.
Our approach is based on an embedding model, BEHRT, a deep neural sequence model for EHRs.
We compare the performance of a model trained with FL against a model trained on centralized data.
arXiv Detail & Related papers (2023-05-22T14:05:39Z) - Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review
and Replicability Study [60.56194508762205]
We reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models.
We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation.
We present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models.
arXiv Detail & Related papers (2023-04-21T11:54:44Z) - A Cross-institutional Evaluation on Breast Cancer Phenotyping NLP
Algorithms on Electronic Health Records [19.824923994227202]
We developed three types of NLP models to extract cancer phenotypes from clinical texts.
The models were evaluated for their generalizability on different test sets with different learning strategies.
The CancerBERT model developed in one institute and further fine-tuned in another institute achieved reasonable performance.
arXiv Detail & Related papers (2023-03-15T08:44:07Z) - Clinical Deterioration Prediction in Brazilian Hospitals Based on
Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD).
The XGBoost model obtained the best results in predicting CD among Brazilian hospitals' data.
arXiv Detail & Related papers (2022-12-17T23:29:14Z) - Multi-task fusion for improving mammography screening data
classification [3.7683182861690843]
We propose a pipeline approach, where we first train a set of individual, task-specific models.
We then investigate the fusion thereof, which is in contrast to the standard model ensembling strategy.
Our fusion approaches improve AUC scores significantly by up to 0.04 compared to standard model ensembling.
arXiv Detail & Related papers (2021-12-01T13:56:27Z) - MIMO: Mutual Integration of Patient Journey and Medical Ontology for
Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z) - Pre-training transformer-based framework on large-scale pediatric claims
data for downstream population-specific tasks [3.1580072841682734]
This study presents the Claim Pre-Training (Claim-PT) framework, a generic pre-training model that first trains on the entire pediatric claims dataset.
The effective knowledge transfer is completed through the task-aware fine-tuning stage.
We conducted experiments on a real-world claims dataset with more than one million patient records.
arXiv Detail & Related papers (2021-06-24T15:25:41Z) - Adversarial Sample Enhanced Domain Adaptation: A Case Study on
Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation.
Adversarially generated samples are used during domain adaptation.
Results confirm the effectiveness of our method and its generality across different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z) - Predicting Clinical Diagnosis from Patients Electronic Health Records
Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem to the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.