Improving ICD-based semantic similarity by accounting for varying
degrees of comorbidity
- URL: http://arxiv.org/abs/2308.07359v1
- Date: Mon, 14 Aug 2023 14:56:07 GMT
- Title: Improving ICD-based semantic similarity by accounting for varying
degrees of comorbidity
- Authors: Jan Janosch Schneider and Marius Adler and Christoph Ammer-Herrmenau
and Alexander Otto König and Ulrich Sax and Jonas Hügel
- Abstract summary: International Statistical Classification of Diseases and Related Health Problems (ICD) codes are used worldwide to encode diseases.
It is possible to compute the similarity of patients based on their ICD codes by using semantic similarity algorithms.
We compare the performance of 80 combinations of established algorithms in terms of semantic similarity based on ICD-code sets.
- Score: 39.58317527488534
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Finding similar patients is a common objective in precision medicine,
facilitating treatment outcome assessment and clinical decision support.
Choosing widely-available patient features and appropriate mathematical methods
for similarity calculations is crucial. International Statistical
Classification of Diseases and Related Health Problems (ICD) codes are used
worldwide to encode diseases and are available for nearly all patients.
Aggregated as sets consisting of primary and secondary diagnoses they can
display a degree of comorbidity and reveal comorbidity patterns. It is possible
to compute the similarity of patients based on their ICD codes by using
semantic similarity algorithms. These algorithms have traditionally been
evaluated using a single-term, expert-rated data set.
However, real-world patient data often display varying degrees of documented
comorbidities that might impair algorithm performance. To account for this, we
present a scale term that considers documented comorbidity-variance. In this
work, we compared the performance of 80 combinations of established algorithms
in terms of semantic similarity based on ICD-code sets. The sets have been
extracted from patients with a C25.X (pancreatic cancer) primary diagnosis and
provide a variety of different combinations of ICD-codes. Using our scale term,
we obtained the best results with a combination of level-based information
content, Leacock & Chodorow concept similarity and bipartite graph matching for
the set similarities, reaching a correlation of 0.75 with our expert's ground
truth. Our results highlight the importance of accounting for comorbidity
variance while demonstrating how well current semantic similarity algorithms
perform.
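To make the pipeline named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It combines a Leacock & Chodorow style concept similarity with a bipartite matching between two ICD-code sets. Everything specific in it is an assumption made for illustration: the tiny `PARENT` excerpt of the ICD-10 hierarchy, the helper names, the normalisation to [0, 1], the division by the size of the larger set, and the brute-force matching (suitable only for very small sets). The paper's scale term and its level-based information content are not defined in the abstract and are therefore not reproduced here.

```python
import math
from itertools import permutations

# Hypothetical child -> parent excerpt of the ICD-10 hierarchy (illustration only).
PARENT = {
    "C00-D48": "ROOT", "E00-E90": "ROOT", "K00-K93": "ROOT",
    "C15-C26": "C00-D48", "E10-E14": "E00-E90", "K80-K87": "K00-K93",
    "C25": "C15-C26", "E11": "E10-E14", "K86": "K80-K87",
    "C25.0": "C25", "C25.1": "C25", "E11.9": "E11", "K86.1": "K86",
}


def ancestors(code):
    """Return the path from `code` up to ROOT, both ends included."""
    path = [code]
    while path[-1] != "ROOT":
        path.append(PARENT[path[-1]])
    return path


# Maximum depth of the toy taxonomy, counted in nodes.
MAX_DEPTH = max(len(ancestors(code)) for code in PARENT)


def path_length(a, b):
    """Number of nodes on the shortest path between codes a and b."""
    path_a, path_b = ancestors(a), ancestors(b)
    common = next(node for node in path_a if node in path_b)  # lowest common ancestor
    return path_a.index(common) + path_b.index(common) + 1


def lch_norm(a, b):
    """Leacock & Chodorow similarity, -log(len / (2 * D)), scaled to [0, 1].

    Written as log(2 * D / len), which is the same quantity.
    The scaling by log(2 * D) is an assumption of this sketch.
    """
    return math.log(2 * MAX_DEPTH / path_length(a, b)) / math.log(2 * MAX_DEPTH)


def set_similarity(set_a, set_b):
    """Best one-to-one matching between two code sets, found by brute force.

    The summed similarity of the best matching is divided by the size of
    the larger set, so unmatched codes lower the score (an assumption).
    """
    small, large = sorted((list(set_a), list(set_b)), key=len)
    if not small:
        return 0.0
    best = max(
        sum(lch_norm(s, l) for s, l in zip(small, perm))
        for perm in permutations(large, len(small))
    )
    return best / len(large)


if __name__ == "__main__":
    patient_1 = {"C25.0", "E11.9"}
    patient_2 = {"C25.1", "K86.1", "E11.9"}
    print(set_similarity(patient_1, patient_2))
```

For realistic set sizes, the brute-force search would normally be replaced by a polynomial-time assignment solver; the exhaustive version is used here only to keep the sketch free of third-party dependencies.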
Related papers
- Bridging the Diagnostic Divide: Classical Computer Vision and Advanced AI methods for distinguishing ITB and CD through CTE Scans [2.900410045439515]
A consensus among radiologists has recognized the visceral-to-subcutaneous fat ratio as a surrogate biomarker for differentiating between ITB and CD.
We propose a novel 2D image computer vision algorithm for auto-segmenting subcutaneous fat to automate this ratio calculation.
We trained a ResNet10 model on a dataset of CTE scans with samples from ITB, CD, and normal patients, achieving an accuracy of 75%.
arXiv Detail & Related papers (2024-10-23T17:05:27Z)
- A Robust Ensemble Algorithm for Ischemic Stroke Lesion Segmentation: Generalizability and Clinical Utility Beyond the ISLES Challenge [30.611482996378683]
Image and disease variability hinder the development of generalizable AI algorithms with clinical value.
We present a novel ensemble algorithm derived from the 2022 Ischemic Stroke Lesion (ISLES) challenge.
We combined top-performing algorithms into an ensemble model that overcomes the limitations of individual solutions.
arXiv Detail & Related papers (2024-03-28T13:56:26Z)
- Multi-Dimensional Ability Diagnosis for Machine Learning Algorithms [88.93372675846123]
We propose a task-agnostic evaluation framework Camilla for evaluating machine learning algorithms.
We use cognitive diagnosis assumptions and neural networks to learn the complex interactions among algorithms, samples and the skills of each sample.
In our experiments, Camilla outperforms state-of-the-art baselines on the metric reliability, rank consistency and rank stability.
arXiv Detail & Related papers (2023-07-14T03:15:56Z)
- A method for comparing multiple imputation techniques: a case study on the U.S. National COVID Cohort Collaborative [1.259457977936316]
We numerically evaluate strategies for handling missing data in the context of statistical analysis.
Our approach could effectively highlight the most valid and performant missing-data handling strategy.
arXiv Detail & Related papers (2022-06-13T19:49:54Z)
- Coronary Heart Disease Diagnosis Based on Improved Ensemble Learning [0.0]
This study develops a heart disease diagnosis method based on ensemble learning and cascade generalization.
The C4.5 and RIPPER algorithms were used as meta-level algorithms, and Naive Bayes was used as the base-level algorithm.
arXiv Detail & Related papers (2020-07-06T17:14:30Z)
- COMPOSE: Cross-Modal Pseudo-Siamese Network for Patient Trial Matching [70.08786840301435]
We propose CrOss-Modal PseudO-SiamEse network (COMPOSE) to address these challenges for patient-trial matching.
Experiment results show COMPOSE can reach 98.0% AUC on patient-criteria matching and 83.7% accuracy on patient-trial matching.
arXiv Detail & Related papers (2020-06-15T21:01:33Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
- Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach [55.41644538483948]
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
The aim is to support decision making addressed at reducing the incidence rate of infections.
arXiv Detail & Related papers (2020-05-07T16:13:12Z)
- Deep Representation Learning of Electronic Health Records to Unlock Patient Stratification at Scale [0.5498849973527224]
We present an unsupervised framework based on deep learning to process heterogeneous EHRs.
We derive patient representations that can efficiently and effectively enable patient stratification at scale.
arXiv Detail & Related papers (2020-03-14T00:04:20Z)
- VerSe: A Vertebrae Labelling and Segmentation Benchmark for Multi-detector CT Images [121.31355003451152]
Large Scale Vertebrae Challenge (VerSe) was organised in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2019 and 2020.
We present the results of this evaluation and further investigate the performance-variation at vertebra-level, scan-level, and at different fields-of-view.
arXiv Detail & Related papers (2020-01-24T21:09:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.