Machine Learning Driven Biomarker Selection for Medical Diagnosis
- URL: http://arxiv.org/abs/2405.10345v1
- Date: Thu, 16 May 2024 01:30:47 GMT
- Title: Machine Learning Driven Biomarker Selection for Medical Diagnosis
- Authors: Divyagna Bavikadi, Ayushi Agarwal, Shashank Ganta, Yunro Chung, Lusheng Song, Ji Qiu, Paulo Shakarian
- Score: 1.10252115875756
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in experimental methods have enabled researchers to collect data on thousands of analytes simultaneously. This has led to correlational studies that associated molecular measurements with diseases such as Alzheimer's disease, liver cancer, and gastric cancer. However, using thousands of biomarkers selected from these analytes is not practical for real-world medical diagnosis and is likely undesirable because of the risk of spurious correlations. In this study, we evaluate 4 biomarker selection methods and 4 machine learning (ML) classifiers for identifying correlations, giving 16 combined approaches in all. We found that contemporary methods outperform the previously reported logistic regression approach when 3 or 10 biomarkers are permitted. With specificity fixed at 0.9, the ML approaches achieved a sensitivity of 0.240 (3 biomarkers) and 0.520 (10 biomarkers), whereas standard logistic regression achieved a sensitivity of 0.000 (3 biomarkers) and 0.040 (10 biomarkers). We also observed that causal-based biomarker selection methods performed best when fewer biomarkers were permitted, while univariate feature selection performed best when more biomarkers were permitted.
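Two quantities the abstract relies on can be sketched in plain Python: a small top-k biomarker panel chosen by univariate feature selection, and sensitivity at a fixed specificity of 0.9. This is a minimal illustration under stated assumptions, not the authors' code. The univariate score (a standardized mean difference between cases and controls) and the function names `univariate_top_k` and `sensitivity_at_specificity` are hypothetical. "Sensitivity at specificity 0.9" is read here as the best sensitivity over all decision thresholds whose specificity is at least 0.9.

```python
from statistics import mean, pstdev


def univariate_top_k(X, y, k):
    """Return the indices of the k features with the highest univariate score.

    Assumed score (hypothetical; the paper's statistic is not stated here):
    |mean over cases - mean over controls| / overall standard deviation.
    X is a list of samples (rows of feature values); y holds 1 (case) or 0 (control).
    """
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        cases = [v for v, label in zip(col, y) if label == 1]
        controls = [v for v, label in zip(col, y) if label == 0]
        spread = pstdev(col) or 1.0  # avoid dividing by zero for constant features
        scores.append(abs(mean(cases) - mean(controls)) / spread)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def sensitivity_at_specificity(scores, labels, target_spec=0.9):
    """Best sensitivity over all thresholds whose specificity is >= target_spec.

    A sample is called positive when its score >= threshold.
    """
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    best = 0.0
    # Every observed score is a candidate threshold; +inf calls nothing positive.
    for t in sorted(set(scores)) + [float("inf")]:
        specificity = sum(s < t for s in neg) / len(neg)
        if specificity >= target_spec:
            sensitivity = sum(s >= t for s in pos) / len(pos)
            best = max(best, sensitivity)
    return best


# Toy usage (made-up numbers): 10 controls scored 0..9, 4 cases.
scores = list(range(10)) + [9.5, 8.5, 5, 2]
labels = [0] * 10 + [1] * 4
print(sensitivity_at_specificity(scores, labels, 0.9))  # 0.5
```

In the paper's setting, k would be 3 or 10, and each of the 4 selection methods would be paired with each of the 4 classifiers to give the 16 approaches; the causal-based selectors are not sketched here.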
Related papers
- Screen Them All: High-Throughput Pan-Cancer Genetic and Phenotypic Biomarker Screening from H&E Whole Slide Images [3.119559770601732]
Using AI on routine H&E slides offers a fast and economical approach to screen for multiple molecular biomarkers.
We present a high-throughput AI-based system leveraging Virchow2, a foundation model pre-trained on 3 million slides.
Unlike traditional methods that train individual models for each biomarker or cancer type, our system employs a unified model to simultaneously predict a wide range of clinically relevant molecular biomarkers.
Date: 2024-08-18T17:44:00Z
- Biomarker based Cancer Classification using an Ensemble with Pre-trained Models [2.2436844508175224]
We leverage a meta-trained Hyperfast model for classifying cancer, achieving the highest AUC of 0.9929.
We also propose a novel ensemble model combining the pre-trained Hyperfast model, XGBoost, and LightGBM for multi-class classification tasks, achieving an incremental increase in accuracy (0.9464).
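The summary does not say how the three models' outputs are combined. A common choice is soft voting, sketched below under that assumption: each model supplies per-class probabilities for every sample, the (optionally weighted) average is taken, and the highest-scoring class is predicted. The function name `soft_vote` and the toy probabilities are hypothetical.

```python
def soft_vote(prob_lists, weights=None):
    """Predict classes by averaging per-model class probabilities (soft voting).

    prob_lists[m][i][c] is model m's probability that sample i is in class c.
    weights optionally gives one non-negative weight per model.
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_samples = len(prob_lists[0])
    n_classes = len(prob_lists[0][0])
    predictions = []
    for i in range(n_samples):
        # Weighted mean probability of each class across models.
        avg = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_lists)) / total
            for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=lambda c: avg[c]))
    return predictions


# Toy usage: three hypothetical models, two samples, three classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.3, 0.4, 0.3], [0.1, 0.2, 0.7]]
model_c = [[0.5, 0.4, 0.1], [0.3, 0.3, 0.4]]
print(soft_vote([model_a, model_b, model_c]))  # [0, 2]
```

Weighting lets a stronger model count for more; equal weights reduce to a plain average of the models' probabilities.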
Date: 2024-06-14T14:43:59Z
- MMIL: A novel algorithm for disease associated cell type discovery [58.044870442206914]
Single-cell datasets often lack individual cell labels, making it challenging to identify cells associated with disease.
We introduce Mixture Modeling for Multiple Instance Learning (MMIL), an expectation-maximization method that enables the training and calibration of cell-level classifiers.
Date: 2024-06-12T15:22:56Z
- BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers [48.21255861863282]
BMRetriever is a series of dense retrievers for enhancing biomedical retrieval.
BMRetriever exhibits strong parameter efficiency, with the 410M variant outperforming baselines up to 11.7 times larger.
Date: 2024-04-29T05:40:08Z
- A marker-less human motion analysis system for motion-based biomarker discovery in knee disorders [60.99112047564336]
The NHS has had increasing difficulty seeing all low-risk patients, including, but not limited to, patients with suspected osteoarthritis (OA).
We propose a novel method of automated biomarker identification for diagnosis of knee disorders and the monitoring of treatment progression.
Date: 2023-04-26T16:47:42Z
- Regression-based Deep-Learning predicts molecular biomarkers from pathology slides [40.24757332810004]
We developed and evaluated a new self-supervised attention-based weakly supervised regression method that predicts continuous biomarkers directly from images.
Using regression significantly improves the accuracy of biomarker prediction, while also making the results more interpretable than classification does.
Our open-source regression approach offers a promising alternative for continuous biomarker analysis in computational pathology.
Date: 2023-04-11T11:43:51Z
- Multi-class versus One-class classifier in spontaneous speech analysis oriented to Alzheimer Disease diagnosis [58.720142291102135]
The aim of our project is to contribute to earlier diagnosis of Alzheimer's disease (AD) and better estimates of its severity by automatically analyzing new biomarkers extracted from the speech signal.
Using outlier information and Fractal Dimension features improves system performance.
Date: 2022-03-21T09:57:20Z
- Preventing dataset shift from breaking machine-learning biomarkers [0.6138671548064355]
A good biomarker is one that gives reliable detection of the corresponding condition.
Biomarkers are often extracted from a cohort that differs from the target population.
Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals.
Date: 2021-07-21T08:54:23Z
- The Future will be Different than Today: Model Evaluation Considerations when Developing Translational Clinical Biomarker [4.549866091318765]
We present one evaluation strategy by using leave-one-study-out (LOSO) in place of conventional cross-validation (cv) methods.
To demonstrate the performance of K-fold vs LOSO cv in estimating the effect size of biomarkers, we leveraged data from clinical trials and simulation studies.
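Leave-one-study-out splitting can be sketched as follows, assuming each sample carries an identifier of the study it came from (the `studies` values and the function name `leave_one_study_out` are hypothetical). Unlike K-fold CV, which mixes samples from every study into each fold, each split here holds out one entire study, so the model is always tested on a study it never saw during training. scikit-learn's `LeaveOneGroupOut` splitter implements the same idea.

```python
from collections import defaultdict


def leave_one_study_out(study_ids):
    """Yield (held_out_study, train_indices, test_indices), one split per study.

    study_ids[i] names the study that sample i came from. Every sample of the
    held-out study goes to the test set; all other samples go to training.
    """
    by_study = defaultdict(list)
    for idx, study in enumerate(study_ids):
        by_study[study].append(idx)
    for held_out in sorted(by_study):
        test = by_study[held_out]
        train = [i for i, s in enumerate(study_ids) if s != held_out]
        yield held_out, train, test


# Toy usage: six samples drawn from three hypothetical trials.
studies = ["trial_A", "trial_A", "trial_B", "trial_C", "trial_C", "trial_C"]
for held_out, train, test in leave_one_study_out(studies):
    print(held_out, train, test)
# trial_A [2, 3, 4, 5] [0, 1]
# trial_B [0, 1, 3, 4, 5] [2]
# trial_C [0, 1, 2] [3, 4, 5]
```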
Date: 2021-07-13T19:36:25Z
- Cancer Gene Profiling through Unsupervised Discovery [49.28556294619424]
We introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers.
Our method is based on the LP-Stability algorithm, a high-dimensional, center-based unsupervised clustering algorithm.
Our signature reports promising results on distinguishing immune inflammatory and immune desert tumors.
Date: 2021-02-11T09:04:45Z