A Goemans-Williamson type algorithm for identifying subcohorts in clinical trials
- URL: http://arxiv.org/abs/2506.10879v1
- Date: Thu, 12 Jun 2025 16:44:32 GMT
- Title: A Goemans-Williamson type algorithm for identifying subcohorts in clinical trials
- Authors: Pratik Worah,
- Abstract summary: We design an efficient algorithm that outputs a linear classifier for identifying homogeneous subsets from large inhomogeneous datasets.<n>As an application, we use our algorithm to design a simple test that can identify homogeneous subcohorts of patients.<n>We also use the test output by the algorithm to systematically identify subcohorts of patients in which statistically significant changes in methylation levels of tumor suppressor genes co-occur with statistically significant changes in nuclear receptor expression.
- Score: 1.930852251165745
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We design an efficient algorithm that outputs a linear classifier for identifying homogeneous subsets (equivalently subcohorts) from large inhomogeneous datasets. Our theoretical contribution is a rounding technique, similar to that of Goemans and Williamson (1994), that approximates the optimal solution of the underlying optimization problem within a factor of $0.82$. As an application, we use our algorithm to design a simple test that can identify homogeneous subcohorts of patients, that are mainly comprised of metastatic cases, from the RNA microarray dataset for breast cancer by Curtis et al. (2012). Furthermore, we also use the test output by the algorithm to systematically identify subcohorts of patients in which statistically significant changes in methylation levels of tumor suppressor genes co-occur with statistically significant changes in nuclear receptor expression. Identifying such homogeneous subcohorts of patients can be useful for the discovery of disease pathways and therapeutics, specific to the subcohort.
Related papers
- Unsupervised risk factor identification across cancer types and data modalities via explainable artificial intelligence [0.0]
We present a novel method for unsupervised machine learning that directly optimize for survival heterogeneity across patient clusters.<n>Our approach represents novel methodology for training any neural network architecture on any data modality to identify prognostically distinct patient groups.<n>This pan-cancer, model-agnostic approach represents a valuable advancement in clinical risk stratification.
arXiv Detail & Related papers (2025-06-15T19:11:10Z) - Improving statistical learning methods via features selection without replacement sampling and random projection [0.680740878601496]
Cancer is a genetic disease characterized by genetic and epigenetic alterations that disrupt normal gene expression.<n>High-dimensional microarray datasets pose challenges for classification models due to the "small n, large p" problem.<n>This study contributes to cancer biomarker discovery, offering a robust computational method for analyzing microarray data.
arXiv Detail & Related papers (2025-05-28T22:36:46Z) - Enhanced ECG Arrhythmia Detection Accuracy by Optimizing Divergence-Based Data Fusion [5.575308369829893]
We propose a feature-based fusion algorithm utilizing Kernel Density Estimation (KDE) and Kullback-Leibler (KL) divergence.<n>Using our in-house datasets consisting of ECG signals collected from 2000 healthy and 2000 diseased individuals, we verify our method by using the publicly available PTB-XL dataset.<n>The results demonstrate that the proposed fusion method significantly enhances feature-based classification accuracy for abnormal ECG cases in the merged datasets.
arXiv Detail & Related papers (2025-03-19T12:16:48Z) - Optimal Algorithms for the Inhomogeneous Spiked Wigner Model [89.1371983413931]
We derive an approximate message-passing algorithm (AMP) for the inhomogeneous problem.
We identify in particular the existence of a statistical-to-computational gap where known algorithms require a signal-to-noise ratio bigger than the information-theoretic threshold to perform better than random.
arXiv Detail & Related papers (2023-02-13T19:57:17Z) - Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and deterministic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - Data-Driven Logistic Regression Ensembles With Applications in Genomics [0.0]
We introduce a novel approach to high-dimensional binary classification that integrates regularization with ensembling techniques.<n>In medical genomics applications, our approach identifies critical biomarkers overlooked by competing methods.
arXiv Detail & Related papers (2021-02-17T05:57:26Z) - Cancer Gene Profiling through Unsupervised Discovery [49.28556294619424]
We introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers.
Our method is based on the LP-Stability algorithm, a high dimensional center-based unsupervised clustering algorithm.
Our signature reports promising results on distinguishing immune inflammatory and immune desert tumors.
arXiv Detail & Related papers (2021-02-11T09:04:45Z) - Robust Recursive Partitioning for Heterogeneous Treatment Effects with
Uncertainty Quantification [84.53697297858146]
Subgroup analysis of treatment effects plays an important role in applications from medicine to public policy to recommender systems.
Most of the current methods of subgroup analysis begin with a particular algorithm for estimating individualized treatment effects (ITE)
This paper develops a new method for subgroup analysis, R2P, that addresses all these weaknesses.
arXiv Detail & Related papers (2020-06-14T14:50:02Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z) - Optimization of Genomic Classifiers for Clinical Deployment: Evaluation
of Bayesian Optimization to Select Predictive Models of Acute Infection and
In-Hospital Mortality [0.0]
characterization of a patient's immune response by quantifying expression levels of specific genes from blood represents a potentially more timely and precise means of accomplishing both tasks.
Machine learning methods provide a platform to leverage this 'host response' for development of deployment-ready classification models.
We compare HO approaches for the development of diagnostic classifiers of acute infection and in-hospital mortality from gene expression of 29 diagnostic markers.
arXiv Detail & Related papers (2020-03-27T10:22:02Z) - Model-assisted cohort selection with bias analysis for generating
large-scale cohorts from the EHR for oncology research [1.25957368859589]
We introduce a technique called Model-Assisted Cohort Selection (MACS) with Bias Analysis.
We trained a model on 17,263 patients using term-frequency inverse-document-frequency (TF-IDF) and logistic regression.
We used a test set of 17,292 patients to measure algorithm performance and perform Bias Analysis.
arXiv Detail & Related papers (2020-01-13T22:58:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.