Cross-Cluster Weighted Forests
- URL: http://arxiv.org/abs/2105.07610v3
- Date: Tue, 29 Oct 2024 02:51:27 GMT
- Title: Cross-Cluster Weighted Forests
- Authors: Maya Ramchandran, Rajarshi Mukherjee, Giovanni Parmigiani
- Abstract summary: This article considers the effect of ensembling Random Forest learners trained on clusters within a single dataset with heterogeneity in the distribution of the features.
We find that constructing ensembles of forests trained on clusters determined by algorithms such as k-means results in significant improvements in accuracy and generalizability over the traditional Random Forest algorithm.
- Score: 4.9873153106566575
- Abstract: Adapting machine learning algorithms to better handle the presence of clusters or batch effects within training datasets is important across a wide variety of biological applications. This article considers the effect of ensembling Random Forest learners trained on clusters within a single dataset with heterogeneity in the distribution of the features. We find that constructing ensembles of forests trained on clusters determined by algorithms such as k-means results in significant improvements in accuracy and generalizability over the traditional Random Forest algorithm. We begin with a theoretical exploration of the benefits of our novel approach, denoted as the Cross-Cluster Weighted Forest, and subsequently empirically examine its robustness to various data-generating scenarios and outcome models. Furthermore, we explore the influence of the data-partitioning and ensemble weighting strategies on the benefits of our method over the existing paradigm. Finally, we apply our approach to cancer molecular profiling and gene expression datasets that are naturally divisible into clusters and illustrate that our approach outperforms the classic Random Forest. The code and supplementary material are available at https://github.com/m-ramchandran/cross-cluster.
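The abstract describes the method only at a high level, so here is a minimal sketch of the idea under stated assumptions: k-means partitions the training data, one Random Forest is fit per cluster, and the forests are averaged with weights that favor cross-cluster accuracy. The helper names, the five-cluster default, and the inverse-MSE weighting are illustrative choices (the paper itself compares several partitioning and weighting strategies), so treat this as a sketch rather than the authors' implementation.

```python
# Minimal sketch of the Cross-Cluster Weighted Forest recipe described above:
# partition the training set with k-means, fit one Random Forest per cluster,
# and combine the forests with weights that reward cross-cluster accuracy.
# The cluster count, the inverse-MSE weighting, and the function names are
# illustrative assumptions, not the paper's exact specification.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor


def fit_cross_cluster_weighted_forest(X, y, n_clusters=5, random_state=0):
    # 1. Partition the training data into clusters (k-means, as in the abstract).
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(X)

    # 2. Train one Random Forest on each cluster.
    forests = []
    for c in range(n_clusters):
        rf = RandomForestRegressor(n_estimators=100, random_state=random_state)
        rf.fit(X[labels == c], y[labels == c])
        forests.append(rf)

    # 3. Weight each forest by how well it generalizes to the *other* clusters
    #    (inverse cross-cluster MSE, normalized to sum to one). The paper
    #    studies several weighting strategies; this is just one simple choice.
    weights = np.empty(n_clusters)
    for c, rf in enumerate(forests):
        outside = labels != c
        mse = np.mean((rf.predict(X[outside]) - y[outside]) ** 2)
        weights[c] = 1.0 / (mse + 1e-12)
    weights /= weights.sum()
    return forests, weights


def predict_cross_cluster_weighted_forest(forests, weights, X_new):
    # Weighted average of the cluster-specific forests' predictions.
    per_forest = np.column_stack([rf.predict(X_new) for rf in forests])
    return per_forest @ weights
```

Usage would simply be `forests, weights = fit_cross_cluster_weighted_forest(X_train, y_train)` followed by `predict_cross_cluster_weighted_forest(forests, weights, X_test)`.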
Related papers
- Dynamic Post-Hoc Neural Ensemblers [55.15643209328513]
In this study, we explore employing neural networks as ensemble methods.
Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions.
We demonstrate that this approach lower-bounds the diversity within the ensemble, reducing overfitting and improving generalization.
arXiv Detail & Related papers (2024-10-06T15:25:39Z) - Enabling Mixed Effects Neural Networks for Diverse, Clustered Data Using Monte Carlo Methods [9.035959289139102]
Mixed effects neural networks (MENNs) separate cluster-specific 'random effects' from cluster-invariant 'fixed effects'.
We present MC-GMENN, a novel approach employing Monte Carlo methods to train Generalized Mixed Effects Neural Networks.
arXiv Detail & Related papers (2024-07-01T09:24:04Z) - GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Calibration Clustering (GCC) method to incorporate feature learning and augmentation into the clustering procedure.
First, we develop a discriminative feature alignment mechanism to discover the intrinsic relationship between real and generated samples.
Second, we design a self-supervised metric learning scheme to generate more reliable cluster assignments.
arXiv Detail & Related papers (2024-04-14T01:51:11Z) - Federated unsupervised random forest for privacy-preserving patient
stratification [0.4499833362998487]
We introduce a novel multi-omics clustering approach utilizing unsupervised random forests.
We have validated our approach on machine learning benchmark data sets and on cancer data from The Cancer Genome Atlas.
Our method is competitive with the state-of-the-art in terms of disease subtyping, but at the same time substantially improves the cluster interpretability.
arXiv Detail & Related papers (2024-01-29T12:04:14Z) - Improving Link Prediction in Social Networks Using Local and Global
Features: A Clustering-based Approach [0.0]
We propose an approach that combines local and global feature-based methods to tackle the link prediction problem.
Our two-phase method first determines new features related to the position and dynamic behavior of nodes.
Then, a subspace clustering algorithm is applied to group social objects based on the computed similarity measures.
arXiv Detail & Related papers (2023-05-17T14:45:02Z) - Unified Multi-View Orthonormal Non-Negative Graph Based Clustering
Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z) - Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Unsupervised Multi-view Clustering by Squeezing Hybrid Knowledge from
Cross View and Each View [68.88732535086338]
This paper proposes a new multi-view clustering method, low-rank subspace multi-view clustering based on adaptive graph regularization.
Experimental results for five widely used multi-view benchmarks show that our proposed algorithm surpasses other state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2020-08-23T08:25:06Z) - Siloed Federated Learning for Multi-Centric Histopathology Datasets [0.17842332554022694]
This paper proposes a novel federated learning approach for deep learning architectures in the medical domain.
Local-statistic batch normalization (BN) layers are introduced, resulting in collaboratively-trained, yet center-specific models.
We benchmark the proposed method on the classification of tumorous histopathology image patches extracted from the Camelyon16 and Camelyon17 datasets.
arXiv Detail & Related papers (2020-08-17T15:49:30Z) - Elastic Coupled Co-clustering for Single-Cell Genomic Data [0.0]
Single-cell technologies have enabled us to profile genomic features at unprecedented resolution.
Data integration can potentially lead to a better performance of clustering algorithms.
In this work, we formulate the problem in an unsupervised transfer learning framework.
arXiv Detail & Related papers (2020-03-29T08:21:53Z)