Cross-Cluster Weighted Forests
- URL: http://arxiv.org/abs/2105.07610v1
- Date: Mon, 17 May 2021 04:58:29 GMT
- Title: Cross-Cluster Weighted Forests
- Authors: Maya Ramchandran, Rajarshi Mukherjee, and Giovanni Parmigiani
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adapting machine learning algorithms to better handle the presence of natural
clustering or batch effects within training datasets is imperative across a
wide variety of biological applications. This article considers the effect of
ensembling Random Forest learners trained on clusters within a single dataset
with heterogeneity in the distribution of the features. We find that
constructing ensembles of forests trained on clusters determined by algorithms
such as k-means results in significant improvements in accuracy and
generalizability over the traditional Random Forest algorithm. We denote our
novel approach as the Cross-Cluster Weighted Forest, and examine its robustness
to various data-generating scenarios and outcome models. Furthermore, we
explore the influence of the data-partitioning and ensemble weighting
strategies on conferring the benefits of our method over the existing paradigm.
Finally, we apply our approach to cancer molecular profiling and gene
expression datasets that are naturally divisible into clusters and illustrate
that our approach outperforms classic Random Forest. Code and supplementary
material are available at https://github.com/m-ramchandran/cross-cluster.
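The pipeline the abstract describes, partitioning the training data with k-means, fitting one Random Forest per cluster, and combining the per-cluster forests into a weighted ensemble, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the uniform weighting used here is a simplifying assumption (the paper studies more refined ensemble-weighting strategies), and the helper names `cross_cluster_forest` and `predict_weighted` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor

def cross_cluster_forest(X, y, n_clusters=3, seed=0):
    """Partition (X, y) with k-means and fit one forest per cluster."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    forests = []
    for c in range(n_clusters):
        mask = labels == c
        rf = RandomForestRegressor(n_estimators=50, random_state=seed)
        rf.fit(X[mask], y[mask])
        forests.append(rf)
    return forests

def predict_weighted(forests, X_new, weights=None):
    """Weighted average of the per-cluster forests' predictions."""
    preds = np.stack([rf.predict(X_new) for rf in forests])
    if weights is None:  # uniform weights: a simplifying assumption
        weights = np.full(len(forests), 1.0 / len(forests))
    return weights @ preds

# Tiny synthetic dataset with three well-separated feature clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, size=(60, 2)) for m in (0.0, 5.0, 10.0)])
y = X[:, 0] + X[:, 1]
forests = cross_cluster_forest(X, y)
print(predict_weighted(forests, X[:5]).shape)  # (5,)
```

Replacing the uniform `weights` with weights learned on held-out data (e.g. by stacking) is where the method's reported gains over a single pooled Random Forest would come in.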
Related papers
- Enabling Mixed Effects Neural Networks for Diverse, Clustered Data Using Monte Carlo Methods [9.035959289139102]
Mixed effects neural networks (MENNs) separate cluster-specific 'random effects' from cluster-invariant 'fixed effects'.
We present MC-GMENN, a novel approach employing Monte Carlo methods to train Generalized Mixed Effects Neural Networks.
arXiv Detail & Related papers (2024-07-01T09:24:04Z)
- Feature graphs for interpretable unsupervised tree ensembles: centrality, interaction, and application in disease subtyping [0.24578723416255746]
Feature selection assumes a pivotal role in enhancing model interpretability.
The accuracy gained from aggregating decision trees comes at the expense of interpretability.
The study introduces novel methods to construct feature graphs from unsupervised random forests.
arXiv Detail & Related papers (2024-04-27T12:47:37Z)
- GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Calibration Clustering (GCC) method to incorporate feature learning and augmentation into the clustering procedure.
First, we develop a discriminative feature alignment mechanism to discover intrinsic relationships across real and generated samples.
Second, we design a self-supervised metric learning scheme to generate more reliable cluster assignments.
arXiv Detail & Related papers (2024-04-14T01:51:11Z)
- Federated unsupervised random forest for privacy-preserving patient stratification [0.4499833362998487]
We introduce a novel multi-omics clustering approach utilizing unsupervised random forests.
We have validated our approach on machine learning benchmark data sets and on cancer data from The Cancer Genome Atlas.
Our method is competitive with the state-of-the-art in terms of disease subtyping, but at the same time substantially improves the cluster interpretability.
arXiv Detail & Related papers (2024-01-29T12:04:14Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well-established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
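The merge rule summarized above can be sketched in a few lines: at each step, merge the pair of clusters whose cross-cluster points have the maximum average dot product, instead of the minimum distance used by standard linkage. This is an illustrative reading of the rule only, not the paper's implementation; the function name `merge_by_avg_dot` and the stopping criterion are assumptions for the example.

```python
import numpy as np

def merge_by_avg_dot(X, target_clusters=2):
    """Agglomerative merging by maximum average pairwise dot product."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > target_clusters:
        best, best_score = None, -np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average dot product over all cross-cluster point pairs
                score = np.mean(X[clusters[a]] @ X[clusters[b]].T)
                if score > best_score:
                    best, best_score = (a, b), score
        a, b = best
        clusters[a] += clusters.pop(b)
    return clusters

# Two groups of near-parallel vectors: high within-group dot products
X = np.array([[2.0, 0.1], [1.9, 0.0], [0.0, 2.0], [0.1, 1.8]])
print(merge_by_avg_dot(X))  # [[0, 1], [2, 3]]
```

Because dot products reward vectors that point the same way, this variant groups directionally similar points even when Euclidean distances would suggest different merges.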
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- Unsupervised Clustered Federated Learning in Complex Multi-source Acoustic Environments [75.8001929811943]
We introduce a realistic and challenging, multi-source and multi-room acoustic environment.
We present an improved clustering control strategy that takes into account the variability of the acoustic scene.
The proposed approach is optimized using clustering-based measures and validated via a network-wide classification task.
arXiv Detail & Related papers (2021-06-07T14:51:39Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Unsupervised Multi-view Clustering by Squeezing Hybrid Knowledge from Cross View and Each View [68.88732535086338]
This paper proposes a new multi-view clustering method, low-rank subspace multi-view clustering based on adaptive graph regularization.
Experimental results for five widely used multi-view benchmarks show that our proposed algorithm surpasses other state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2020-08-23T08:25:06Z)
- Siloed Federated Learning for Multi-Centric Histopathology Datasets [0.17842332554022694]
This paper proposes a novel federated learning approach for deep learning architectures in the medical domain.
Local-statistic batch normalization (BN) layers are introduced, resulting in collaboratively-trained, yet center-specific models.
We benchmark the proposed method on the classification of tumorous histopathology image patches extracted from the Camelyon16 and Camelyon17 datasets.
arXiv Detail & Related papers (2020-08-17T15:49:30Z)
- Elastic Coupled Co-clustering for Single-Cell Genomic Data [0.0]
Single-cell technologies have enabled us to profile genomic features at unprecedented resolution.
Data integration can potentially lead to a better performance of clustering algorithms.
In this work, we formulate the problem in an unsupervised transfer learning framework.
arXiv Detail & Related papers (2020-03-29T08:21:53Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
On a set of 16 data tables generated by a quasi-Monte Carlo experiment, one of the aggregation criteria, using L1 dissimilarity, is compared against hierarchical clustering and a k-means variant, partitioning around medoids (PAM).
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.