Supervised Convex Clustering
- URL: http://arxiv.org/abs/2005.12198v1
- Date: Mon, 25 May 2020 16:12:38 GMT
- Title: Supervised Convex Clustering
- Authors: Minjie Wang, Tianyi Yao, Genevera I. Allen
- Abstract summary: We propose and develop a new statistical pattern discovery method named Supervised Convex Clustering (SCC).
SCC borrows strength from both information sources and guides towards finding more interpretable patterns via a joint convex fusion penalty.
We demonstrate the practical advantages of SCC through simulations and a case study on Alzheimer's Disease genomics.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clustering has long been a popular unsupervised learning approach to identify
groups of similar objects and discover patterns from unlabeled data in many
applications. Yet, coming up with meaningful interpretations of the estimated
clusters has often been challenging precisely due to its unsupervised nature.
Meanwhile, in many real-world scenarios, there are some noisy supervising
auxiliary variables, for instance, subjective diagnostic opinions, that are
related to the observed heterogeneity of the unlabeled data. By leveraging
information from both supervising auxiliary variables and unlabeled data, we
seek to uncover more scientifically interpretable group structures that may be
hidden by completely unsupervised analyses. In this work, we propose and
develop a new statistical pattern discovery method named Supervised Convex
Clustering (SCC) that borrows strength from both information sources and guides
towards finding more interpretable patterns via a joint convex fusion penalty.
We develop several extensions of SCC to integrate different types of
supervising auxiliary variables, to adjust for additional covariates, and to
find biclusters. We demonstrate the practical advantages of SCC through
simulations and a case study on Alzheimer's Disease genomics. Specifically, we
discover new candidate genes as well as new subtypes of Alzheimer's Disease
that can potentially lead to better understanding of the underlying genetic
mechanisms responsible for the observed heterogeneity of cognitive decline in
older adults.
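The abstract describes SCC only at a high level. To make the idea of a "joint convex fusion penalty" concrete, the Python sketch below evaluates a convex-clustering-style objective in which each sample's data centroid U_i and its supervising-variable centroid v_i are fused as one concatenated row. It is a minimal illustration under stated assumptions, not the authors' formulation. It assumes a single continuous supervising variable with squared-error loss, a Frobenius-norm data fit, and an L2 (group) norm on pairwise row differences. The function names `scc_objective` and `clusters_from_centroids` and the parameters `lam`, `alpha`, and `weights` are hypothetical; the paper's exact losses, scaling, and weights may differ.

```python
import math


def _row_gap(U, v, i, j):
    """L2 distance between concatenated rows [U_i, v_i] and [U_j, v_j]."""
    diff = [a - b for a, b in zip(U[i], U[j])] + [v[i] - v[j]]
    return math.sqrt(sum(d * d for d in diff))


def scc_objective(X, y, U, v, weights, lam, alpha):
    """Illustrative SCC-style objective (hypothetical sketch, not the authors' code).

    X: n x p data, y: length-n continuous supervising variable,
    U: n x p data centroids, v: length-n supervising centroids,
    weights: {(i, j): w_ij} for pairs i < j, lam: fusion strength,
    alpha: weight on the supervision loss (squared error is an assumption).
    """
    # Data fit: 0.5 * ||X - U||_F^2
    fit = 0.5 * sum((x - u) ** 2 for xr, ur in zip(X, U) for x, u in zip(xr, ur))
    # Supervision fit: 0.5 * alpha * ||y - v||_2^2 (assumed loss)
    sup = 0.5 * alpha * sum((yi - vi) ** 2 for yi, vi in zip(y, v))
    # Joint fusion penalty: one group norm per pair over the concatenated rows
    pen = sum(w * _row_gap(U, v, i, j) for (i, j), w in weights.items())
    return fit + sup + lam * pen


def clusters_from_centroids(U, v, tol=1e-6):
    """Label samples whose concatenated centroids coincide (within tol) as one cluster."""
    n = len(U)
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            if _row_gap(U, v, i, j) <= tol:
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


if __name__ == "__main__":
    X = [[0.0], [1.0]]
    y = [0.0, 0.0]
    print(scc_objective(X, y, X, y, {(0, 1): 1.0}, lam=2.0, alpha=1.0))  # 2.0
    print(scc_objective(X, y, [[0.5], [0.5]], y, {(0, 1): 1.0}, lam=2.0, alpha=1.0))  # 0.25
    print(clusters_from_centroids([[0.05, 0.0], [0.05, 0.0], [5.0, 5.0]], [1.05, 1.05, 3.0]))  # [0, 0, 1]
    print(clusters_from_centroids([[0.0], [0.0]], [0.0, 1.0]))  # [0, 1]
```

Because the penalty acts on the concatenated row [U_i, v_i], two samples share a cluster only when both their data centroids and their supervising centroids fuse. In the last call, identical data centroids are still separated by the supervising variable. Actually minimizing such an objective requires a convex solver (ADMM and AMA are common choices for convex clustering), which this sketch does not include.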
Related papers
- Cluster Quilting: Spectral Clustering for Patchwork Learning
We focus on the clustering problem in patchwork learning, aiming at discovering clusters amongst all samples even when some are never jointly observed for any feature.
We propose a novel spectral clustering method called Cluster Quilting, consisting of (i) patch ordering that exploits the overlapping structure amongst all patches, (ii) patchwise SVD, (iii) sequential linear mapping of top singular vectors for patch overlaps, followed by (iv) k-means on the combined and weighted singular vectors.
Under a sub-Gaussian mixture model, we establish theoretical guarantees via a non-asymptotic misclustering rate bound that reflects both
Date: 2024-06-19
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
Date: 2024-03-02
- Federated unsupervised random forest for privacy-preserving patient stratification
We introduce a novel multi-omics clustering approach utilizing unsupervised random-forests.
We have validated our approach on machine learning benchmark data sets and on cancer data from The Cancer Genome Atlas.
Our method is competitive with the state-of-the-art in terms of disease subtyping, but at the same time substantially improves the cluster interpretability.
Date: 2024-01-29
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrate both sets of information and reconstruct the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
Date: 2023-11-28
- GADformer: A Transparent Transformer Model for Group Anomaly Detection on Trajectories
Group Anomaly Detection (GAD) identifies unusual patterns in groups whose individual members might not be anomalous.
This paper introduces GADformer, a BERT-based model for attention-driven GAD on trajectories in unsupervised and semi-supervised settings.
We also introduce the Block-Attention-anomaly-Score (BAS) to enhance model transparency by scoring attention patterns.
Date: 2023-03-17
- Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
Date: 2023-01-07
- Effect Identification in Cluster Causal Diagrams
We introduce a new type of graphical model called cluster causal diagrams (for short, C-DAGs).
C-DAGs allow for the partial specification of relationships among variables based on limited prior knowledge.
We develop the foundations and machinery for valid causal inferences over C-DAGs.
Date: 2022-02-22
- A Deep Variational Approach to Clustering Survival Data
We introduce a novel probabilistic approach to cluster survival data in a variational deep clustering setting.
Our proposed method employs a deep generative model to uncover the underlying distribution of both the explanatory variables and the potentially censored survival times.
Date: 2021-06-10
- Algorithm-Agnostic Explainability for Unsupervised Clustering
We present two novel algorithm-agnostic explainability methods, global permutation percent change (G2PC) feature importance and local perturbation percent change (L2PC) feature importance.
We demonstrate the utility of the methods for explaining five popular clustering algorithms on low-dimensional, ground-truth synthetic datasets.
Date: 2021-05-17
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction
We focus on the few-shot disease subtype prediction problem: identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner called Prototypical Network, which is a simple yet effective meta-learning machine for few-shot image classification.
Date: 2020-09-02
- Learning from Aggregate Observations
We study the problem of learning from aggregate observations where supervision signals are given to sets of instances.
We present a general probabilistic framework that accommodates a variety of aggregate observations.
Simple maximum likelihood solutions can be applied to various differentiable models.
Date: 2020-04-14