DiSC: Differential Spectral Clustering of Features
- URL: http://arxiv.org/abs/2211.05314v1
- Date: Thu, 10 Nov 2022 03:32:17 GMT
- Title: DiSC: Differential Spectral Clustering of Features
- Authors: Ram Dyuthi Sristi, Gal Mishne, Ariel Jaffe
- Abstract summary: We develop a data-driven approach for detecting groups of features that differentiate between conditions.
We compute subsets of nodes whose connectivity differs significantly between condition-specific feature graphs.
- Score: 7.111650988432555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Selecting subsets of features that differentiate between two conditions is a
key task in a broad range of scientific domains. In many applications, the
features of interest form clusters with similar effects on the data at hand. To
recover such clusters we develop DiSC, a data-driven approach for detecting
groups of features that differentiate between conditions. For each condition,
we construct a graph whose nodes correspond to the features and whose weights
are functions of the similarity between them for that condition. We then apply
a spectral approach to compute subsets of nodes whose connectivity differs
significantly between the condition-specific feature graphs. On the theoretical
front, we analyze our approach with a toy example based on the stochastic block
model. We evaluate DiSC on a variety of datasets, including MNIST,
hyperspectral imaging, simulated scRNA-seq and task fMRI, and demonstrate that
DiSC uncovers features that better differentiate between conditions compared to
competing methods.
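The pipeline the abstract describes — a per-condition feature graph, then a spectral comparison of connectivity — can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the absolute-correlation affinity, the Laplacian difference, and the eigenvector row-norm score are all choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_graph(X):
    """Affinity graph over features: nodes are columns of X, edge
    weights are absolute Pearson correlations between feature pairs."""
    W = np.abs(np.corrcoef(X.T))
    np.fill_diagonal(W, 0.0)
    return W

def differential_spectral_scores(X_a, X_b, k=4):
    """Score each feature by how much its graph connectivity differs
    between conditions, via the dominant eigenvectors of the
    difference of the two graph Laplacians."""
    def laplacian(W):
        return np.diag(W.sum(axis=1)) - W

    D = laplacian(feature_graph(X_a)) - laplacian(feature_graph(X_b))
    vals, vecs = np.linalg.eigh(D)           # D is symmetric
    order = np.argsort(-np.abs(vals))[:k]    # largest-magnitude eigenvalues
    return np.linalg.norm(vecs[:, order], axis=1)

# Toy data in the spirit of the paper's stochastic-block-model example:
# features 0-4 share a latent factor only under condition A.
n, p = 500, 10
X_a = rng.normal(size=(n, p))
X_a[:, :5] += rng.normal(size=(n, 1))  # shared factor under A
X_b = rng.normal(size=(n, p))

scores = differential_spectral_scores(X_a, X_b)
print(np.argsort(-scores)[:5])  # the differentiating features rank highest
```

The difference-of-Laplacians step is one plausible way to compare condition-specific graphs; the paper's actual spectral construction may differ.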
Related papers
- Supervised Pattern Recognition Involving Skewed Feature Densities [49.48516314472825]
The classification potential of the Euclidean distance and of a dissimilarity index based on the coincidence similarity index is compared.
The accuracy of classifying the intersection point between the densities of two adjacent groups is taken into account.
arXiv Detail & Related papers (2024-09-02T12:45:18Z)
- Graph Clustering with Cross-View Feature Propagation [0.48065059125122356]
We present Graph Clustering With Cross-View Feature Propagation (GCCFP), a novel method that leverages multi-view feature propagation to enhance cluster identification in graph data.
Our experiments on various real-world graphs demonstrate the superior clustering performance of GCCFP compared to well-established methods.
arXiv Detail & Related papers (2024-08-12T09:38:15Z)
- Semi-Supervised Clustering via Structural Entropy with Different Constraints [30.215985625884922]
We present Semi-supervised clustering via Structural Entropy (SSE), a novel method that can incorporate different types of constraints from diverse sources to perform both partitioning and hierarchical clustering.
We evaluate SSE on nine clustering datasets and compare it with eleven semi-supervised partitioning and hierarchical clustering methods.
arXiv Detail & Related papers (2023-12-18T04:00:40Z)
- Histopathology Whole Slide Image Analysis with Heterogeneous Graph Representation Learning [78.49090351193269]
We propose a novel graph-based framework to leverage the inter-relationships among different types of nuclei for WSI analysis.
Specifically, we formulate the WSI as a heterogeneous graph with a "nucleus-type" attribute attached to each node and a semantic-similarity attribute attached to each edge.
Our framework outperforms the state-of-the-art methods with considerable margins on various tasks.
arXiv Detail & Related papers (2023-07-09T14:43:40Z)
- Comparison of Clustering Algorithms for Statistical Features of Vibration Data Sets [0.4806505912512235]
We present an extensive comparison of the clustering algorithms K-means clustering, OPTICS, and Gaussian mixture model clustering (GMM) applied to statistical features extracted from the time and frequency domains of vibration data sets.
Our work showed that averaging features (Mean, Median) and variance-based features (Standard Deviation, Interquartile Range) performed significantly better than shape-based features (Skewness, Kurtosis).
With an increase in the specified number of clusters, clustering algorithms performed better, although there were some specific algorithmic restrictions.
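As a rough illustration of the kind of pipeline this comparison covers (an assumption-based sketch, not the paper's code): extract time-domain statistical features from synthetic vibration segments, then cluster them with a minimal hand-rolled k-means. Skewness and Kurtosis are omitted here for brevity; the variance-based features (Standard Deviation, IQR) do the separating work in this toy setting, consistent with the summary above.

```python
import numpy as np

rng = np.random.default_rng(1)

def stat_features(segment):
    """Time-domain statistical features of one vibration segment."""
    q75, q25 = np.percentile(segment, [75, 25])
    return np.array([
        segment.mean(),       # Mean
        np.median(segment),   # Median
        segment.std(),        # Standard Deviation
        q75 - q25,            # Interquartile Range
    ])

def kmeans(X, iters=20):
    """Minimal Lloyd's k-means with k=2 (illustration only).
    Seeded deterministically with the lowest- and highest-std points."""
    centers = X[[X[:, 2].argmin(), X[:, 2].argmax()]]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(2)])
    return labels

# Two synthetic "machine states": same zero mean, different vibration amplitude.
low = rng.normal(0, 1.0, size=(30, 1024))
high = rng.normal(0, 3.0, size=(30, 1024))
X = np.array([stat_features(s) for s in np.vstack([low, high])])

labels = kmeans(X)
print(np.bincount(labels))  # the two amplitude regimes form the two clusters
```

The deterministic seeding sidesteps the random-initialization sensitivity that a real comparison (e.g. against OPTICS or a GMM) would need to account for.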
arXiv Detail & Related papers (2023-05-11T12:19:30Z)
- Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z)
- The role of feature space in atomistic learning [62.997667081978825]
Physically-inspired descriptors play a key role in the application of machine-learning techniques to atomistic simulations.
We introduce a framework to compare different sets of descriptors, and different ways of transforming them by means of metrics and kernels.
We compare representations built in terms of n-body correlations of the atom density, quantitatively assessing the information loss associated with the use of low-order features.
arXiv Detail & Related papers (2020-09-06T14:12:09Z)
- Interpretable Visualizations with Differentiating Embedding Networks [0.0]
We present a visualization algorithm based on a novel unsupervised Siamese neural network training regime and loss function, called Differentiating Embedding Networks (DEN).
The Siamese neural network finds differentiating or similar features between specific pairs of samples in a dataset, and uses these features to embed the dataset in a lower dimensional space where it can be visualized.
To interpret DEN, we build an end-to-end parametric clustering algorithm on top of the visualization, then leverage SHAP scores to determine which features in the sample space are important for understanding the structures that the discovered clusters reveal.
arXiv Detail & Related papers (2020-06-11T17:30:44Z)
- Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
Multi-view problems can be addressed with latent variable models.
High-dimensionality and non-linearity are traditionally handled by kernel methods.
We propose merging both approaches into a single model.
arXiv Detail & Related papers (2020-06-01T14:25:38Z)
- Stable and consistent density-based clustering via multiparameter persistence [77.34726150561087]
We consider the degree-Rips construction from topological data analysis.
We analyze its stability to perturbations of the input data using the correspondence-interleaving distance.
We integrate these methods into a pipeline for density-based clustering, which we call Persistable.
arXiv Detail & Related papers (2020-05-18T19:45:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.