Orthogonalization of data via Gromov-Wasserstein type feedback for
clustering and visualization
- URL: http://arxiv.org/abs/2207.12279v1
- Date: Mon, 25 Jul 2022 15:52:11 GMT
- Title: Orthogonalization of data via Gromov-Wasserstein type feedback for
clustering and visualization
- Authors: Martin Ryner and Johan Karlsson
- Abstract summary: We propose an adaptive approach for clustering and visualization of data by an orthogonalization process.
We prove that the method converges globally to a unique fixpoint for certain parameter values.
We confirm that the method produces biologically meaningful clustering results consistent with human expert classification.
- Score: 5.44192123671277
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper we propose an adaptive approach for clustering and
visualization of data by an orthogonalization process. Starting with the data
points being represented by a Markov process using the diffusion map framework,
the method adaptively increases the orthogonality of the clusters by applying a
feedback mechanism inspired by the Gromov-Wasserstein distance. This mechanism
iteratively increases the spectral gap and refines the orthogonality of the
data to achieve a clustering with high specificity. By using the diffusion map
framework and representing the relation between data points using transition
probabilities, the method is robust with respect to the underlying
distance, noise in the data, and random initialization. We prove that the method
converges globally to a unique fixpoint for certain parameter values. We also
propose a related approach where the transition probabilities in the Markov
process are required to be doubly stochastic, in which case the method
generates a minimizer to a nonconvex optimization problem. We apply the method
to cryo-electron microscopy image data from biopharmaceutical manufacturing
where we can confirm biologically relevant insights related to therapeutic
efficacy. We consider an example with morphological variations of gene
packaging and confirm that the method produces biologically meaningful
clustering results consistent with human expert classification.
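The ingredients named in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the Gromov-Wasserstein type feedback step is not specified here and is not reproduced. The sketch only shows a diffusion-map style Markov matrix, its spectral gap, and a doubly stochastic variant obtained by Sinkhorn scaling (a technique swapped in for illustration). The function names, the Gaussian kernel, and the bandwidth `eps` are assumptions.

```python
import numpy as np


def gaussian_kernel(X, eps):
    """Pairwise affinities exp(-||x_i - x_j||^2 / eps); eps is an assumed bandwidth."""
    sq = np.sum(X * X, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    return np.exp(-d2 / eps)


def markov_matrix(K):
    """Row-normalize affinities so each row is a probability distribution."""
    return K / K.sum(axis=1, keepdims=True)


def spectral_gap(P, k):
    """Difference between the k-th and (k+1)-th largest eigenvalue magnitudes."""
    mags = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return float(mags[k - 1] - mags[k])


def sinkhorn(K, n_iter=200):
    """Alternate row and column scaling toward a doubly stochastic matrix."""
    P = np.array(K, dtype=float)
    for _ in range(n_iter):
        P /= P.sum(axis=1, keepdims=True)
        P /= P.sum(axis=0, keepdims=True)
    return P


# Toy data (hypothetical): two well-separated Gaussian blobs in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(10.0, 0.5, (20, 2))])

K = gaussian_kernel(X, eps=4.0)
P = markov_matrix(K)
print("gap after 2nd eigenvalue:", spectral_gap(P, k=2))
print("max row-sum error after Sinkhorn:", np.abs(sinkhorn(K).sum(axis=1) - 1).max())
```

With two clusters, a large gap after the second eigenvalue indicates that the transition matrix is close to block diagonal, which is the kind of cluster separation the abstract describes the feedback mechanism as increasing.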
Related papers
- Robust Multi-view Co-expression Network Inference [8.697303234009528]
Inferring gene co-expression networks from transcriptome data presents many challenges.
We introduce a robust method for high-dimensional graph inference from multiple independent studies.
arXiv Detail & Related papers (2024-09-30T06:30:09Z)
- Spatially-Aware Diffusion Models with Cross-Attention for Global Field Reconstruction with Sparse Observations [1.371691382573869]
We develop and enhance score-based diffusion models in field reconstruction tasks.
We introduce a condition encoding approach to construct a tractable mapping between observed and unobserved regions.
We demonstrate the ability of the model to capture possible reconstructions and improve the accuracy of fused results.
arXiv Detail & Related papers (2024-08-30T19:46:23Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
- Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes.
We propose a metric that quantifies the ability of a graph to mix the current gradients.
Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric.
arXiv Detail & Related papers (2022-04-13T15:54:35Z)
- Approximate Bayesian Computation Based on Maxima Weighted Isolation Kernel Mapping [0.0]
The work tries to solve the problem of a precise evaluation of a parameter for this type of model.
The application of the branching processes model to cancer cell evolution has many difficulties like high dimensionality and the rare appearance of a result of interest.
arXiv Detail & Related papers (2022-01-30T07:11:57Z)
- Tk-merge: Computationally Efficient Robust Clustering Under General Assumptions [0.0]
We present a two-step hybrid robust clustering algorithm based on trimmed k-means and hierarchical agglomeration.
We also present natural generalizations of the approach as well as an adaptive procedure to estimate the amount of contamination in a data-driven fashion.
arXiv Detail & Related papers (2022-01-17T13:05:05Z)
- Scalable Intervention Target Estimation in Linear Models [52.60799340056917]
Current approaches to causal structure learning either work with known intervention targets or use hypothesis testing to discover the unknown intervention targets.
This paper proposes a scalable and efficient algorithm that consistently identifies all intervention targets.
The proposed algorithm can also be used to update a given observational Markov equivalence class into the interventional Markov equivalence class.
arXiv Detail & Related papers (2021-11-15T03:16:56Z)
- Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
As the results show, the proposed strategies perform better than classification based on observed data alone and maintain high accuracy even when the missing data ratio increases.
arXiv Detail & Related papers (2021-10-19T14:24:50Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.