Tk-merge: Computationally Efficient Robust Clustering Under General
Assumptions
- URL: http://arxiv.org/abs/2201.06391v1
- Date: Mon, 17 Jan 2022 13:05:05 GMT
- Title: Tk-merge: Computationally Efficient Robust Clustering Under General
Assumptions
- Authors: Luca Insolia and Domenico Perrotta
- Abstract summary: We present a two-step hybrid robust clustering algorithm based on trimmed k-means and hierarchical agglomeration.
We also present natural generalizations of the approach as well as an adaptive procedure to estimate the amount of contamination in a data-driven fashion.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address general-shaped clustering problems under very weak parametric
assumptions with a two-step hybrid robust clustering algorithm based on trimmed
k-means and hierarchical agglomeration. The algorithm has low computational
complexity and effectively identifies the clusters even in the presence of data
contamination. We also present natural generalizations of the approach as well
as an adaptive procedure to estimate the amount of contamination in a
data-driven fashion. Our proposal outperforms state-of-the-art robust,
model-based methods in our numerical simulations and real-world applications
related to color quantization for image analysis, human mobility patterns based
on GPS data, biomedical images of diabetic retinopathy, and functional data
across weather stations.
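The two-step idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract names only trimmed k-means and hierarchical agglomeration. The sketch assumes that step one fits more centroids than final clusters while trimming the alpha fraction of points farthest from their nearest centroid, and that step two merges those centroids by single linkage. The function names (trimmed_kmeans, merge_centroids, tk_merge) and the parameters k, alpha, n_clusters and init are hypothetical.

```python
import math
import random


def trimmed_kmeans(points, k, alpha, init=None, n_iter=50, seed=0):
    """Fit centroids while ignoring the alpha fraction of points that lie
    farthest from their nearest centroid (label -1 marks trimmed points)."""
    rng = random.Random(seed)
    # Assumption: start from user-supplied centroids, else k random points.
    centroids = list(init) if init is not None else rng.sample(points, k)
    n_keep = len(points) - int(alpha * len(points))
    labels = [-1] * len(points)
    for _ in range(n_iter):
        # Distance from every point to its nearest centroid.
        nearest = []
        for i, p in enumerate(points):
            d, j = min((math.dist(p, c), j) for j, c in enumerate(centroids))
            nearest.append((d, i, j))
        nearest.sort()
        # Keep the n_keep closest points; the rest stay trimmed.
        new_labels = [-1] * len(points)
        for _, i, j in nearest[:n_keep]:
            new_labels[i] = j
        # Recompute each centroid as the mean of its untrimmed members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


def merge_centroids(centroids, n_clusters):
    """Single-linkage agglomeration of centroids down to n_clusters groups.
    Returns a dict mapping centroid index -> merged cluster index."""
    groups = [[j] for j in range(len(centroids))]

    def gap(a, b):
        return min(math.dist(centroids[i], centroids[j]) for i in a for j in b)

    while len(groups) > n_clusters:
        _, a, b = min(
            (gap(groups[a], groups[b]), a, b)
            for a in range(len(groups))
            for b in range(a + 1, len(groups))
        )
        merged = groups.pop(b)  # b > a, so index a is unaffected
        groups[a].extend(merged)
    return {j: g for g, members in enumerate(groups) for j in members}


def tk_merge(points, k, alpha, n_clusters, init=None, seed=0):
    """Trimmed k-means followed by centroid merging; -1 means trimmed."""
    centroids, labels = trimmed_kmeans(points, k, alpha, init=init, seed=seed)
    mapping = merge_centroids(centroids, n_clusters)
    return [mapping[lab] if lab >= 0 else -1 for lab in labels]
```

Calling tk_merge(points, k, alpha, n_clusters) on a list of coordinate tuples returns one label per point. Choosing k larger than n_clusters lets several centroids cover one irregularly shaped cluster before they are merged, which is one plausible reading of "general-shaped" clustering in the abstract. The random initialization is a simplification, so results can depend on the seed.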
Related papers
- Hierarchical and Density-based Causal Clustering [6.082022112101251]
We propose plug-in estimators that are simple and readily implementable using off-the-shelf algorithms.
We go on to study their rate of convergence, and show that the additional cost of causal clustering is essentially the estimation error of the outcome regression functions.
arXiv Detail & Related papers (2024-11-02T14:01:04Z)
- Simple and Scalable Algorithms for Cluster-Aware Precision Medicine [0.0]
We propose a simple and scalable approach to joint clustering and embedding.
This novel, cluster-aware embedding approach overcomes the complexity and limitations of current joint embedding and clustering methods.
Our approach does not require the user to choose the desired number of clusters, but instead yields interpretable dendrograms of hierarchically clustered embeddings.
arXiv Detail & Related papers (2022-11-29T19:27:26Z)
- RandomSCM: interpretable ensembles of sparse classifiers tailored for
omics data [59.4141628321618]
We propose an ensemble learning algorithm based on conjunctions or disjunctions of decision rules.
The interpretability of the models makes them useful for biomarker discovery and patterns discovery in high dimensional data.
arXiv Detail & Related papers (2022-08-11T13:55:04Z)
- Orthogonalization of data via Gromov-Wasserstein type feedback for
clustering and visualization [5.44192123671277]
We propose an adaptive approach for clustering and visualization of data by an orthogonalization process.
We prove that the method converges globally to a unique fixpoint for certain parameter values.
We confirm that the method produces biologically meaningful clustering results consistent with human expert classification.
arXiv Detail & Related papers (2022-07-25T15:52:11Z)
- Multiway Spherical Clustering via Degree-Corrected Tensor Block Models [8.147652597876862]
We develop a degree-corrected block model with estimation accuracy guarantees.
In particular, we demonstrate that an intrinsic statistical-to-computational gap emerges only for tensors of order three or greater.
The efficacy of our procedure is demonstrated through two data applications.
arXiv Detail & Related papers (2022-01-19T03:40:22Z)
- Scalable Intervention Target Estimation in Linear Models [52.60799340056917]
Current approaches to causal structure learning either work with known intervention targets or use hypothesis testing to discover the unknown intervention targets.
This paper proposes a scalable and efficient algorithm that consistently identifies all intervention targets.
The proposed algorithm can also be used to update a given observational Markov equivalence class into the interventional Markov equivalence class.
arXiv Detail & Related papers (2021-11-15T03:16:56Z)
- MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness generating mechanism.
We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z)
- Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
As the results show, the proposed strategies perform better than classification based on observed data alone and maintain high accuracy even when the missing data ratio increases.
arXiv Detail & Related papers (2021-10-19T14:24:50Z)
- Data-driven generation of plausible tissue geometries for realistic
photoacoustic image synthesis [53.65837038435433]
Photoacoustic tomography (PAT) has the potential to recover morphological and functional tissue properties.
We propose a novel approach to PAT data simulation, which we refer to as "learning to simulate".
We leverage the concept of Generative Adversarial Networks (GANs) trained on semantically annotated medical imaging data to generate plausible tissue geometries.
arXiv Detail & Related papers (2021-03-29T11:30:18Z)
- Exact Clustering in Tensor Block Model: Statistical Optimality and
Computational Limit [10.8145995157397]
High-order clustering aims to identify heterogeneous substructures in multiway datasets.
The non-convex and discontinuous nature of the problem poses significant challenges in both statistics and computation.
arXiv Detail & Related papers (2020-12-18T00:48:27Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)