The Cellwise Minimum Covariance Determinant Estimator
- URL: http://arxiv.org/abs/2207.13493v2
- Date: Wed, 15 Nov 2023 11:04:00 GMT
- Title: The Cellwise Minimum Covariance Determinant Estimator
- Authors: Jakob Raymaekers and Peter J. Rousseeuw
- Abstract summary: We propose a cellwise robust version of the MCD method, called cellMCD.
It performs well in simulations with cellwise outliers, and has high finite-sample efficiency on clean data.
It is illustrated with real data with visualizations of the results.
- Score: 1.90365714903665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The usual Minimum Covariance Determinant (MCD) estimator of a covariance
matrix is robust against casewise outliers. These are cases (that is, rows of
the data matrix) that behave differently from the majority of cases, raising
suspicion that they might belong to a different population. On the other hand,
cellwise outliers are individual cells in the data matrix. When a row contains
one or more outlying cells, the other cells in the same row still contain
useful information that we wish to preserve. We propose a cellwise robust
version of the MCD method, called cellMCD. Its main building blocks are
observed likelihood and a penalty term on the number of flagged cellwise
outliers. It possesses good breakdown properties. We construct a fast algorithm
for cellMCD based on concentration steps (C-steps) that always lower the
objective. The method performs well in simulations with cellwise outliers, and
has high finite-sample efficiency on clean data. It is illustrated on real data
with visualizations of the results.
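The abstract builds on the classical casewise MCD and its concentration steps (C-steps). As a rough illustration of the C-step idea only, here is a minimal numpy sketch of a toy casewise MCD: refit location and scatter on the current subset, then keep the h observations with the smallest Mahalanobis distances, which never increases the covariance determinant. This is not the paper's cellMCD (which adds an observed likelihood and a penalty on flagged cells); all function names are illustrative.

```python
import numpy as np

def c_step(X, subset, h):
    """One concentration step: refit mean/covariance on the current subset,
    then keep the h rows with the smallest squared Mahalanobis distances."""
    mu = X[subset].mean(axis=0)
    S = np.cov(X[subset], rowvar=False)
    S_inv = np.linalg.inv(S)
    diff = X - mu
    md2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)  # squared Mahalanobis distances
    return np.sort(np.argsort(md2)[:h])

def raw_mcd(X, h, max_iter=50, seed=0):
    """Toy casewise MCD: iterate C-steps from one random size-h start
    until the subset stabilizes; det(cov) is non-increasing along the way."""
    rng = np.random.default_rng(seed)
    subset = np.sort(rng.choice(len(X), size=h, replace=False))
    for _ in range(max_iter):
        new = c_step(X, subset, h)
        if np.array_equal(new, subset):
            break
        subset = new
    return X[subset].mean(axis=0), np.cov(X[subset], rowvar=False)
```

On data containing a cluster of outlying rows, the location estimate from this sketch stays near the bulk of the data while the ordinary sample mean is pulled toward the outliers. A production MCD would use many elemental starts rather than one random subset.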
Related papers
- Lower-dimensional projections of cellular expression improves cell type classification from single-cell RNA sequencing [12.66369956714212]
Single-cell RNA sequencing (scRNA-seq) enables the study of cellular diversity at single cell level.
Various statistical, machine learning, and deep learning-based methods have been proposed for cell-type classification.
In this work, we proposed a reference-based method for cell type classification, called EnProCell.
arXiv Detail & Related papers (2024-10-13T19:01:38Z)
- Single-cell Multi-view Clustering via Community Detection with Unknown Number of Clusters [64.31109141089598]
We introduce scUNC, an innovative multi-view clustering approach tailored for single-cell data.
scUNC seamlessly integrates information from different views without the need for a predefined number of clusters.
We conducted a comprehensive evaluation of scUNC using three distinct single-cell datasets.
arXiv Detail & Related papers (2023-11-28T08:34:58Z)
- Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error equals the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z)
- Contrastive Cycle Adversarial Autoencoders for Single-cell Multi-omics Alignment and Integration [0.0]
We propose a novel framework to align and integrate single-cell RNA-seq data and single-cell ATAC-seq data.
Compared with the other state-of-the-art methods, our method performs better in both simulated and real single-cell data.
arXiv Detail & Related papers (2021-12-05T13:00:58Z)
- Solving weakly supervised regression problem using low-rank manifold regularization [77.34726150561087]
We solve a weakly supervised regression problem.
By "weakly" we mean that the labels are known for some training points, unknown for others, and uncertain for the rest due to random noise or other causes such as a lack of resources.
In the numerical section, we applied the suggested method to artificial and real datasets using Monte-Carlo modeling.
arXiv Detail & Related papers (2021-04-13T23:21:01Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Classification Beats Regression: Counting of Cells from Greyscale Microscopic Images based on Annotation-free Training Samples [20.91256120719461]
This work proposes a supervised learning framework to count cells from greyscale microscopic images without using annotated training images.
We formulate the cell counting task as an image classification problem, where the cell counts are taken as class labels.
To deal with these limitations, we propose a simple but effective data augmentation (DA) method to synthesize images for the unseen cell counts.
arXiv Detail & Related papers (2020-10-28T06:19:30Z)
- Outlier detection in non-elliptical data by kernel MRCD [10.69910379275607]
The Kernel Minimum Regularized Covariance Determinant (KMRCD) estimator is proposed.
It implicitly computes the MRCD estimates in a kernel induced feature space.
A fast algorithm is constructed that starts from kernel-based initial estimates and exploits the kernel trick to speed up the subsequent computations.
arXiv Detail & Related papers (2020-08-05T11:09:08Z)
- Split and Expand: An inference-time improvement for Weakly Supervised Cell Instance Segmentation [71.50526869670716]
We propose a two-step post-processing procedure, Split and Expand, to improve the conversion of segmentation maps to instances.
In the Split step, we split clumps of cells from the segmentation map into individual cell instances with the guidance of cell-center predictions.
In the Expand step, we find missing small cells using the cell-center predictions.
arXiv Detail & Related papers (2020-07-21T14:05:09Z)
- Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nystr\"om method [76.73096213472897]
We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees.
Our approach leads to significantly better bounds for datasets with known rates of singular value decay.
We show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
arXiv Detail & Related papers (2020-02-21T00:43:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.