Related papers: Finding Outliers in Gaussian Model-Based Clustering

URL: http://arxiv.org/abs/1907.01136v6
Date: Thu, 30 May 2024 16:26:06 GMT
Title: Finding Outliers in Gaussian Model-Based Clustering
Authors: Katharine M. Clark, Paul D. McNicholas
Abstract summary: Clustering, or unsupervised classification, is a task often plagued by outliers. There is a paucity of work on handling outliers in clustering.
Score: 1.0435741631709405
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Clustering, or unsupervised classification, is a task often plagued by outliers. Yet there is a paucity of work on handling outliers in clustering. Outlier identification algorithms tend to fall into three broad categories: outlier inclusion, outlier trimming, and post hoc outlier identification methods, with the first two often requiring pre-specification of the number of outliers. The fact that sample squared Mahalanobis distance is beta-distributed is used to derive an approximate distribution for the log-likelihoods of subset finite Gaussian mixture models. An algorithm is then proposed that removes the least plausible points according to the subset log-likelihoods, which are deemed outliers, until the subset log-likelihoods adhere to the reference distribution. This results in a trimming method, called OCLUST, that inherently estimates the number of outliers.
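The beta-distribution fact the abstract relies on can be checked numerically. The sketch below is not the OCLUST algorithm. It only illustrates the classical single-Gaussian result that, for n observations in p dimensions with the sample mean and unbiased sample covariance, n/(n-1)^2 times the squared Mahalanobis distance follows a Beta(p/2, (n-p-1)/2) distribution. The function name `scaled_mahalanobis` and the simulated data are assumptions made for illustration.

```python
import numpy as np


def scaled_mahalanobis(X):
    """Return n/(n-1)^2 * D_i^2 for each row of X.

    D_i^2 is the squared Mahalanobis distance of row i from the sample
    mean, computed with the unbiased (n-1) sample covariance.
    """
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    centered = X - X.mean(axis=0)
    # atleast_2d keeps the p = 1 case working with linalg.solve
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # Solve cov @ Z = centered^T instead of forming the inverse explicitly
    solved = np.linalg.solve(cov, centered.T).T
    d2 = np.einsum("ij,ij->i", centered, solved)
    return n / (n - 1) ** 2 * d2


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # hypothetical simulated data
    n, p = 500, 3
    X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)
    t = scaled_mahalanobis(X)

    # Moments of the reference distribution Beta(a, b)
    a, b = p / 2, (n - p - 1) / 2
    beta_mean = a / (a + b)
    beta_var = a * b / ((a + b) ** 2 * (a + b + 1))
    print("empirical mean:", t.mean(), "Beta mean:", beta_mean)
    print("empirical var: ", t.var(), "Beta var: ", beta_var)
```

The scaled distances always lie in [0, 1], and their average equals p/(n-1) exactly, which is the mean of the reference beta distribution. The variance comparison is only approximate for any finite sample.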

Related papers

Towards Learnable Anchor for Deep Multi-View Clustering [49.767879678193005]
In this paper, we propose the Deep Multi-view Anchor Clustering (DMAC) model that performs clustering in linear time. With the optimal anchors, the full sample graph is calculated to derive a discriminative embedding for clustering. Experiments on several datasets demonstrate superior performance and efficiency of DMAC compared to state-of-the-art competitors.
arXiv Detail & Related papers (2025-03-16T09:38:11Z)
Fuzzy Granule Density-Based Outlier Detection with Multi-Scale Granular Balls [65.44462297594308]
Outlier detection refers to the identification of anomalous samples that deviate significantly from the distribution of normal data. Most unsupervised outlier detection methods are carefully designed to detect specified outliers. We propose a fuzzy rough sets-based multi-scale outlier detection method to identify various types of outliers.
arXiv Detail & Related papers (2025-01-06T12:35:51Z)
Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
A Computational Theory and Semi-Supervised Algorithm for Clustering [0.0]
A semi-supervised clustering algorithm is presented. The kernel of the clustering method is Mohammad's anomaly detection algorithm. Results are presented on synthetic and real-world data sets.
arXiv Detail & Related papers (2023-06-12T09:15:58Z)
Robust Outlier Rejection for 3D Registration with Variational Bayes [70.98659381852787]
We develop a novel variational non-local network-based outlier rejection framework for robust alignment. We propose a voting-based inlier searching strategy to cluster the high-quality hypothetical inliers for transformation estimation.
arXiv Detail & Related papers (2023-04-04T03:48:56Z)
Numerically assisted determination of local models in network scenarios [55.2480439325792]
We develop a numerical tool for finding explicit local models that reproduce a given statistical behaviour. We provide conjectures for the critical visibilities of the Greenberger-Horne-Zeilinger (GHZ) and W distributions. The developed codes and documentation are publicly available at281.com/mariofilho/localmodels.
arXiv Detail & Related papers (2023-03-17T13:24:04Z)
Robust computation of optimal transport by $\beta$-potential regularization [79.24513412588745]
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions. We propose regularizing OT with the $\beta$-potential term associated with the so-called $\beta$-divergence. We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers.
arXiv Detail & Related papers (2022-12-26T18:37:28Z)
SSDBCODI: Semi-Supervised Density-Based Clustering with Outliers Detection Integrated [1.8444322599555096]
Clustering analysis is one of the critical tasks in machine learning. Because the performance of clustering can be significantly eroded by outliers, algorithms try to incorporate the process of outlier detection. We have proposed SSDBCODI, a semi-supervised detection element.
arXiv Detail & Related papers (2022-08-10T21:06:38Z)
Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning. Recent work has established lower bounds against the class of low-degree methods. We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits. Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
C-AllOut: Catching & Calling Outliers by Type [10.69970450827617]
C-AllOut is a novel outlier detector that annotates outliers by type. It is parameter-free and scalable, and it can work with only pairwise similarities (or distances) when needed.
arXiv Detail & Related papers (2021-10-13T14:25:52Z)
Revisiting Agglomerative Clustering [4.291340656866855]
A model of clusters was also adopted, involving a higher density nucleus surrounded by a transition, followed by outliers. The obtained results include the verification that many methods detect two clusters in unimodal data. The single-linkage method was found to be more resilient to false positives.
arXiv Detail & Related papers (2020-05-16T14:07:25Z)
Robust M-Estimation Based Bayesian Cluster Enumeration for Real Elliptically Symmetric Distributions [5.137336092866906]
Robustly determining the optimal number of clusters in a data set is an essential factor in a wide range of applications. This article generalizes Bayesian cluster enumeration so that it can be used with any Real Elliptically Symmetric (RES) distributed mixture model. We derive a robust criterion for data sets with finite sample size, and also provide an approximation to reduce the computational cost at large sample sizes.
arXiv Detail & Related papers (2020-05-04T11:44:49Z)
A General Method for Robust Learning from Batches [56.59844655107251]
We consider a general framework of robust learning from batches, and determine the limits of both classification and distribution estimation over arbitrary, including continuous, domains. We derive the first robust computationally-efficient learning algorithms for piecewise-interval classification, and for piecewise-polynomial, monotone, log-concave, and Gaussian-mixture distribution estimation.
arXiv Detail & Related papers (2020-02-25T18:53:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.