Cluster Purging: Efficient Outlier Detection based on Rate-Distortion Theory
- URL: http://arxiv.org/abs/2302.11234v1
- Date: Wed, 22 Feb 2023 09:32:37 GMT
- Title: Cluster Purging: Efficient Outlier Detection based on Rate-Distortion Theory
- Authors: Maximilian B. Toller and Bernhard C. Geiger and Roman Kern
- Abstract summary: Cluster Purging is an extension of clustering-based outlier detection.
We show that Cluster Purging improves upon outliers detected from raw clusterings.
- Score: 6.929025509877642
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rate-distortion theory-based outlier detection builds upon the rationale that
a good data compression will encode outliers with unique symbols. Based on this
rationale, we propose Cluster Purging, which is an extension of
clustering-based outlier detection. This extension allows one to assess the
representivity of clusterings, and to find data that are best represented by
individual unique clusters. We propose two efficient algorithms for performing
Cluster Purging, one being parameter-free, while the other algorithm has a
parameter that controls representivity estimations, allowing it to be tuned in
supervised setups. In an experimental evaluation, we show that Cluster Purging
improves upon outliers detected from raw clusterings, and that Cluster Purging
competes strongly against state-of-the-art alternatives.
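The rationale above, that a good compression encodes outliers with their own unique symbols, can be pictured with a toy computation. The following is a minimal sketch and not the authors' Cluster Purging algorithm: it assumes an ordinary k-means clustering, an illustrative rate-plus-distortion cost per point, a hand-picked trade-off parameter beta, and a naive threshold, and it flags the points that would be cheaper to encode with a unique symbol than as members of their assigned cluster.

```python
# Hypothetical sketch (not the paper's algorithm): a toy rate-distortion-style
# outlier criterion on top of an ordinary k-means clustering.  The cost model,
# the trade-off parameter, and the dataset are all illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two dense Gaussian blobs plus a few scattered points acting as outliers.
inliers = np.vstack([
    rng.normal(loc=(0, 0), scale=0.3, size=(200, 2)),
    rng.normal(loc=(4, 4), scale=0.3, size=(200, 2)),
])
outliers = rng.uniform(low=-3, high=7, size=(10, 2))
X = np.vstack([inliers, outliers])

k = 2
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
labels = km.labels_
centroids = km.cluster_centers_

# "Rate": bits needed to name the assigned cluster, -log2 of its empirical frequency.
counts = np.bincount(labels, minlength=k)
rate = -np.log2(counts[labels] / len(X))

# "Distortion": squared distance to the assigned centroid, weighted by an
# assumed trade-off parameter beta (illustrative, not taken from the paper).
sq_dist = np.sum((X - centroids[labels]) ** 2, axis=1)
beta = 1.0
cost_in_cluster = rate + beta * sq_dist

# Cost of giving a point its own unique symbol: naming 1 point out of n
# (rate only, zero distortion).
cost_unique = -np.log2(1.0 / len(X))

# Flag the points that are cheaper to encode individually than as cluster members.
is_outlier = cost_in_cluster > cost_unique
print(f"flagged {is_outlier.sum()} of {len(X)} points as outliers")
```

In this toy setting, points far from every centroid pay a large distortion under the clustering, so a unique symbol becomes the cheaper encoding and they are flagged; the paper instead derives its representivity criterion and the two algorithms from rate-distortion theory rather than from this ad-hoc cost.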
Related papers
- GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Calibration Clustering (GCC) method to incorporate feature learning and augmentation into the clustering procedure.
First, we develop a discriminative feature alignment mechanism to discover the intrinsic relationship between real and generated samples.
Second, we design a self-supervised metric learning scheme to generate more reliable cluster assignments.
arXiv Detail & Related papers (2024-04-14T01:51:11Z) - Deep Embedding Clustering Driven by Sample Stability [16.53706617383543]
We propose a deep embedding clustering algorithm driven by sample stability (DECS).
Specifically, we start by constructing the initial feature space with an autoencoder and then learn the cluster-oriented embedding feature constrained by sample stability.
The experimental results on five datasets illustrate that the proposed method achieves superior performance compared to state-of-the-art clustering approaches.
arXiv Detail & Related papers (2024-01-29T09:19:49Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified within a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed to enhance cohesion within the same cluster and separation between different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Cluster-guided Contrastive Graph Clustering Network [53.16233290797777]
We propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC).
We construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks.
To construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples.
arXiv Detail & Related papers (2023-01-03T13:42:38Z) - Robust Consensus Clustering and its Applications for Advertising Forecasting [18.242055675730253]
We propose a novel algorithm, robust consensus clustering, that can find the common ground truth among experts' opinions.
We apply the proposed method to the real-world advertising campaign segmentation and forecasting tasks.
arXiv Detail & Related papers (2022-12-27T21:49:04Z) - Self-Evolutionary Clustering [1.662966122370634]
Most existing deep clustering methods are based on simple distance comparison and highly dependent on the target distribution generated by a handcrafted nonlinear mapping.
A novel modular Self-Evolutionary Clustering (Self-EvoC) framework is constructed, which boosts the clustering performance by classification in a self-supervised manner.
The framework can efficiently discriminate sample outliers and generate a better target distribution with the assistance of self-supervision.
arXiv Detail & Related papers (2022-02-21T19:38:18Z) - Deep Clustering based Fair Outlier Detection [19.601280507914325]
We propose an instance-level weighted representation learning strategy to enhance the joint deep clustering and outlier detection.
Our DCFOD method consistently achieves superior performance in both outlier detection validity and two types of fairness notions in outlier detection.
arXiv Detail & Related papers (2021-06-09T15:12:26Z) - You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, the proposed model, TCC, is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Progressive Cluster Purification for Unsupervised Feature Learning [48.87365358296371]
In unsupervised feature learning, sample-specificity-based methods ignore inter-class information.
We propose a novel clustering-based method, which excludes class-inconsistent samples during progressive cluster formation.
Our approach, referred to as Progressive Cluster Purification (PCP), implements progressive clustering by gradually reducing the number of clusters during training (a loose illustrative sketch of this schedule follows the list).
arXiv Detail & Related papers (2020-07-06T08:11:03Z)
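The progressive schedule described in the PCP entry above can be pictured with a short loop. This is a loose, hypothetical sketch, not the PCP method: it assumes k-means as the clusterer, a fixed 16 to 8 to 4 cluster schedule, and a simple distance-percentile rule in place of PCP's actual class-consistency criterion.

```python
# Hypothetical illustration of a "progressive" clustering schedule: the cluster
# count shrinks over rounds, and samples that fit their cluster poorly are
# excluded before the next round.  The exclusion rule (distance percentile) and
# the schedule are assumptions, not the published PCP algorithm.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Four synthetic Gaussian blobs standing in for learned features.
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(150, 2))
               for c in [(0, 0), (5, 0), (0, 5), (5, 5)]])

active = np.arange(len(X))          # indices still treated as consistent samples
for k in (16, 8, 4):                # progressively fewer clusters
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X[active])
    dist = np.linalg.norm(X[active] - km.cluster_centers_[km.labels_], axis=1)
    keep = dist <= np.percentile(dist, 90)   # drop the worst-fitting 10%
    active = active[keep]
    print(f"k={k}: kept {len(active)} samples")
```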
This list is automatically generated from the titles and abstracts of the papers on this site.