Is it all a cluster game? -- Exploring Out-of-Distribution Detection
based on Clustering in the Embedding Space
- URL: http://arxiv.org/abs/2203.08549v1
- Date: Wed, 16 Mar 2022 11:22:23 GMT
- Authors: Poulami Sinhamahapatra, Rajat Koner, Karsten Roscher, Stephan
Günnemann
- Abstract summary: It is essential for safety-critical applications of deep neural networks to determine when new inputs are significantly different from the training distribution.
We study the structure and separation of clusters in the embedding space and find that supervised contrastive learning leads to well-separated clusters.
In our analysis of different training methods, clustering strategies, distance metrics, and thresholding approaches, we observe that there is no clear winner.
- Score: 7.856998585396422
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is essential for safety-critical applications of deep neural networks to
determine when new inputs are significantly different from the training
distribution. In this paper, we explore this out-of-distribution (OOD)
detection problem for image classification using clusters of semantically
similar embeddings of the training data and exploit the differences in distance
relationships to these clusters between in- and out-of-distribution data. We
study the structure and separation of clusters in the embedding space and find
that supervised contrastive learning leads to well-separated clusters while its
self-supervised counterpart fails to do so. In our extensive analysis of
different training methods, clustering strategies, distance metrics, and
thresholding approaches, we observe that there is no clear winner. The optimal
approach depends on the model architecture and selected datasets for in- and
out-of-distribution. While we could reproduce the outstanding results for
contrastive training on CIFAR-10 as in-distribution data, we find standard
cross-entropy paired with cosine similarity outperforms all contrastive
training methods when training on CIFAR-100 instead. Overall, cross-entropy
provides competitive results compared to the more expensive contrastive
training methods.
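The scoring scheme the abstract describes can be illustrated with a minimal sketch (hypothetical, NumPy only): training embeddings are grouped into per-class clusters, and a test input is scored by its cosine distance to the nearest cluster centroid, with high scores flagged as out-of-distribution. The data, function names, and threshold logic here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def class_centroids(embeddings: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """One centroid per class, computed from training embeddings."""
    return np.stack([embeddings[labels == c].mean(axis=0)
                     for c in np.unique(labels)])

def ood_score(x: np.ndarray, centroids: np.ndarray) -> float:
    """Cosine distance to the nearest class centroid (higher = more OOD)."""
    x = x / np.linalg.norm(x)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return float(1.0 - np.max(c @ x))

# Toy in-distribution data: two well-separated 2-D clusters.
rng = np.random.default_rng(0)
emb = np.concatenate([rng.normal([5, 0], 0.1, (50, 2)),
                      rng.normal([0, 5], 0.1, (50, 2))])
lab = np.array([0] * 50 + [1] * 50)
cents = class_centroids(emb, lab)

in_dist = ood_score(np.array([5.0, 0.1]), cents)     # near a training cluster
out_dist = ood_score(np.array([-4.0, -4.0]), cents)  # far from both clusters
assert in_dist < out_dist  # OOD input gets the larger score
```

In practice the embeddings would come from the penultimate layer of a trained network, and the decision threshold would be calibrated on held-out in-distribution data.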
Related papers
- Deep Clustering via Distribution Learning [7.437581715698929]
We provide a theoretical analysis to guide the optimization of clustering via distribution learning.
We introduce a clustering-oriented distribution learning method called Monte-Carlo Marginalization for Clustering.
The proposed Deep Clustering via Distribution Learning (DCDL) achieves promising results compared to state-of-the-art methods on popular datasets.
arXiv Detail & Related papers (2024-08-06T19:01:47Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Reinforcement Federated Learning Method Based on Adaptive OPTICS
Clustering [19.73560248813166]
This paper proposes an adaptive OPTICS clustering algorithm for federated learning.
By perceiving the clustering environment as a Markov decision process, the goal is to find the best parameters of the OPTICS cluster.
Experiments verify the reliability and practicality of this method and demonstrate its effectiveness and superiority.
arXiv Detail & Related papers (2023-06-22T13:11:19Z) - Deep Metric Learning Assisted by Intra-variance in A Semi-supervised
View of Learning [0.0]
Deep metric learning aims to construct an embedding space where samples of the same class are close to each other, while samples of different classes are far away from each other.
This paper designs a self-supervised generative assisted ranking framework that provides a semi-supervised view of the intra-class variance learning scheme for typical supervised deep metric learning.
arXiv Detail & Related papers (2023-04-21T13:30:32Z) - Leveraging Ensembles and Self-Supervised Learning for Fully-Unsupervised
Person Re-Identification and Text Authorship Attribution [77.85461690214551]
Learning from fully-unlabeled data is challenging in Multimedia Forensics problems, such as Person Re-Identification and Text Authorship Attribution.
Recent self-supervised learning methods have shown to be effective when dealing with fully-unlabeled data in cases where the underlying classes have significant semantic differences.
We propose a strategy to tackle Person Re-Identification and Text Authorship Attribution by enabling learning from unlabeled data even when samples from different classes are not prominently diverse.
arXiv Detail & Related papers (2022-02-07T13:08:11Z) - Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z) - Accuracy on the Line: On the Strong Correlation Between
Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts.
Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet.
We also investigate cases where the correlation is weaker, for instance on some synthetic distribution shifts from CIFAR-10-C and on the tissue classification dataset Camelyon17-WILDS.
arXiv Detail & Related papers (2021-07-09T19:48:23Z) - Semi-supervised Contrastive Learning with Similarity Co-calibration [72.38187308270135]
We propose a novel training strategy, termed Semi-supervised Contrastive Learning (SsCL).
SsCL combines the well-known contrastive loss in self-supervised learning with the cross-entropy loss in semi-supervised learning.
We show that SsCL produces more discriminative representations and is beneficial to few-shot learning.
arXiv Detail & Related papers (2021-05-16T09:13:56Z) - Deep Stable Learning for Out-Of-Distribution Generalization [27.437046504902938]
Approaches based on deep neural networks have achieved striking performance when testing data and training data share a similar distribution.
Eliminating the impact of distribution shifts between training and testing data is crucial for building performance-promising deep models.
We propose to address this problem by removing the dependencies between features via learning weights for training samples.
arXiv Detail & Related papers (2021-04-16T03:54:21Z) - LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation.
We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR-10/100, STL-10, and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z)
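Several of the papers above, like the main paper, build on supervised contrastive training. A minimal sketch of a supervised contrastive (SupCon-style) loss on toy data may help make the objective concrete; the temperature value and the toy embeddings are illustrative assumptions, not taken from any of the papers.

```python
import numpy as np

def supcon_loss(z: np.ndarray, labels: np.ndarray, temperature: float = 0.1) -> float:
    """Supervised contrastive loss: pull same-class embeddings together,
    push different-class embeddings apart (assumes >= 2 samples per class)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize
    sim = z @ z.T / temperature                        # pairwise similarities
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)            # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    # average log-probability over each anchor's positives
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / pos.sum(axis=1)
    return float(per_anchor.mean())

# Same embeddings, two labelings: class-consistent labels give a low loss,
# scrambled labels (positives across the gap) give a high loss.
z = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]])
loss_tight = supcon_loss(z, np.array([0, 0, 1, 1]))
loss_mixed = supcon_loss(z, np.array([0, 1, 0, 1]))
assert loss_tight < loss_mixed
```

Minimizing this objective is what produces the well-separated clusters that the main paper's distance-based OOD scores then exploit.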
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.