Scaling Up Deep Clustering Methods Beyond ImageNet-1K
- URL: http://arxiv.org/abs/2406.01203v1
- Date: Mon, 3 Jun 2024 11:13:27 GMT
- Title: Scaling Up Deep Clustering Methods Beyond ImageNet-1K
- Authors: Nikolas Adaloglou, Felix Michels, Kaspar Senft, Diana Petrusheva, Markus Kollmann,
- Abstract summary: In this work, we explore the performance of feature-based deep clustering approaches on large-scale benchmarks.
Our experimental analysis reveals that feature-based $k$-means is often unfairly evaluated on balanced datasets.
Deep clustering methods outperform $k$-means across most large-scale benchmarks.
- Score: 0.9437165725355702
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep image clustering methods are typically evaluated on small-scale balanced classification datasets while feature-based $k$-means has been applied on proprietary billion-scale datasets. In this work, we explore the performance of feature-based deep clustering approaches on large-scale benchmarks whilst disentangling the impact of the following data-related factors: i) class imbalance, ii) class granularity, iii) easy-to-recognize classes, and iv) the ability to capture multiple classes. Consequently, we develop multiple new benchmarks based on ImageNet21K. Our experimental analysis reveals that feature-based $k$-means is often unfairly evaluated on balanced datasets. However, deep clustering methods outperform $k$-means across most large-scale benchmarks. Interestingly, $k$-means underperforms on easy-to-classify benchmarks by large margins. The performance gap, however, diminishes on the highest data regimes such as ImageNet21K. Finally, we find that non-primary cluster predictions capture meaningful classes (i.e. coarser classes).
Related papers
- How to Achieve the Intended Aim of Deep Clustering Now, without Deep Learning [9.022973688786545]
Deep Embedded Clustering learns a latent representation via an autoencoder and performs clustering based on a $k$-means-like procedure.<n>This paper investigates whether the deep-learned representation has enabled DEC to overcome the known fundamental limitations of $k$-means clustering.
arXiv Detail & Related papers (2026-02-05T15:16:04Z) - Improving Long-Tailed Object Detection with Balanced Group Softmax and Metric Learning [0.0]
We tackle the problem of long-tailed 2D object detection using the LVISv1 dataset.<n>We employ a two-stage Faster R-CNN architecture and propose enhancements to the Balanced Group Softmax framework.<n>Our approach achieves a new state-of-the-art performance with a mean Average Precision (mAP) of 24.5%, surpassing the previous benchmark of 24.0%.
arXiv Detail & Related papers (2025-09-02T00:38:13Z) - Generalization Performance of Ensemble Clustering: From Theory to Algorithm [57.176040163699554]
This paper focuses on generalization error, excess risk and consistency in ensemble clustering.<n>By assigning varying weights to finite clusterings, we minimize the error between the empirical average clusterings and their expectation.<n>We instantiate our theory to develop a new ensemble clustering algorithm.
arXiv Detail & Related papers (2025-06-01T09:34:52Z) - Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - CLC: Cluster Assignment via Contrastive Representation Learning [9.631532215759256]
We propose Contrastive Learning-based Clustering (CLC), which uses contrastive learning to directly learn cluster assignment.
We achieve 53.4% accuracy on the full ImageNet dataset and outperform existing methods by large margins.
arXiv Detail & Related papers (2023-06-08T07:15:13Z) - Scalable Clustering: Large Scale Unsupervised Learning of Gaussian
Mixture Models with Outliers [5.478764356647437]
This paper introduces a provably robust clustering algorithm based on loss minimization.
It provides theoretical guarantees that the algorithm obtains high accuracy with high probability.
Experiments on real-world large-scale datasets demonstrate the effectiveness of the algorithm.
arXiv Detail & Related papers (2023-02-28T14:39:18Z) - Asymptotics for The $k$-means [0.6091702876917281]
The $k$-means is one of the most important unsupervised learning techniques in statistics and computer science.
The proposed clustering consistency is more appropriate than the previous criterion consistency for the clustering methods.
It is found that the proposed $k$-means method has lower clustering error rates and is more robust to small clusters and outliers.
arXiv Detail & Related papers (2022-11-18T03:36:58Z) - CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps.
We evaluate the effectiveness of our framework on diverse problems showing that CvS is able to achieve much higher classification results compared to previous methods when given only a handful of examples.
arXiv Detail & Related papers (2021-10-29T18:41:15Z) - Transductive Few-Shot Learning: Clustering is All You Need? [31.21306826132773]
We investigate a general formulation for transive few-shot learning, which integrates prototype-based objectives.
We find that our method yields competitive performances, in term of accuracy and optimization, while scaling up to large problems.
Surprisingly, we find that our general model already achieve competitive performances in comparison to the state-of-the-art learning.
arXiv Detail & Related papers (2021-06-16T16:14:01Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Too Much Information Kills Information: A Clustering Perspective [6.375668163098171]
We propose a simple, but novel approach for variance-based k-clustering tasks, including in which is the widely known k-means clustering.
The proposed approach picks a sampling subset from the given dataset and makes decisions based on the data information in the subset only.
With certain assumptions, the resulting clustering is provably good to estimate the optimum of the variance-based objective with high probability.
arXiv Detail & Related papers (2020-09-16T01:54:26Z) - Structured Graph Learning for Clustering and Semi-supervised
Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain condition.
arXiv Detail & Related papers (2020-08-31T08:41:20Z) - The Devil is in Classification: A Simple Framework for Long-tail Object
Detection and Instance Segmentation [93.17367076148348]
We investigate performance drop of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the recent long-tail LVIS dataset.
We unveil that a major cause is the inaccurate classification of object proposals.
We propose a simple calibration framework to more effectively alleviate classification head bias with a bi-level class balanced sampling approach.
arXiv Detail & Related papers (2020-07-23T12:49:07Z) - SCAN: Learning to Classify Images without Labels [73.69513783788622]
We advocate a two-step approach where feature learning and clustering are decoupled.
A self-supervised task from representation learning is employed to obtain semantically meaningful features.
We obtain promising results on ImageNet, and outperform several semi-supervised learning methods in the low-data regime.
arXiv Detail & Related papers (2020-05-25T18:12:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.