Geometrical Homogeneous Clustering for Image Data Reduction
- URL: http://arxiv.org/abs/2208.13079v1
- Date: Sat, 27 Aug 2022 19:42:46 GMT
- Title: Geometrical Homogeneous Clustering for Image Data Reduction
- Authors: Shril Mody, Janvi Thakkar, Devvrat Joshi, Siddharth Soni, Rohan Patil,
Nipun Batra
- Abstract summary: We present novel variations of an earlier approach, the homogeneous clustering algorithm, for reducing dataset size.
We experimented with the four variants on three datasets: MNIST, CIFAR10, and Fashion-MNIST.
We found that GHCIDR gave the best accuracy of 99.35%, 81.10%, and 91.66%, with a training data reduction of 87.27%, 32.34%, and 76.80%, respectively.
- Score: 2.290085549352983
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present novel variations of an earlier approach, the
homogeneous clustering algorithm, for reducing dataset size. The intuition
behind the proposed approaches is to partition the dataset into homogeneous
clusters and select the images that contribute most significantly to the
accuracy. The selected images are a proper subset of the training data and are
therefore human-readable. We propose four variations upon the baseline
algorithm, RHC. The intuition behind the first approach, RHCKON, is that
boundary points contribute significantly to the representation of clusters; it
selects the k farthest points and the one nearest neighbour of each cluster's
centroid (sketched below). The next two approaches (KONCW and CWKC) introduce
the concept of cluster weights, based on the observation that larger clusters
contribute more than smaller ones. The final variation, GHCIDR, selects points
based on the geometrical aspects of the data distribution. We performed the
experiments on two deep learning models, Fully Connected Networks (FCN) and
VGG1, with the four variants on three datasets: MNIST, CIFAR10, and
Fashion-MNIST. We found that GHCIDR gave the best accuracy of 99.35%, 81.10%,
and 91.66%, with a training data reduction of 87.27%, 32.34%, and 76.80%, on
MNIST, CIFAR10, and Fashion-MNIST respectively.
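As a rough illustration of the RHCKON rule just described, the sketch below keeps the k farthest points and the single nearest point to each cluster centroid. The function name, Euclidean distance, and NumPy usage are assumptions made for illustration, not taken from the paper's implementation.

```python
import numpy as np

def rhckon_select(cluster: np.ndarray, k: int) -> np.ndarray:
    """Keep the k farthest (boundary) points and the 1 nearest
    (interior) point relative to the cluster centroid."""
    centroid = cluster.mean(axis=0)                     # cluster centre
    dists = np.linalg.norm(cluster - centroid, axis=1)  # distance per image vector
    order = np.argsort(dists)                           # ascending by distance
    keep = np.concatenate([order[:1], order[-k:]])      # 1 nearest + k farthest
    return cluster[keep]
```

Applied to every homogeneous cluster, the union of the selected points forms the reduced training set, which remains a human-readable subset of the original images.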
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in a self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed that enhances cohesion within clusters and separation between different clusters (a generic sketch follows this entry).
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
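The summary above names only the goal of the reward (tighter clusters, better separation), so the following is a generic cohesion-minus-separation reward over assumed L2-normalised embeddings; it is a plausible stand-in, not the paper's exact function.

```python
import numpy as np

def clustering_reward(Z: np.ndarray, labels: np.ndarray) -> float:
    """Generic clustering reward: mean intra-cluster similarity
    minus mean inter-cluster similarity (cosine, via normalisation)."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)  # unit-norm rows
    sim = Z @ Z.T                                     # pairwise cosine similarity
    same = labels[:, None] == labels[None, :]         # same-cluster mask
    np.fill_diagonal(same, False)                     # drop self-pairs
    diff = ~same
    np.fill_diagonal(diff, False)
    return float(sim[same].mean() - sim[diff].mean())
```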
- Dink-Net: Neural Clustering on Large Graphs [59.10189693120368]
A deep graph clustering method (Dink-Net) is proposed with the idea of dilation and shrink.
Representations are learned in a self-supervised manner by discriminating whether nodes have been corrupted by augmentations.
The clustering distribution is optimized by minimizing the proposed cluster dilation loss and cluster shrink loss.
Compared to the runner-up, Dink-Net achieves a 9.62% NMI improvement on the ogbn-papers100M dataset, which has 111 million nodes and 1.6 billion edges.
arXiv Detail & Related papers (2023-05-28T15:33:24Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter (one reading of this step is sketched after this entry).
To exploit the complementary information embedded in different views, we leverage tensor Schatten p-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
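The snippet above does not spell out how the Butterworth filter enters the distance construction; one plausible reading, sketched here purely as an assumption, is to pass pairwise Euclidean distances through the Butterworth magnitude response so nearby points receive similarity close to 1 and distant points are smoothly suppressed.

```python
import numpy as np

def butterworth_affinity(X: np.ndarray, cutoff: float, order: int = 2) -> np.ndarray:
    """Map pairwise distances through the Butterworth magnitude
    response 1 / sqrt(1 + (d / cutoff)**(2 * order))."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    return 1.0 / np.sqrt(1.0 + (d / cutoff) ** (2 * order))     # smooth similarity
```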
- Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach a collapsed solution in which the encoder maps all inputs to the same point and all samples are put into a single cluster.
We propose a method that does not require data augmentation and that, unlike existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
- Distributed Solution of the Inverse Rig Problem in Blendshape Facial Animation [0.0]
Rig inversion is central in facial animation as it allows for a realistic and appealing performance of avatars.
A possible approach towards a faster solution is clustering, which exploits the spatial nature of the face.
In this paper, we go a step further, coupling the clusters to obtain more confident estimates of the overlapping components.
arXiv Detail & Related papers (2023-03-11T10:34:07Z)
- An enhanced method of initial cluster center selection for K-means algorithm [0.0]
We propose a novel approach to improve initial cluster center selection for the K-means algorithm.
The convex hull algorithm facilitates computing the first two centroids; the remaining ones are selected according to their distance from the previously selected centers (a sketch follows this entry).
We obtained clustering errors of only 7.33%, 7.90%, and 0% on the Iris, Letter, and Ruspini datasets, respectively.
arXiv Detail & Related papers (2022-10-18T00:58:50Z)
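A minimal sketch of the seeding scheme above, assuming low-dimensional data, SciPy's ConvexHull, and a greedy farthest-point rule for the remaining centers; the exact selection criteria are assumptions, not the paper's verified procedure.

```python
import numpy as np
from scipy.spatial import ConvexHull

def initial_centers(X: np.ndarray, k: int) -> np.ndarray:
    """Seed K-means: first two centroids from the convex hull,
    the rest picked farthest from all previously chosen centers."""
    hull = X[ConvexHull(X).vertices]                 # boundary points of the data
    pair = np.linalg.norm(hull[:, None] - hull[None, :], axis=-1)
    i, j = np.unravel_index(pair.argmax(), pair.shape)
    centers = [hull[i], hull[j]]                     # most distant hull pair
    while len(centers) < k:
        d = np.linalg.norm(X[:, None] - np.asarray(centers)[None, :], axis=-1)
        centers.append(X[d.min(axis=1).argmax()])    # farthest from the chosen set
    return np.asarray(centers)
```

The resulting centers then replace random initialisation in an otherwise standard K-means run.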
- Merged-GHCIDR: Geometrical Approach to Reduce Image Data [2.290085549352983]
Training neural networks on massive datasets has become a challenging and time-consuming task.
We present novel variations of an earlier approach, reduction through homogeneous clustering, for reducing dataset size.
We propose two variations upon the baseline algorithm: Geometrical Homogeneous Clustering for Image Data Reduction (GHCIDR) and Merged-GHCIDR.
arXiv Detail & Related papers (2022-09-06T16:03:15Z)
- Learning Hierarchical Graph Neural Networks for Image Clustering [81.5841862489509]
We propose a hierarchical graph neural network (GNN) model that learns how to cluster a set of images into an unknown number of identities.
Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.
arXiv Detail & Related papers (2021-07-03T01:28:42Z)
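The merge step in the entry above can be illustrated independently of the GNN itself: predicted links at one level are contracted into connected components, which become the nodes of the next level's graph. This union-find sketch is a generic rendering of that step, not the paper's code.

```python
def merge_components(num_nodes: int, edges: list[tuple[int, int]]) -> list[int]:
    """Contract predicted edges into connected components; return the
    super-node id of each node for the next level of the hierarchy."""
    parent = list(range(num_nodes))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:                     # union endpoints of each predicted link
        parent[find(u)] = find(v)
    roots = sorted({find(i) for i in range(num_nodes)})
    relabel = {r: idx for idx, r in enumerate(roots)}
    return [relabel[find(i)] for i in range(num_nodes)]
```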
- Spectral Clustering with Smooth Tiny Clusters [14.483043753721256]
We propose a novel clustering algorithm, which considers the smoothness of data for the first time.
Our key idea is to cluster tiny clusters, whose centers constitute smooth graphs.
Although this paper focuses solely on multi-scale situations, the idea of data smoothness can be extended to any clustering algorithm.
arXiv Detail & Related papers (2020-09-10T05:21:20Z)
- Improving k-Means Clustering Performance with Disentangled Internal Representations [0.0]
We propose a simpler approach of optimizing the entanglement of the learned latent code representation of an autoencoder.
Using our proposed approach, the test clustering accuracy was 96.2% on the MNIST dataset, 85.6% on the Fashion-MNIST dataset, and 79.2% on the EMNIST Balanced dataset, outperforming our baseline models.
arXiv Detail & Related papers (2020-06-05T11:32:34Z)
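The common thread of this last entry is clustering in an autoencoder's latent space rather than in pixel space. The sketch below shows that pipeline with scikit-learn's KMeans; the encoder is passed in as a black box, and the disentangling regulariser used to train it is deliberately omitted, so this is an assumed outline rather than the paper's method.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_in_latent_space(encode, X: np.ndarray, n_clusters: int) -> np.ndarray:
    """Cluster latent codes instead of raw inputs: `encode` is any
    trained encoder mapping samples to latent vectors."""
    Z = encode(X)                        # latent codes, shape (n, latent_dim)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z)
```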