A Deep Learning Object Detection Method for an Efficient Clusters Initialization
- URL: http://arxiv.org/abs/2104.13634v2
- Date: Thu, 29 Apr 2021 07:03:27 GMT
- Title: A Deep Learning Object Detection Method for an Efficient Clusters Initialization
- Authors: Hassan N. Noura, Ola Salman, Raphaël Couturier, Abderrahmane Sider
- Abstract summary: Clustering has been used in numerous applications such as banking customer profiling, document retrieval, image segmentation, and e-commerce recommendation engines.
Existing clustering techniques have significant limitations, chief among them the dependence of their stability on the initialization parameters.
This paper proposes a solution that can provide near-optimal clustering parameters with low computational and resource overhead.
- Score: 6.365889364810239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clustering is an unsupervised machine learning method that groups
data samples into clusters of similar objects. In practice, clustering has been
used in numerous applications such as banking customer profiling, document
retrieval, image segmentation, and e-commerce recommendation engines. However,
existing clustering techniques have significant limitations, among them the
dependence of their stability on the initialization parameters (e.g. the
number of clusters and the centroids). Different solutions have been presented
in the literature to overcome this limitation (e.g. internal and external
validation metrics). However, these solutions incur high computational
complexity and memory consumption, especially when dealing with
high-dimensional data. In this paper, we apply a recent object detection Deep
Learning (DL) model, named YOLO-v5, to detect initial clustering parameters
such as the number of clusters, their sizes, and candidate centroids. In
essence, the proposed solution consists of adding a DL-based initialization
phase that frees clustering algorithms from manual initialization. The results
show that the proposed solution can provide near-optimal cluster
initialization parameters with low computational and resource overhead
compared to existing solutions.
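The paper ships no code; the following is a minimal sketch of the described
pipeline, assuming a YOLO-v5 checkpoint fine-tuned to draw boxes around cluster
regions in scatter-plot renderings. The checkpoint name cluster_yolov5.pt, the
image size, and the coordinate mapping are illustrative assumptions, not
artifacts of the paper.

    # Hypothetical sketch: render 2-D data as an image, detect cluster regions
    # with a fine-tuned YOLO-v5 model, then seed k-means with the detected
    # count and the box centers mapped back to data coordinates.
    import matplotlib
    matplotlib.use("Agg")
    import numpy as np
    import torch
    from matplotlib import pyplot as plt
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    def detect_init_params(X, weights="cluster_yolov5.pt", px=640):
        # Render the data to a px-by-px scatter image filling the whole figure.
        fig, ax = plt.subplots(figsize=(px / 100, px / 100), dpi=100)
        fig.subplots_adjust(left=0, right=1, bottom=0, top=1)
        ax.scatter(X[:, 0], X[:, 1], s=2, c="black")
        ax.set_axis_off()
        fig.canvas.draw()
        (x0, x1), (y0, y1) = ax.get_xlim(), ax.get_ylim()
        img = np.asarray(fig.canvas.buffer_rgba())[..., :3]
        plt.close(fig)

        # cluster_yolov5.pt is an assumed fine-tuned checkpoint, not provided.
        model = torch.hub.load("ultralytics/yolov5", "custom", path=weights)
        boxes = model(img).xyxy[0].cpu().numpy()  # x1, y1, x2, y2, conf, cls

        # Map box centers from pixel space to data space (pixel y points down).
        cx = (boxes[:, 0] + boxes[:, 2]) / 2 / img.shape[1]
        cy = (boxes[:, 1] + boxes[:, 3]) / 2 / img.shape[0]
        centroids = np.column_stack([x0 + cx * (x1 - x0), y1 - cy * (y1 - y0)])
        return len(boxes), centroids

    # One detector pass replaces the usual trial-and-error over k and seeds.
    X, _ = make_blobs(n_samples=1000, centers=4, random_state=0)
    k, centroids = detect_init_params(X)
    labels = KMeans(n_clusters=k, init=centroids, n_init=1).fit_predict(X)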
Related papers
- Deep Embedding Clustering Driven by Sample Stability [16.53706617383543]
We propose a deep embedding clustering algorithm driven by sample stability (DECS).
Specifically, we start by constructing the initial feature space with an autoencoder and then learn the cluster-oriented embedding feature constrained by sample stability.
The experimental results on five datasets illustrate that the proposed method achieves superior performance compared to state-of-the-art clustering approaches; a minimal sketch of the two-stage idea appears after this entry.
arXiv Detail & Related papers (2024-01-29T09:19:49Z)
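For context, a minimal sketch of the two-stage idea the DECS entry above builds
on: pretrain an autoencoder, then cluster in the learned embedding. The
sample-stability constraint that distinguishes DECS is omitted; this is a
generic baseline, not the paper's method.

    # Stage 1: learn an autoencoder embedding; stage 2: run k-means on it.
    import torch
    import torch.nn as nn
    from sklearn.cluster import KMeans

    class AE(nn.Module):
        def __init__(self, d_in, d_z=10):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(),
                                     nn.Linear(128, d_z))
            self.dec = nn.Sequential(nn.Linear(d_z, 128), nn.ReLU(),
                                     nn.Linear(128, d_in))

        def forward(self, x):
            z = self.enc(x)
            return z, self.dec(z)

    def embed_and_cluster(X, k, epochs=50, lr=1e-3):
        X = torch.as_tensor(X, dtype=torch.float32)
        ae = AE(X.shape[1])
        opt = torch.optim.Adam(ae.parameters(), lr=lr)
        for _ in range(epochs):            # stage 1: reconstruction pretraining
            _, x_hat = ae(X)
            loss = nn.functional.mse_loss(x_hat, X)
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():              # stage 2: cluster in the embedding
            z, _ = ae(X)
        return KMeans(n_clusters=k, n_init=10).fit_predict(z.numpy())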
- Robust and Automatic Data Clustering: Dirichlet Process meets Median-of-Means [18.3248037914529]
We present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies.
Statistical guarantees on the upper bound of clustering error suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms; the median-of-means estimate at the method's core is sketched after this entry.
arXiv Detail & Related papers (2023-11-26T19:01:15Z)
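The Dirichlet-process side of the entry above is beyond a snippet, but the
median-of-means centroid estimate at its core is compact. A sketch, with the
bucket count as an illustrative choice:

    # Median-of-means centroid: split a cluster's points into disjoint buckets,
    # average each bucket, then take the coordinate-wise median of the bucket
    # means. A few outliers corrupt some buckets but not the median.
    import numpy as np

    def median_of_means(points, n_buckets=5, seed=0):
        idx = np.random.default_rng(seed).permutation(len(points))
        bucket_means = [points[b].mean(axis=0)
                        for b in np.array_split(idx, n_buckets)]
        return np.median(np.stack(bucket_means), axis=0)

Swapping this estimator for the plain mean in a Lloyd-style centroid update is
what buys the robustness to outliers.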
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation and that, unlike existing methods, regularizes the hard assignments; a generic anti-collapse penalty is sketched after this entry.
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
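The collapse described above can be discouraged with a balance penalty on
cluster usage. The sketch below uses a soft relaxation so the penalty stays
differentiable; that relaxation is our assumption, not the paper's
hard-assignment regularizer.

    # Anti-collapse sketch: penalize unbalanced cluster usage within a batch.
    import torch

    def balance_penalty(logits):
        # logits: (batch, k) affinity of each embedding to each cluster.
        p = torch.softmax(logits, dim=1)   # soft cluster assignments
        usage = p.mean(dim=0)              # average usage of each cluster
        # Usage entropy is maximal when clusters are used equally; collapse
        # drives it to zero, so the negative entropy is added to the loss.
        return (usage * torch.log(usage.clamp_min(1e-9))).sum()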
- An Instance Selection Algorithm for Big Data in High imbalanced datasets based on LSH [0.0]
Training Machine Learning models in real contexts often involves big data sets and imbalanced samples in which the class of interest is underrepresented.
This work proposes three new methods for instance selection (IS) able to deal with large and imbalanced data sets.
The algorithms were developed in the Apache Spark framework, guaranteeing their scalability; a single-machine sketch of the LSH bucketing idea appears after this entry.
arXiv Detail & Related papers (2022-10-09T17:38:41Z)
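A single-machine sketch of the bucketing idea behind the entry above; the
random-hyperplane hash and the one-representative-per-bucket rule are
illustrative choices, and the paper's three Spark-based variants are not
reproduced.

    # Random-hyperplane LSH groups similar majority-class points into buckets;
    # keeping one representative per bucket shrinks the majority class while
    # roughly preserving its geometry.
    import numpy as np

    def lsh_select(X_major, n_planes=12, seed=0):
        rng = np.random.default_rng(seed)
        planes = rng.normal(size=(X_major.shape[1], n_planes))
        codes = X_major @ planes > 0           # sign pattern = bucket key
        keep = {}
        for i, row in enumerate(codes):
            keep.setdefault(row.tobytes(), i)  # first point seen per bucket
        return X_major[sorted(keep.values())]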
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication-efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size; the aggregation step is sketched after this entry.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
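A simplified sketch of the aggregation step in the entry above: each user sends
one locally computed estimate, and the server clusters the estimates and
averages within each cluster. Using k-means as the server-side clustering is
our choice here, not necessarily the paper's.

    # One-shot clustered aggregation: users solving the same task end up in the
    # same cluster and are averaged together; unrelated tasks stay apart.
    import numpy as np
    from sklearn.cluster import KMeans

    def server_aggregate(local_models, n_groups):
        # local_models: (n_users, dim) array, one local estimate per user.
        labels = KMeans(n_clusters=n_groups, n_init=10).fit_predict(local_models)
        return {g: local_models[labels == g].mean(axis=0)
                for g in range(n_groups)}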
- Meta Clustering Learning for Large-scale Unsupervised Person Re-identification [124.54749810371986]
We propose a "small data for big task" paradigm dubbed Meta Clustering Learning (MCL)
MCL only pseudo-labels a subset of the entire unlabeled data via clustering to save computing for the first-phase training.
Our method significantly saves computational cost while achieving a comparable or even better performance compared to prior works.
arXiv Detail & Related papers (2021-11-19T04:10:18Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)
- Probabilistic Partitive Partitioning (PPP) [0.0]
Clustering algorithms, in general, face two common problems: they converge to different solutions under different initial conditions, and the number of clusters has to be decided arbitrarily beforehand; the first problem is demonstrated after this entry.
arXiv Detail & Related papers (2020-03-09T19:18:35Z)
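The first problem is easy to reproduce; the short demonstration below runs
single-start k-means under different seeds and prints the spread of final
objective values it settles into.

    # Initialization sensitivity: single-start k-means with different random
    # seeds can converge to different local optima on the same data.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=500, centers=5, cluster_std=2.0, random_state=7)
    inertias = [KMeans(n_clusters=5, init="random", n_init=1, random_state=s)
                .fit(X).inertia_ for s in range(10)]
    print(f"inertia across seeds: {min(inertias):.1f} .. {max(inertias):.1f}")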
- A Hybrid Algorithm Based Robust Big Data Clustering for Solving Unhealthy Initialization, Dynamic Centroid Selection and Empty clustering Problems with Analysis [0.0]
Clustering algorithms have developed into a powerful learning tool able to analyze the volumes of data produced by modern applications.
Our proposed algorithm, EG K-MEANS (Extended Generation K-MEANS), mainly solves three issues of K-MEANS; a generic repair for one of them, empty clusters, is sketched after this entry.
arXiv Detail & Related papers (2020-02-21T16:09:19Z)
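EG K-MEANS itself is not publicly available; as an illustration of the
empty-clustering problem it targets, the sketch below repairs an empty cluster
during a Lloyd update by reseeding the dead centroid at the worst-served point.
This is a common heuristic, not necessarily the paper's procedure.

    # Empty-cluster repair inside one Lloyd iteration: a centroid that attracts
    # no points is moved to the point farthest from its assigned centroid.
    import numpy as np

    def update_centroids(X, centroids):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(len(centroids)):
            members = X[labels == k]
            if len(members) == 0:          # empty cluster: reseed it
                worst = d[np.arange(len(X)), labels].argmax()
                centroids[k] = X[worst]
            else:
                centroids[k] = members.mean(axis=0)
        return centroids, labels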