Comparison of Clustering Algorithms for Statistical Features of
  Vibration Data Sets
- URL: http://arxiv.org/abs/2305.06753v1
- Date: Thu, 11 May 2023 12:19:30 GMT
- Title: Comparison of Clustering Algorithms for Statistical Features of
  Vibration Data Sets
- Authors: Philipp Sepin, Jana Kemnitz, Safoura Rezapour Lakani and Daniel Schall
- Abstract summary: We present an extensive comparison of the clustering algorithms K-means clustering, OPTICS, and Gaussian mixture model clustering (GMM) applied to statistical features extracted from the time and frequency domains of vibration data sets.
Our work showed that averaging (Mean, Median) and variance-based features (Standard Deviation, Interquartile Range) performed significantly better than shape-based features (Skewness, Kurtosis).
With an increase in the specified number of clusters, clustering algorithms performed better, although there were some specific algorithmic restrictions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Vibration-based condition monitoring systems are receiving increasing
attention due to their ability to accurately identify different conditions by
capturing dynamic features over a broad frequency range. However, there is
little research on clustering approaches in vibration data and the resulting
solutions are often optimized for a single data set. In this work, we present
an extensive comparison of the clustering algorithms K-means clustering,
OPTICS, and Gaussian mixture model clustering (GMM) applied to statistical
features extracted from the time and frequency domains of vibration data sets.
Furthermore, we investigate the influence of feature combinations, feature
selection using principal component analysis (PCA), and the specified number of
clusters on the performance of the clustering algorithms. We conducted this
comparison in terms of a grid search using three different benchmark data sets.
Our work showed that averaging (Mean, Median) and variance-based features
(Standard Deviation, Interquartile Range) performed significantly better than
shape-based features (Skewness, Kurtosis). In addition, K-means outperformed
GMM slightly for these data sets, whereas OPTICS performed significantly worse.
We were also able to show that feature combinations as well as PCA feature
selection did not result in any significant performance improvements. With an
increase in the specified number of clusters, clustering algorithms performed
better, although there were some specific algorithmic restrictions.
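The feature-and-algorithm comparison described in the abstract can be sketched with SciPy and scikit-learn. This is a minimal illustration, not the authors' pipeline: the window length, the number of clusters, the OPTICS `min_samples` value, and the synthetic data are all assumptions.

```python
import numpy as np
from scipy.stats import skew, kurtosis, iqr
from sklearn.cluster import KMeans, OPTICS
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in for segmented vibration signals: 200 windows of 1024 samples each.
windows = rng.normal(size=(200, 1024))

def statistical_features(w):
    """Averaging, variance-based, and shape-based features per window."""
    return np.column_stack([
        w.mean(axis=1),        # Mean
        np.median(w, axis=1),  # Median
        w.std(axis=1),         # Standard Deviation
        iqr(w, axis=1),        # Interquartile Range
        skew(w, axis=1),       # Skewness
        kurtosis(w, axis=1),   # Kurtosis
    ])

X = statistical_features(windows)
X_pca = PCA(n_components=2).fit_transform(X)  # optional PCA feature selection

k = 3  # the "specified number of clusters" is a grid parameter in the paper
labels_km = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
labels_gmm = GaussianMixture(n_components=k, random_state=0).fit_predict(X)
# OPTICS is density-based: it infers the cluster count itself and may mark
# points as noise (label -1), one of the algorithmic restrictions noted above.
labels_opt = OPTICS(min_samples=10).fit_predict(X)
```

In practice the feature matrix would be built from real, labeled vibration windows so that the resulting clusterings can be scored against ground truth.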
 
      
        Related papers
        - K*-Means: A Parameter-free Clustering Algorithm [55.20132267309382]
 k*-means is a novel clustering algorithm that eliminates the need to set k or any other parameters. It uses the minimum description length principle to automatically determine the optimal number of clusters, k*, by splitting and merging clusters. We prove that k*-means is guaranteed to converge and demonstrate experimentally that it significantly outperforms existing methods in scenarios where k is unknown.
 arXiv  Detail & Related papers  (2025-05-17T08:41:07Z)
- Estimating the Optimal Number of Clusters in Categorical Data Clustering   by Silhouette Coefficient [0.5939858158928473]
 This paper proposes an algorithm named k-SCC to estimate the optimal k in categorical data clustering.
 Comparative experiments were conducted on both synthetic and real datasets to compare the performance of k-SCC.
 arXiv  Detail & Related papers  (2025-01-26T14:29:11Z)
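The idea behind silhouette-based estimation of k can be illustrated generically with scikit-learn; this is not the k-SCC algorithm itself (which targets categorical data), just the underlying selection criterion on numeric toy data with assumed blob centers.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Four well-separated synthetic clusters (illustrative assumption).
centers = [(-5, -5), (-5, 5), (5, -5), (5, 5)]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=1.0,
                  random_state=0)

# Score each candidate k by the mean silhouette coefficient; pick the best.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
```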
- The ParClusterers Benchmark Suite (PCBS): A Fine-Grained Analysis of   Scalable Graph Clustering [15.047567897051376]
 ParClusterers Benchmark Suite (PCBS) is a collection of highly scalable parallel graph clustering algorithms and benchmarking tools.
PCBS provides a standardized way to evaluate and judge the quality-performance tradeoffs of the active research area of scalable graph clustering algorithms.
 arXiv  Detail & Related papers  (2024-11-15T15:47:32Z)
- A3S: A General Active Clustering Method with Pairwise Constraints [66.74627463101837]
 A3S features strategic active clustering adjustment on the initial cluster result, which is obtained by an adaptive clustering algorithm.
In extensive experiments across diverse real-world datasets, A3S achieves desired results with significantly fewer human queries.
 arXiv  Detail & Related papers  (2024-07-14T13:37:03Z)
- Superclustering by finding statistically significant separable groups of
  optimal gaussian clusters [0.0]
 The paper presents an algorithm for clustering a dataset by grouping Gaussian clusters that are optimal from the point of view of the BIC criterion into statistically significant separable superclusters.
An essential advantage of the algorithm is its ability to predict the correct supercluster for new data based on an already trained clusterer.
 arXiv  Detail & Related papers  (2023-09-05T23:49:46Z)
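BIC-driven selection of the number of Gaussian clusters, the criterion this entry builds on, can be sketched with scikit-learn's GMM; the two-blob data and the candidate range are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated Gaussian groups in 2-D.
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])

# Fit a GMM per candidate component count and record its BIC (lower is better:
# BIC trades off log-likelihood against a parameter-count penalty).
bics = {n: GaussianMixture(n_components=n, random_state=0).fit(X).bic(X)
        for n in range(1, 6)}
best_n = min(bics, key=bics.get)
```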
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
 We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
 arXiv  Detail & Related papers  (2023-06-18T08:46:06Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
 We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct distance matrix between data points by Butterworth filter.
To well exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
 arXiv  Detail & Related papers  (2023-05-12T03:01:41Z)
- Performance evaluation results of evolutionary clustering algorithm star
  for clustering heterogeneous datasets [15.154538450706474]
 This article presents the data used to evaluate the performance of evolutionary clustering algorithm star (ECA*).
Two experimental methods are employed to examine the performance of ECA* against five traditional and modern clustering algorithms.
 arXiv  Detail & Related papers  (2021-04-30T08:17:19Z)
- A Multi-disciplinary Ensemble Algorithm for Clustering Heterogeneous
  Datasets [0.76146285961466]
 We propose a new evolutionary clustering algorithm (ECAStar) based on social class ranking and meta-heuristic algorithms.
ECAStar is integrated with recombinational evolutionary operators, Levy flight optimisation, and some statistical techniques.
Experiments are conducted to evaluate the ECAStar against five conventional approaches.
 arXiv  Detail & Related papers  (2021-01-01T07:20:50Z)
- Contrastive Clustering [57.71729650297379]
 We propose Contrastive Clustering (CC) which explicitly performs the instance- and cluster-level contrastive learning.
In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, which is an up to 19% (39%) performance improvement compared with the best baseline.
 arXiv  Detail & Related papers  (2020-09-21T08:54:40Z)
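NMI, the metric reported for Contrastive Clustering above, can be computed with scikit-learn; the toy labels here are illustrative, not CIFAR results.

```python
from sklearn.metrics import normalized_mutual_info_score

true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [1, 1, 0, 0, 2, 2]  # a relabeled but otherwise perfect clustering

# NMI is invariant to label permutation, so this scores 1.0.
nmi = normalized_mutual_info_score(true_labels, pred_labels)
```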
- Decorrelated Clustering with Data Selection Bias [55.91842043124102]
 We propose a novel Decorrelation regularized K-Means algorithm (DCKM) for clustering with data selection bias.
Our DCKM algorithm achieves significant performance gains, indicating the necessity of removing unexpected feature correlations induced by selection bias.
 arXiv  Detail & Related papers  (2020-06-29T08:55:50Z)
- CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on
  Multi-scale Data [34.89460002735166]
 We study the problem of applying spectral clustering to cluster multi-scale data.
For multi-scale data, distance-based similarity is not effective because objects of a sparse cluster could be far apart.
We propose the algorithm CAST that applies trace Lasso to regularize the coefficient matrix.
 arXiv  Detail & Related papers  (2020-06-08T09:46:35Z)
- New advances in enumerative biclustering algorithms with online
  partitioning [80.22629846165306]
 This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, keeps the attractive properties of RIn-Close_CVC and is characterized by a drastic reduction in memory usage and a consistent gain in runtime.
 arXiv  Detail & Related papers  (2020-03-07T14:54:26Z)
- Clustering Binary Data by Application of Combinatorial Optimization
  Heuristics [52.77024349608834]
 We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
Using a set of 16 data tables generated by a quasi-Monte Carlo experiment, the methods are compared, for one of the aggregation criteria under L1 dissimilarity, with hierarchical clustering and a k-means variant: partitioning around medoids (PAM).
 arXiv  Detail & Related papers  (2020-01-06T23:33:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.