Fast-VAT: Accelerating Cluster Tendency Visualization using Cython and Numba
- URL: http://arxiv.org/abs/2507.15904v1
- Date: Mon, 21 Jul 2025 11:00:55 GMT
- Title: Fast-VAT: Accelerating Cluster Tendency Visualization using Cython and Numba
- Authors: MSR Avinash, Ismael Lachheb
- Abstract summary: We present Fast-VAT, a high-performance reimplementation of the Visual Assessment of Cluster Tendency (VAT) algorithm in Python. Our approach achieves up to 50x speedup over the baseline implementation, while preserving the output fidelity of the original method.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual Assessment of Cluster Tendency (VAT) is a widely used unsupervised technique to assess the presence of cluster structure in unlabeled datasets. However, its standard implementation suffers from significant performance limitations due to its O(n^2) time complexity and inefficient memory usage. In this work, we present Fast-VAT, a high-performance reimplementation of the VAT algorithm in Python, augmented with Numba's Just-In-Time (JIT) compilation and Cython's static typing and low-level memory optimizations. Our approach achieves up to 50x speedup over the baseline implementation, while preserving the output fidelity of the original method. We validate Fast-VAT on a suite of real and synthetic datasets -- including Iris, Mall Customers, and Spotify subsets -- and verify cluster tendency using Hopkins statistics, PCA, and t-SNE. Additionally, we compare VAT's structural insights with clustering results from DBSCAN and K-Means to confirm its reliability.
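For readers unfamiliar with VAT, the reordering step that Fast-VAT accelerates can be sketched in plain Python. This is a minimal illustration assuming Euclidean distances; the function names are hypothetical, and it is not the Fast-VAT code, which according to the abstract relies on Numba JIT compilation and Cython. The ordering is Prim-style: start from the row holding the largest dissimilarity, then repeatedly append the unvisited point closest to any already-ordered point. Both the distance matrix and this loop are O(n^2), which is the cost the paper targets.

```python
import math

def dissimilarity_matrix(points):
    """Pairwise Euclidean distances (O(n^2) time and memory)."""
    return [[math.dist(p, q) for q in points] for p in points]

def vat_order(d):
    """Return the VAT permutation of a symmetric dissimilarity matrix d."""
    n = len(d)
    # Start from the row index that holds the largest dissimilarity.
    start = max(range(n), key=lambda i: max(d[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # nearest[j]: distance from unvisited point j to the closest already-ordered point.
    nearest = {j: d[start][j] for j in remaining}
    while remaining:
        nxt = min(remaining, key=nearest.__getitem__)
        order.append(nxt)
        remaining.remove(nxt)
        for j in remaining:  # Prim-style update of the frontier distances
            nearest[j] = min(nearest[j], d[nxt][j])
    return order

def vat(points):
    """Return the ordering and the reordered dissimilarity matrix (the VAT image)."""
    d = dissimilarity_matrix(points)
    order = vat_order(d)
    return order, [[d[i][j] for j in order] for i in order]
```

For example, vat([(0, 0), (10, 10), (0.1, 0), (10, 10.1), (0, 0.1)]) places the three points near the origin next to each other and the two points near (10, 10) next to each other, so the reordered matrix shows two dark diagonal blocks.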
Related papers
- Neutralizing Token Aggregation via Information Augmentation for Efficient Test-Time Adaptation [59.1067331268383]
Test-Time Adaptation (TTA) has emerged as an effective solution for adapting Vision Transformers (ViT) to distribution shifts without additional training data. To reduce inference cost, plug-and-play token aggregation methods merge redundant tokens in ViTs to reduce the total number of processed tokens. We formalize this problem as Efficient Test-Time Adaptation (ETTA), seeking to preserve the adaptation capability of TTA while reducing inference latency.
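To make "merging redundant tokens" concrete, the sketch below greedily merges the most cosine-similar pair of token vectors r times, using a size-weighted average. It is a generic illustration of token aggregation, not the ETTA method or any library's token-merging routine; merge_tokens and its parameters are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def merge_tokens(tokens, r):
    """Greedily merge the r most similar token pairs; returns (tokens, sizes)."""
    tokens = [list(t) for t in tokens]
    sizes = [1] * len(tokens)  # number of original tokens behind each entry
    for _ in range(min(r, len(tokens) - 1)):
        # Most redundant pair = highest cosine similarity (O(N^2) per merge).
        _, i, j = max((cosine(tokens[i], tokens[j]), i, j)
                      for i in range(len(tokens))
                      for j in range(i + 1, len(tokens)))
        wi, wj = sizes[i], sizes[j]
        # Size-weighted average keeps the result equal to the mean of the tokens it covers.
        tokens[i] = [(wi * a + wj * b) / (wi + wj) for a, b in zip(tokens[i], tokens[j])]
        sizes[i] = wi + wj
        del tokens[j], sizes[j]  # j > i, so index i stays valid
    return tokens, sizes
```

Each merge shortens the sequence by one, so every later layer processes fewer tokens; the trade-off the ETTA summary points at is that merging can discard information the adaptation step needs.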
arXiv Detail & Related papers (2025-08-05T12:40:55Z)
- A system identification approach to clustering vector autoregressive time series [50.66782357329375]
Clustering time series based on their underlying dynamics continues to attract researchers because it helps in modelling complex systems. Most current time series clustering methods handle only scalar time series, treat them as white noise, or rely on domain knowledge for high-quality feature construction. Instead of relying on feature or metric construction, the system identification approach allows vector time series to be clustered by explicitly considering their underlying autoregressive dynamics.
arXiv Detail & Related papers (2025-05-20T14:31:44Z)
- Locally Adaptive One-Class Classifier Fusion with Dynamic $\ell$p-Norm Constraints for Robust Anomaly Detection [17.93058599783703]
We introduce a framework that dynamically adjusts fusion weights based on local data characteristics.
Our method incorporates an interior-point optimization technique that significantly improves computational efficiency.
The framework's ability to adapt to local data patterns while maintaining computational efficiency makes it particularly valuable for real-time applications.
arXiv Detail & Related papers (2024-11-10T09:57:13Z)
- CAST: Clustering Self-Attention using Surrogate Tokens for Efficient Transformers [3.129187821625805]
We propose a novel Clustering self-Attention mechanism using Surrogate Tokens (CAST), to optimize the attention and achieve efficient transformers.
CAST improves efficiency by reducing the complexity from $O(N^2)$ to $O(\alpha N)$, where $N$ is the sequence length and $\alpha$ is a constant determined by the number of clusters and the number of samples per cluster.
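Where a linear cost can come from is easiest to see in a simplified sketch: if each of the N queries attends to a small, fixed set of surrogate tokens instead of all N keys, the number of score computations drops from N * N to N * (number of surrogates). Here the surrogates are simply per-cluster means of keys and values under a given cluster assignment. That construction and all function names are assumptions for illustration, not CAST's actual mechanism.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention: len(queries) * len(keys) score computations."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out

def cluster_means(rows, assign, num_clusters):
    """Average the rows assigned to each non-empty cluster (hypothetical surrogate tokens)."""
    dim = len(rows[0])
    sums = [[0.0] * dim for _ in range(num_clusters)]
    counts = [0] * num_clusters
    for row, c in zip(rows, assign):
        counts[c] += 1
        sums[c] = [s + x for s, x in zip(sums[c], row)]
    return [[s / counts[c] for s in sums[c]] for c in range(num_clusters) if counts[c]]

def surrogate_attention(queries, keys, values, assign, num_clusters):
    """Each query attends to at most num_clusters surrogates instead of all N keys."""
    return attend(queries,
                  cluster_means(keys, assign, num_clusters),
                  cluster_means(values, assign, num_clusters))
```

With one cluster per token the surrogate version reduces exactly to full attention; with fewer clusters it trades accuracy for cost.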
arXiv Detail & Related papers (2024-02-06T18:47:52Z)
- Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization [101.08992036691673]
This paper explores a realistic unsupervised fine-tuning scenario, considering the presence of out-of-distribution samples from unknown classes.
In particular, we focus on simultaneously enhancing out-of-distribution detection and the recognition of instances associated with known classes.
We present a simple, efficient, and effective approach called Universal Entropy Optimization (UEO).
arXiv Detail & Related papers (2023-08-24T16:47:17Z)
- Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
A second strategy leverages a novel Re-Ranking technique, which has a lower upper bound on time complexity and reduces the memory complexity from O(n^2) to O(kn), with k ≪ n.
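The O(n^2) to O(kn) memory reduction comes from keeping only each point's k nearest neighbours instead of the full distance matrix. The sketch below computes each distance row on the fly, keeps the k smallest entries, and then derives k-reciprocal neighbour sets, a common ingredient of re-ranking in re-identification. It is a hypothetical illustration, not the paper's Re-Ranking technique; time here is still O(n^2).

```python
import heapq
import math

def knn_lists(points, k):
    """Keep only the k nearest neighbours of each point: O(k*n) memory, not O(n^2).
    Each row of distances is generated lazily and discarded after selection."""
    lists = []
    for i, p in enumerate(points):
        row = ((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        lists.append([j for _, j in heapq.nsmallest(k, row)])
    return lists

def reciprocal_neighbours(lists):
    """j is a reciprocal neighbour of i if each appears in the other's k-NN list."""
    sets = [set(neigh) for neigh in lists]
    return [{j for j in sets[i] if i in sets[j]} for i in range(len(lists))]
```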
arXiv Detail & Related papers (2023-07-26T16:19:19Z)
- Tensor Slicing and Optimization for Multicore NPUs [2.670309629218727]
This paper proposes a compiler optimization pass for Multicore NPUs, called Tensor Slicing Optimization (TSO).
TSO identifies the tensor slicing that minimizes execution time for a set of CNN models, and the reported results confirm that it is capable of finding this best slicing.
arXiv Detail & Related papers (2023-04-06T12:03:03Z)
- Dynamic Conceptional Contrastive Learning for Generalized Category Discovery [76.82327473338734]
Generalized category discovery (GCD) aims to automatically cluster partially labeled data.
Unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories.
One effective approach to GCD is applying self-supervised learning to learn discriminative representations for unlabeled data.
We propose a Dynamic Conceptional Contrastive Learning framework, which can effectively improve clustering accuracy.
arXiv Detail & Related papers (2023-03-30T14:04:39Z)
- Structured Inverted-File k-Means Clustering for High-Dimensional Sparse Data [2.487445341407889]
This paper presents an architecture-friendly k-means clustering algorithm called SIVF for a large-scale and high-dimensional sparse data set.
Our performance analysis reveals that SIVF achieves its higher speed by suppressing two performance-degradation factors: cache misses and branch mispredictions.
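The basic idea behind inverted-file k-means on sparse data can be sketched as follows: index each centroid by its non-zero features, so assigning a sparse point only touches the features that point actually has. This assumes unit-normalised points and centroids (as in spherical k-means), so the nearest centroid is the one with the largest dot product. It is a hypothetical illustration, not SIVF's structured, cache-aware layout.

```python
from collections import defaultdict

def build_inverted_index(centroids):
    """feature id -> list of (centroid id, weight) for centroids where it is non-zero."""
    index = defaultdict(list)
    for c, centroid in enumerate(centroids):
        for feat, w in centroid.items():
            if w != 0.0:
                index[feat].append((c, w))
    return index

def assign(point, index, num_centroids):
    """Return the centroid with the largest dot product with a sparse point.
    Only the point's non-zero features are visited, so cost scales with its sparsity."""
    scores = [0.0] * num_centroids
    for feat, x in point.items():
        for c, w in index.get(feat, ()):
            scores[c] += x * w
    return max(range(num_centroids), key=scores.__getitem__)
```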
arXiv Detail & Related papers (2021-03-30T07:54:02Z)
- ConiVAT: Cluster Tendency Assessment and Clustering with Partial Background Knowledge [11.600065064765325]
ConiVAT is a constraint-based version of iVAT that makes use of background knowledge in the form of constraints.
We demonstrate the ConiVAT approach to visual assessment and single-linkage clustering on nine datasets.
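iVAT, which ConiVAT builds on, replaces each dissimilarity with a minimax path distance: the smallest possible largest edge over all paths between two points. The sketch below computes that transform with a Floyd-Warshall-style pass and shows one simple, hypothetical way to inject must-link and cannot-link constraints beforehand. The constraint rule is an assumption for illustration and is not claimed to be ConiVAT's method.

```python
def apply_constraints(d, must_link=(), cannot_link=()):
    """Hypothetical constraint step: must-link pairs get 0,
    cannot-link pairs get the largest observed dissimilarity."""
    out = [row[:] for row in d]
    d_max = max(max(row) for row in d)
    for i, j in must_link:
        out[i][j] = out[j][i] = 0.0
    for i, j in cannot_link:
        out[i][j] = out[j][i] = d_max
    return out

def ivat_transform(d):
    """Minimax (path-based) distances via an O(n^3) Floyd-Warshall-style pass:
    the best path to j through k costs the larger of its two legs."""
    n = len(d)
    out = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(out[i][k], out[k][j])
                if via_k < out[i][j]:
                    out[i][j] = via_k
    return out
```

Reordering the transformed matrix with VAT typically gives sharper diagonal blocks than reordering the raw distances.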
arXiv Detail & Related papers (2020-08-21T16:30:31Z)
- New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, keeps those attractive properties of RIn-Close_CVC and is characterized by a drastic reduction in memory usage and a consistent gain in runtime.
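The defining property of the biclusters that RIn-Close_CVC3 enumerates, constant values on columns, is easy to state in code: within the selected rows, every selected column holds a single value. The hypothetical helpers below check that property and list the columns that could still be added; a bicluster with an addable column is not maximal. They are not the enumeration algorithm itself.

```python
def is_cvc_bicluster(data, rows, cols, tol=0.0):
    """True if, within the chosen rows, every chosen column is constant (up to tol)."""
    for c in cols:
        values = [data[r][c] for r in rows]
        if max(values) - min(values) > tol:
            return False
    return True

def extendable_columns(data, rows, cols, tol=0.0):
    """Columns outside `cols` that keep the bicluster valid if added (maximality check)."""
    return [c for c in range(len(data[0]))
            if c not in cols and is_cvc_bicluster(data, rows, [c], tol)]
```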
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
- On Coresets for Support Vector Machines [61.928187390362176]
A coreset is a small, representative subset of the original data points.
We show that our algorithm can be used to extend the applicability of any off-the-shelf SVM solver to streaming, distributed, and dynamic data settings.
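What a weighted coreset buys can be illustrated with the simplest possible construction: sample m of the n points uniformly and give each weight n/m, so the weighted hinge loss on the sample estimates the full loss. Uniform sampling is a baseline assumed here for illustration; the paper's coreset construction is not reproduced, and all names below are hypothetical.

```python
import random

def hinge_loss(w, b, x, y):
    """SVM hinge loss for one labelled point (y is +1 or -1)."""
    return max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))

def uniform_coreset(data, m, seed=0):
    """Sample m (x, y) pairs without replacement; each gets weight n/m."""
    rng = random.Random(seed)
    weight = len(data) / m
    return [(x, y, weight) for x, y in rng.sample(data, m)]

def weighted_loss(w, b, weighted_points):
    """Weighted hinge loss over a coreset; any SVM solver accepting weights can use it."""
    return sum(weight * hinge_loss(w, b, x, y) for x, y, weight in weighted_points)
```

When m equals n, every weight is 1 and the weighted loss equals the full loss; smaller m trades accuracy for speed.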
arXiv Detail & Related papers (2020-02-15T23:25:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.