Spatio-Temporal Surrogates for Interaction of a Jet with High
Explosives: Part II -- Clustering Extremely High-Dimensional Grid-Based Data
- URL: http://arxiv.org/abs/2307.01400v1
- Date: Mon, 3 Jul 2023 23:36:43 GMT
- Title: Spatio-Temporal Surrogates for Interaction of a Jet with High
Explosives: Part II -- Clustering Extremely High-Dimensional Grid-Based Data
- Authors: Chandrika Kamath and Juliette S. Franzman
- Abstract summary: In this report, we consider output data from simulations of a jet interacting with high explosives.
We show how we can use the randomness of both the random projections, and the choice of initial centroids in k-means clustering, to determine the number of clusters in our data set.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building an accurate surrogate model for the spatio-temporal outputs of a
computer simulation is a challenging task. A simple approach to improve the
accuracy of the surrogate is to cluster the outputs based on similarity and
build a separate surrogate model for each cluster. This clustering is
relatively straightforward when the output at each time step is of moderate
size. However, when the spatial domain is represented by a large number of grid
points, numbering in the millions, the clustering of the data becomes more
challenging. In this report, we consider output data from simulations of a jet
interacting with high explosives. These data are available on spatial domains
of different sizes, at grid points that vary in their spatial coordinates, and
in a format that distributes the output across multiple files at each time step
of the simulation. We first describe how we bring these data into a consistent
format prior to clustering. Borrowing the idea of random projections from data
mining, we reduce the dimension of our data by a factor of a thousand, making it
possible to use the iterative k-means method for clustering. We show how we can
use the randomness of both the random projections and the choice of initial
centroids in k-means clustering to determine the number of clusters in our
data set. Our approach makes clustering of extremely high-dimensional data
tractable, generating meaningful cluster assignments for our problem despite
the approximation introduced by the random projections.
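The two ingredients described above, random projection to shrink the per-snapshot dimension and repeated randomized runs to settle on a cluster count, can be sketched compactly. The following is a minimal illustration only, not the authors' code: the array `outputs` (one row per simulation snapshot, one column per grid point), the function names, and the use of the relative spread of the k-means inertia as a stability measure are all assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def random_project(X, reduced_dim):
    """Project the rows of X onto `reduced_dim` random Gaussian directions.

    Entries of R are i.i.d. N(0, 1/reduced_dim), so pairwise distances are
    approximately preserved (Johnson-Lindenstrauss-style argument).
    """
    n_features = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(reduced_dim), size=(n_features, reduced_dim))
    return X @ R

def cluster_count_stability(X, reduced_dim=1000, k_range=range(2, 8), n_trials=20):
    """For each candidate k, cluster several independent random projections of X
    with a single random k-means start per trial, and report the relative spread
    of the resulting inertia; a k whose results vary little under this
    randomness is a plausible cluster count."""
    spread = {}
    for k in k_range:
        inertias = []
        for _ in range(n_trials):
            Z = random_project(X, reduced_dim)          # fresh projection per trial
            km = KMeans(n_clusters=k, n_init=1).fit(Z)  # fresh random centroids too
            inertias.append(km.inertia_)
        spread[k] = np.std(inertias) / np.mean(inertias)
    return spread

# Example with hypothetical data: `outputs` has one row per snapshot.
# outputs = np.load("snapshots.npy")
# print(cluster_count_stability(outputs))
```

With millions of grid points, forming the full projection matrix at once can be memory-heavy (roughly reduced_dim times the size of one snapshot), so in practice the projection would be applied in blocks of columns; the stability idea is unchanged, and the candidate k whose trial-to-trial variation is smallest is a reasonable choice.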
Related papers
- Hard Regularization to Prevent Deep Online Clustering Collapse without
Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation and that, unlike existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
- Research on Efficient Fuzzy Clustering Method Based on Local Fuzzy
Granular balls [67.33923111887933]
In this paper, the data are iteratively fuzzified using granular-balls, and the membership degree of a data point considers only the two granular-balls in which it is located.
The resulting set of fuzzy granular-balls can accommodate more processing methods across different data scenarios.
arXiv Detail & Related papers (2023-03-07T01:52:55Z)
- Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly
Types [60.45942774425782]
We introduce anomaly clustering, whose goal is to group data into coherent clusters of anomaly types.
This is different from anomaly detection, whose goal is to separate anomalies from normal data.
We present a simple yet effective clustering framework using patch-based pretrained deep embeddings and off-the-shelf clustering methods.
arXiv Detail & Related papers (2021-12-21T23:11:33Z)
- Clustering Plotted Data by Image Segmentation [12.443102864446223]
Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data.
In this paper, we present a wholly different way of clustering points in 2-dimensional space, inspired by how humans cluster data.
Our approach, Visual Clustering, has several advantages over traditional clustering algorithms.
arXiv Detail & Related papers (2021-10-06T06:19:30Z)
- Local versions of sum-of-norms clustering [77.34726150561087]
We show that our method can separate arbitrarily close balls in the ball model.
We prove a quantitative bound on the error incurred in the clustering of disjoint connected sets.
arXiv Detail & Related papers (2021-09-20T14:45:29Z)
- Efficient Large-Scale Face Clustering Using an Online Mixture of
Gaussians [1.3101369903953806]
We present an online Gaussian mixture-based clustering method (OGMC) for large-scale online face clustering.
Using feature vectors (f-vectors) extracted from the incoming faces, OGMC generates clusters that may be connected to others depending on their proximity and robustness.
Experimental results show that the proposed approach outperforms state-of-the-art clustering methods on large-scale face clustering benchmarks.
arXiv Detail & Related papers (2021-03-31T17:59:38Z)
- Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes or DPP for the random restart of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, contrary to DPP, the usual uniform random selection of initial centers fails both to ensure diversity and to obtain good coverage of all data facets.
arXiv Detail & Related papers (2021-02-07T23:48:24Z)
- Mixed data Deep Gaussian Mixture Model: A clustering model for mixed
datasets [0.0]
We introduce a model-based clustering method called the Mixed Deep Gaussian Mixture Model (MDGMM).
This architecture is flexible and can be adapted to mixed as well as to continuous or non-continuous data.
Our model provides continuous low-dimensional representations of the data which can be a useful tool to visualize mixed datasets.
arXiv Detail & Related papers (2020-10-13T19:52:46Z)
- Clustering small datasets in high-dimension by random projection [2.2940141855172027]
We propose a low-computation method to find statistically significant clustering structures in a small dataset.
The method proceeds by projecting the data on a random line and seeking binary clusterings in the resulting one-dimensional data.
The statistical validity of the clustering structures obtained is tested in the projected one-dimensional space (a brief illustrative sketch of this one-dimensional idea appears after this list).
arXiv Detail & Related papers (2020-08-21T16:49:37Z)
- Probabilistic Partitive Partitioning (PPP) [0.0]
Clustering algorithms, in general, face two common problems: they converge to different settings with different initial conditions, and the number of clusters has to be arbitrarily decided beforehand.
arXiv Detail & Related papers (2020-03-09T19:18:35Z)
- Conjoined Dirichlet Process [63.89763375457853]
We develop a novel, non-parametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns.
We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that our method improves bicluster extraction in many settings compared to existing approaches.
arXiv Detail & Related papers (2020-02-08T19:41:23Z)
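For comparison with the block random projection sketched earlier, the entry "Clustering small datasets in high-dimension by random projection" above uses a single random line at a time. A minimal sketch of that one-dimensional idea follows; the function name, the largest-gap split, and the scoring are assumptions made for this illustration, and the statistical validity test described in that paper is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_line_binary_split(X, n_directions=200):
    """Project X onto many random lines; on each line, split the sorted 1-D
    values at the largest gap, and return the binary labels from the direction
    whose gap is widest relative to the spread of the projected values."""
    best_score, best_labels = -1.0, None
    for _ in range(n_directions):
        v = rng.normal(size=X.shape[1])
        v /= np.linalg.norm(v)                   # unit-length random direction
        proj = X @ v                             # one-dimensional projection
        order = np.sort(proj)
        gaps = np.diff(order)
        i = int(np.argmax(gaps))                 # widest gap between neighbours
        score = gaps[i] / (order[-1] - order[0] + 1e-12)
        if score > best_score:
            threshold = 0.5 * (order[i] + order[i + 1])
            best_score = score
            best_labels = (proj > threshold).astype(int)
    return best_labels
```

That approach targets small datasets and produces a binary split per line, whereas the report summarized above projects onto on the order of a thousand directions at once so that standard k-means remains usable on the reduced data.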