Autoencoder-based Semi-Supervised Dimensionality Reduction and Clustering for Scientific Ensembles
- URL: http://arxiv.org/abs/2512.11145v1
- Date: Thu, 11 Dec 2025 22:09:31 GMT
- Title: Autoencoder-based Semi-Supervised Dimensionality Reduction and Clustering for Scientific Ensembles
- Authors: Lennard Manuel, Hamid Gadirov, Steffen Frey,
- Abstract summary: This paper presents an enhanced autoencoder framework that incorporates a clustering loss, based on the soft silhouette score, alongside a contrastive loss to improve the visualization and interpretability of ensemble datasets.
- Score: 2.185867802485678
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Analyzing and visualizing scientific ensemble datasets with high dimensionality and complexity poses significant challenges. Dimensionality reduction techniques and autoencoders are powerful tools for extracting features, but they often struggle with such high-dimensional data. This paper presents an enhanced autoencoder framework that incorporates a clustering loss, based on the soft silhouette score, alongside a contrastive loss to improve the visualization and interpretability of ensemble datasets. First, EfficientNetV2 is used to generate pseudo-labels for the unlabeled portions of the scientific ensemble datasets. By jointly optimizing the reconstruction, clustering, and contrastive objectives, our method encourages similar data points to group together while separating distinct clusters in the latent space. UMAP is subsequently applied to this latent representation to produce 2D projections, which are evaluated using the silhouette score. Multiple types of autoencoders are evaluated and compared based on their ability to extract meaningful features. Experiments on two scientific ensemble datasets - channel structures in soil derived from Markov chain Monte Carlo, and droplet-on-film impact dynamics - show that models incorporating clustering or contrastive loss marginally outperform the baseline approaches.
Related papers
- Unsupervised Deep Clustering of MNIST with Triplet-Enhanced Convolutional Autoencoders [0.0]
This research implements an advanced unsupervised clustering system for MNIST handwritten digits.<n>A deep neural autoencoder requires a training process during phase one to develop minimal yet interpretive representations of images.
arXiv Detail & Related papers (2025-06-11T18:26:13Z) - Leveraging Multi-Modal Information to Enhance Dataset Distillation [9.251951276795255]
We introduce two key enhancements to dataset distillation: caption-guided supervision and object-centric masking.<n>To integrate textual information, we propose two strategies for leveraging caption features: the feature concatenation and caption matching.<n> Comprehensive evaluations demonstrate that integrating caption-based guidance and object-centric masking enhances dataset distillation.
arXiv Detail & Related papers (2025-05-13T14:20:11Z) - Autoencoded UMAP-Enhanced Clustering for Unsupervised Learning [49.1574468325115]
We propose a novel approach to unsupervised learning by constructing a non-linear embedding of the data into a low-dimensional space followed by any conventional clustering algorithm.<n>The embedding promotes clusterability of the data and is comprised of two mappings: the encoder of an autoencoder neural network and the output of UMAP algorithm.<n>When applied to MNIST data, AUEC significantly outperforms the state-of-the-art techniques in terms of clustering accuracy.
arXiv Detail & Related papers (2025-01-13T22:30:38Z) - Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.<n>In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.<n>This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z) - Deep Clustering Using the Soft Silhouette Score: Towards Compact and
Well-Separated Clusters [0.0]
We propose soft silhoutte, a probabilistic formulation of the silhouette coefficient.
We introduce an autoencoder-based deep learning architecture that is suitable for optimizing the soft silhouette objective function.
The proposed deep clustering method has been tested and compared with several well-studied deep clustering methods on various benchmark datasets.
arXiv Detail & Related papers (2024-02-01T14:02:06Z) - Sampling-enabled scalable manifold learning unveils the discriminative cluster structure of high-dimensional data [17.411028691739897]
We propose a sampling-based Scalable manifold learning technique that enables Uniform and Discriminative Embedding, namely SUDE, for large-scale and high-dimensional data.<n>We empirically validated the effectiveness of SUDE on synthetic datasets and real-world benchmarks, and applied it to analyze single-cell data and detect anomalies in electrocardiogram (ECG) signals.
arXiv Detail & Related papers (2024-01-02T08:43:06Z) - Effective and Efficient Graph Learning for Multi-view Clustering [173.8313827799077]
We propose an effective and efficient graph learning model for multi-view clustering.
Our method exploits the view-similar between graphs of different views by the minimization of tensor Schatten p-norm.
Our proposed algorithm is time-economical and obtains the stable results and scales well with the data size.
arXiv Detail & Related papers (2021-08-15T13:14:28Z) - Generative Partial Visual-Tactile Fused Object Clustering [81.17645983141773]
We propose a Generative Partial Visual-Tactile Fused (i.e., GPVTF) framework for object clustering.
A conditional cross-modal clustering generative adversarial network is then developed to synthesize one modality conditioning on the other modality.
To the end, two pseudo-label based KL-divergence losses are employed to update the corresponding modality-specific encoders.
arXiv Detail & Related papers (2020-12-28T02:37:03Z) - Mixing Consistent Deep Clustering [3.5786621294068373]
Good latent representations produce semantically mixed outputs when decoding linears of two latent representations.
We propose the Mixing Consistent Deep Clustering method which encourages representations to appear realistic.
We show that the proposed method can be added to existing autoencoders to further improve clustering performance.
arXiv Detail & Related papers (2020-11-03T19:47:06Z) - Dual Adversarial Auto-Encoders for Clustering [152.84443014554745]
We propose Dual Adversarial Auto-encoder (Dual-AAE) for unsupervised clustering.
By performing variational inference on the objective function of Dual-AAE, we derive a new reconstruction loss which can be optimized by training a pair of Auto-encoders.
Experiments on four benchmarks show that Dual-AAE achieves superior performance over state-of-the-art clustering methods.
arXiv Detail & Related papers (2020-08-23T13:16:34Z) - Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering [50.43424130281065]
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
arXiv Detail & Related papers (2020-05-19T05:54:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.