Diffusion Autoencoders with Perceivers for Long, Irregular and Multimodal Astronomical Sequences
- URL: http://arxiv.org/abs/2510.20595v1
- Date: Thu, 23 Oct 2025 14:21:01 GMT
- Title: Diffusion Autoencoders with Perceivers for Long, Irregular and Multimodal Astronomical Sequences
- Authors: Yunyi Shen, Alexander Gagliano,
- Abstract summary: We introduce the Diffusion Autoencoder with Perceivers (daep). daep tokenizes heterogeneous measurements, compresses them with a Perceiver encoder, and reconstructs them with a Perceiver-IO diffusion decoder. Across diverse spectroscopic and photometric astronomical datasets, daep achieves lower reconstruction errors, produces more discriminative latent spaces, and better preserves fine-scale structure.
- Score: 47.1547360356314
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning has become a central strategy for representation learning, but the majority of architectures used for encoding data have only been validated on regularly-sampled inputs such as images, audio, and video. In many scientific domains, data instead arrive as long, irregular, and multimodal sequences. To extract semantic information from these data, we introduce the Diffusion Autoencoder with Perceivers (daep). daep tokenizes heterogeneous measurements, compresses them with a Perceiver encoder, and reconstructs them with a Perceiver-IO diffusion decoder, enabling scalable learning in diverse data settings. To benchmark the daep architecture, we adapt the masked autoencoder to a Perceiver encoder/decoder design, and establish a strong baseline (maep) in the same architectural family as daep. Across diverse spectroscopic and photometric astronomical datasets, daep achieves lower reconstruction errors, produces more discriminative latent spaces, and better preserves fine-scale structure than both VAE and maep baselines. These results establish daep as an effective framework for scientific domains where data arrive as irregular, heterogeneous sequences.
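The core pipeline described in the abstract — tokenize irregular multimodal measurements, then compress them onto a fixed-size latent array via Perceiver-style cross-attention — can be sketched minimally. The featurization, dimensions, single-head attention, and random weights below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def tokenize(times, values, band_ids, n_bands, d_model=32):
    """Turn irregular (time, value, band) measurements into tokens:
    sinusoidal time features + value + one-hot band, linearly projected.
    Illustrative featurization, not the paper's exact tokenizer."""
    freqs = 2.0 ** np.arange(4)
    time_feats = np.concatenate(
        [np.sin(times[:, None] * freqs), np.cos(times[:, None] * freqs)], axis=1
    )                                                   # (N, 8)
    band_onehot = np.eye(n_bands)[band_ids]             # (N, n_bands)
    raw = np.concatenate([time_feats, values[:, None], band_onehot], axis=1)
    W = rng.standard_normal((raw.shape[1], d_model)) / np.sqrt(raw.shape[1])
    return raw @ W                                      # (N, d_model)

def cross_attention(queries, keys_values):
    """Single-head cross-attention: a small latent array attends to the
    (arbitrarily long) token sequence, as in a Perceiver encoder."""
    d = queries.shape[1]
    scores = queries @ keys_values.T / np.sqrt(d)       # (L, N)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ keys_values                        # (L, d)

# An irregular two-band light curve: 57 measurements at arbitrary times.
times = np.sort(rng.uniform(0, 100, size=57))
values = rng.standard_normal(57)
bands = rng.integers(0, 2, size=57)

tokens = tokenize(times, values, bands, n_bands=2)      # (57, 32)
latents = rng.standard_normal((8, 32))                  # fixed-size latent array
compressed = cross_attention(latents, tokens)           # (8, 32)
print(compressed.shape)
```

The key property this illustrates is that the latent size (8 here) is fixed regardless of sequence length, which is what makes the encoder scalable to long, irregularly sampled inputs; the diffusion decoder side is omitted.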
Related papers
- Hybrid Autoencoders for Tabular Data: Leveraging Model-Based Augmentation in Low-Label Settings [13.591018807414484]
We propose a hybrid autoencoder that combines a neural encoder with an oblivious soft decision tree (OSDT) encoder, each guided by its own gating network. Our method achieves consistent gains in low-label classification and regression across diverse datasets, outperforming deep and tree-based supervised baselines.
arXiv Detail & Related papers (2025-11-10T11:08:39Z)
- Torsion in Persistent Homology and Neural Networks [0.0]
We show that torsion can be lost during encoding, altered in the latent space, and in many cases, not reconstructed by standard decoders. Our findings reveal key limitations of field-based approaches and underline the need for architectures or loss terms that preserve torsional information.
arXiv Detail & Related papers (2025-06-03T16:29:06Z)
- Geometry-Preserving Encoder/Decoder in Latent Generative Models [15.766401356353084]
We introduce a novel encoder/decoder framework with theoretical properties distinct from those of the VAE. We demonstrate the significant advantages of this geometry-preserving encoder in the training process of both the encoder and decoder.
arXiv Detail & Related papers (2025-01-16T23:14:34Z) - Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding [90.77521413857448]
Deep generative models are anchored in three core capabilities -- generating new instances, reconstructing inputs, and learning compact representations.
We introduce Generalized Encoding-Decoding Diffusion Probabilistic Models (EDDPMs).
EDDPMs generalize the Gaussian noising-denoising in standard diffusion by introducing parameterized encoding-decoding.
Experiments on text, proteins, and images demonstrate the flexibility to handle diverse data and tasks.
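The generalization described here — replacing the identity mapping in standard Gaussian diffusion with a parameterized encoder/decoder pair — can be sketched schematically. The linear encoder/decoder and the oracle reverse step (which uses the true noise rather than a trained predictor) are illustrative stand-ins, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linear "encoder"/"decoder" standing in for learned networks.
d_x, d_z = 6, 4
W_enc = rng.standard_normal((d_x, d_z)) / np.sqrt(d_x)
W_dec = np.linalg.pinv(W_enc)            # decoder approximately inverts the encoder

def encode(x):  return x @ W_enc
def decode(z):  return z @ W_dec

T = 10
alpha_bar = np.linspace(0.99, 0.01, T)   # toy noise schedule

def forward(x0, t):
    """Generalized forward process: Gaussian noise is added to the
    *encoding* of x0, not to x0 itself (an identity encoder would
    recover standard diffusion)."""
    z0 = encode(x0)
    eps = rng.standard_normal(z0.shape)
    zt = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1 - alpha_bar[t]) * eps
    return zt, eps

def reverse_oracle(zt, eps, t):
    """Oracle reverse step: invert the noising given the true noise,
    then decode. A trained model would predict eps instead."""
    z0_hat = (zt - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
    return decode(z0_hat)

x0 = rng.standard_normal((3, d_x))
zt, eps = forward(x0, t=5)
x0_hat = reverse_oracle(zt, eps, t=5)
# Reconstruction matches x0 up to the encoder's null space (d_z < d_x).
print(np.allclose(x0 @ W_enc @ W_dec, x0_hat))
```

Because the encoder here is a lossy 6-to-4 projection, the round trip recovers the projection of the input rather than the input itself; with a learned, near-invertible encoder the same scheme supports generation, reconstruction, and representation in one model.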
arXiv Detail & Related papers (2024-02-29T10:08:57Z)
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- Dataset Condensation with Latent Space Knowledge Factorization and Sharing [73.31614936678571]
We introduce a novel approach for solving dataset condensation problem by exploiting the regularity in a given dataset.
Instead of condensing the dataset directly in the original input space, we assume a generative process of the dataset with a set of learnable codes.
We experimentally show that our method achieves new state-of-the-art records by significant margins on various benchmark datasets.
arXiv Detail & Related papers (2022-08-21T18:14:08Z)
- COIN++: Data Agnostic Neural Compression [55.27113889737545]
COIN++ is a neural compression framework that seamlessly handles a wide range of data modalities.
We demonstrate the effectiveness of our method by compressing various data modalities.
arXiv Detail & Related papers (2022-01-30T20:12:04Z)
- UNETR: Transformers for 3D Medical Image Segmentation [8.59571749685388]
We introduce a novel architecture, dubbed UNEt TRansformers (UNETR), that utilizes a pure transformer as the encoder to learn sequence representations of the input volume.
We have extensively validated the performance of our proposed model across different imaging modalities.
arXiv Detail & Related papers (2021-03-18T20:17:15Z)
- Encoded Prior Sliced Wasserstein AutoEncoder for learning latent manifold representations [0.7614628596146599]
We introduce an Encoded Prior Sliced Wasserstein AutoEncoder.
An additional prior-encoder network learns an embedding of the data manifold.
We show that the prior encodes the geometry underlying the data unlike conventional autoencoders.
arXiv Detail & Related papers (2020-10-02T14:58:54Z)
- Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding: for each decoder layer, representations from the last encoder layer serve as a global view, and representations from the other encoder layers are supplemented to form a stereoscopic view of the source sequences.
arXiv Detail & Related papers (2020-05-16T20:00:39Z)
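The multi-view idea in the entry above can be sketched schematically: each decoder layer attends both to the final encoder layer (global view) and to one intermediate encoder layer. The random stand-in states, single-head attention, and simple additive merge are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def attend(q, kv):
    """Single-head scaled dot-product attention."""
    scores = q @ kv.T / np.sqrt(q.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ kv

n_layers, seq_len, d = 4, 10, 16
# Stand-in encoder states: one (seq_len, d) matrix per encoder layer.
enc_states = [rng.standard_normal((seq_len, d)) for _ in range(n_layers)]

x = rng.standard_normal((5, d))   # 5 decoder positions
for layer in range(n_layers):     # decoder depth matches encoder here
    global_view = attend(x, enc_states[-1])     # always the last encoder layer
    stereo_view = attend(x, enc_states[layer])  # a different layer per decoder layer
    x = x + 0.5 * (global_view + stereo_view)   # merge the two views

print(x.shape)
```

The design choice being illustrated is that intermediate encoder layers carry diversified levels of information that a decoder attending only to the final layer would discard.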
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.