CosmoFlow: Scale-Aware Representation Learning for Cosmology with Flow Matching
- URL: http://arxiv.org/abs/2507.11842v1
- Date: Wed, 16 Jul 2025 02:15:31 GMT
- Title: CosmoFlow: Scale-Aware Representation Learning for Cosmology with Flow Matching
- Authors: Sidharth Kannan, Tian Qiu, Carolina Cuesta-Lazaro, Haewon Jeong
- Abstract summary: Flow matching based generative models can learn compact, semantically rich latent representations of field level cold dark matter (CDM) simulation data without supervision. Our model, CosmoFlow, learns representations 32x smaller than the raw field data, usable for field level reconstruction, synthetic data generation, and parameter inference.
- Score: 5.952011729053457
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative machine learning models have been demonstrated to be able to learn low dimensional representations of data that preserve information required for downstream tasks. In this work, we demonstrate that flow matching based generative models can learn compact, semantically rich latent representations of field level cold dark matter (CDM) simulation data without supervision. Our model, CosmoFlow, learns representations 32x smaller than the raw field data, usable for field level reconstruction, synthetic data generation, and parameter inference. Our model also learns interpretable representations, in which different latent channels correspond to features at different cosmological scales.
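The abstract does not give the training objective, but flow matching models are commonly trained by regressing a velocity field along a straight-line path between noise and data. As a hedged illustration (not CosmoFlow's actual implementation, which operates on CDM fields with a learned encoder), a minimal sketch of the standard flow matching loss:

```python
import numpy as np

def flow_matching_targets(x0, x1, t):
    """Linear interpolation path used in (rectified) flow matching.

    x0: noise samples, x1: data samples, t: times in [0, 1].
    Returns the interpolated point x_t and the target velocity x1 - x0.
    """
    t = t.reshape(-1, 1)                 # broadcast time over the feature dim
    x_t = (1.0 - t) * x0 + t * x1        # point on the straight-line path
    v_target = x1 - x0                   # constant velocity along that path
    return x_t, v_target

def fm_loss(predict_v, x0, x1, t):
    """Mean-squared flow matching loss for a velocity predictor."""
    x_t, v_target = flow_matching_targets(x0, x1, t)
    v_pred = predict_v(x_t, t)
    return np.mean((v_pred - v_target) ** 2)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 4))         # "noise" endpoints
x1 = rng.standard_normal((8, 4))         # "data" endpoints (stand-in for simulated fields)
t = rng.uniform(size=8)

# An oracle that predicts the exact target velocity gives zero loss.
oracle = lambda x_t, t: x1 - x0
print(fm_loss(oracle, x0, x1, t))        # 0.0
```

In practice `predict_v` would be a neural network conditioned on the latent code, and the latent bottleneck is what yields the 32x-smaller representation described above.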
Related papers
- Sequential Modeling Enables Scalable Learning for Large Vision Models [120.91839619284431]
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.
We define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources.
arXiv Detail & Related papers (2023-12-01T18:59:57Z) - SODA: Bottleneck Diffusion Models for Representation Learning [75.7331354734152]
We introduce SODA, a self-supervised diffusion model, designed for representation learning.
The model incorporates an image encoder, which distills a source view into a compact representation, that guides the generation of related novel views.
We show that by imposing a tight bottleneck between the encoder and a denoising decoder, we can turn diffusion models into strong representation learners.
arXiv Detail & Related papers (2023-11-29T18:53:34Z) - Domain Adaptive Graph Neural Networks for Constraining Cosmological Parameters Across Multiple Data Sets [40.19690479537335]
We show that DA-GNN achieves higher accuracy and robustness on cross-dataset tasks.
This shows that DA-GNNs are a promising method for extracting domain-independent cosmological information.
arXiv Detail & Related papers (2023-11-02T20:40:21Z) - Homological Convolutional Neural Networks [4.615338063719135]
We propose a novel deep learning architecture that exploits the data structural organization through topologically constrained network representations.
We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models.
arXiv Detail & Related papers (2023-08-26T08:48:51Z) - Data Compression and Inference in Cosmology with Self-Supervised Machine Learning [0.86325068644655]
We introduce a method that leverages the paradigm of self-supervised machine learning in a novel manner to construct representative summaries of massive datasets.
Deploying the method on hydrodynamical cosmological simulations, we show that it can deliver highly informative summaries.
Results indicate that self-supervised machine learning techniques offer a promising new approach for the compression of cosmological data as well as its analysis.
arXiv Detail & Related papers (2023-08-18T18:00:22Z) - Multiscale Flow for Robust and Optimal Cosmological Analysis [7.977229957867868]
Multiscale Flow is a generative Normalizing Flow that creates samples and models the field-level likelihood of two-dimensional cosmological data.
We show that Multiscale Flow is able to identify distribution shifts not present in the training data, such as baryonic effects.
arXiv Detail & Related papers (2023-06-07T18:00:06Z) - Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z) - Bridging the Gap to Real-World Object-Centric Learning [66.55867830853803]
We show that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way.
Our approach, DINOSAUR, significantly outperforms existing object-centric learning models on simulated data.
arXiv Detail & Related papers (2022-09-29T15:24:47Z) - Low-Rank Constraints for Fast Inference in Structured Models [110.38427965904266]
This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.
Experiments with neural parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces.
arXiv Detail & Related papers (2022-01-08T00:47:50Z) - Contrastive Neighborhood Alignment [81.65103777329874]
We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features.
The target model aims to mimic the local structure of the source representation space using a contrastive loss.
CNA is illustrated in three scenarios: manifold learning, where the model maintains the local topology of the original data in a dimension-reduced space; model distillation, where a small student model is trained to mimic a larger teacher; and legacy model update, where an older model is replaced by a more powerful one.
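The CNA abstract describes a contrastive loss that makes a target model reproduce the local neighborhood structure of a source representation space. As a hypothetical sketch (the function name, neighbor count `k`, and temperature `tau` are illustrative assumptions, not the paper's exact formulation), one way such a loss can be written:

```python
import numpy as np

def neighborhood_alignment_loss(src, tgt, k=2, tau=0.1):
    """CNA-style sketch: push the target embedding to reproduce the
    source space's nearest-neighbor structure.

    For each anchor i, the k nearest neighbors of i in `src` act as
    positives in a softmax over target-space similarities.
    """
    n = src.shape[0]
    # Pairwise squared distances in the source space define the neighborhoods.
    d_src = ((src[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d_src, np.inf)             # a point is not its own neighbor
    nbrs = np.argsort(d_src, axis=1)[:, :k]     # indices of k nearest neighbors

    # Cosine similarities in the target space, temperature-scaled.
    t = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sim = (t @ t.T) / tau
    np.fill_diagonal(sim, -np.inf)              # exclude self-similarity
    log_p = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))

    # Cross-entropy: each source-space neighbor should be likely in target space.
    return -np.mean(log_p[np.arange(n)[:, None], nbrs])
```

Under this sketch, the distillation and legacy-update scenarios fall out naturally: `src` is the teacher (or old model) embedding and `tgt` the student (or replacement) embedding of the same inputs.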
arXiv Detail & Related papers (2022-01-06T04:58:31Z) - Characterizing the Latent Space of Molecular Deep Generative Models with Persistent Homology Metrics [21.95240820041655]
Variational Autoencoders (VAEs) are generative models in which encoder-decoder network pairs are trained to reconstruct training data distributions.
We propose a method for measuring how well the latent space of deep generative models is able to encode structural and chemical features.
arXiv Detail & Related papers (2020-10-18T13:33:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.