Related papers: CosmoBench: A Multiscale, Multiview, Multitask Cosmology Benchmark for Geometric Deep Learning

URL: http://arxiv.org/abs/2507.03707v1
Date: Fri, 04 Jul 2025 16:46:25 GMT
Title: CosmoBench: A Multiscale, Multiview, Multitask Cosmology Benchmark for Geometric Deep Learning
Authors: Ningyuan Huang, Richard Stiskalek, Jun-Young Lee, Adrian E. Bayer, Charles C. Margossian, Christian Kragh Jespersen, Lucia A. Perez, Lawrence K. Saul, Francisco Villaescusa-Navarro
Abstract summary: Cosmological simulations provide a wealth of data in the form of point clouds and directed trees. A crucial goal is to extract insights from this data that shed light on the nature and composition of the Universe. We introduce CosmoBench, a benchmark dataset curated from state-of-the-art cosmological simulations.
Score: 4.340305187316021
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Cosmological simulations provide a wealth of data in the form of point clouds and directed trees. A crucial goal is to extract insights from this data that shed light on the nature and composition of the Universe. In this paper we introduce CosmoBench, a benchmark dataset curated from state-of-the-art cosmological simulations whose runs required more than 41 million core-hours and generated over two petabytes of data. CosmoBench is the largest dataset of its kind: it contains 34 thousand point clouds from simulations of dark matter halos and galaxies at three different length scales, as well as 25 thousand directed trees that record the formation history of halos on two different time scales. The data in CosmoBench can be used for multiple tasks -- to predict cosmological parameters from point clouds and merger trees, to predict the velocities of individual halos and galaxies from their collective positions, and to reconstruct merger trees on finer time scales from those on coarser time scales. We provide several baselines on these tasks, some based on established approaches from cosmological modeling and others rooted in machine learning. For the latter, we study different approaches -- from simple linear models that are minimally constrained by symmetries to much larger and more computationally demanding models in deep learning, such as graph neural networks. We find that least-squares fits with a handful of invariant features sometimes outperform deep architectures with many more parameters and far longer training times. Still, there remains tremendous potential to improve these baselines by combining machine learning and cosmology to fully exploit the data. CosmoBench sets the stage for bridging cosmology and geometric deep learning at scale. We invite the community to push the frontier of scientific discovery by engaging with this dataset, available at https://cosmobench.streamlit.app

Related papers

Historical Astronomical Diagrams Decomposition in Geometric Primitives [13.447991818689463]
We introduce a unique dataset of 303 astronomical diagrams from diverse traditions, ranging from the XIIth to the XVIIIth century. We develop a model that builds on DINO-DETR to enable the prediction of multiple geometric primitives. Our approach widely improves over the LETR baseline, which is restricted to lines, by introducing a meaningful parametrization for multiple primitives.
arXiv Detail & Related papers (2024-03-13T17:20:25Z)
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction. Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations. On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
Cosmology from Galaxy Redshift Surveys with PointNet [65.89809800010927]
In cosmology, galaxy redshift surveys resemble a permutation-invariant collection of positions in space. We employ a PointNet-like neural network to regress the values of the cosmological parameters directly from point cloud data. Our implementation of PointNets can analyse inputs of $\mathcal{O}(10^4) - \mathcal{O}(10^5)$ galaxies at a time, which improves upon earlier work for this application by roughly two orders of magnitude.
arXiv Detail & Related papers (2022-11-22T15:35:05Z)
Semi-Supervised Domain Adaptation for Cross-Survey Galaxy Morphology Classification and Anomaly Detection [57.85347204640585]
We develop a Universal Domain Adaptation method DeepAstroUDA. It can be applied to datasets with different types of class overlap. For the first time, we demonstrate the successful use of domain adaptation on two very different observational datasets.
arXiv Detail & Related papers (2022-11-01T18:07:21Z)
Convolutional Neural Networks on Manifolds: From Graphs and Back [122.06927400759021]
We propose a manifold neural network (MNN) composed of a bank of manifold convolutional filters and point-wise nonlinearities. To sum up, we focus on the manifold model as the limit of large graphs and construct MNNs, while we can still bring back graph neural networks by the discretization of MNNs.
arXiv Detail & Related papers (2022-10-01T21:17:39Z)
Geometry Interaction Knowledge Graph Embeddings [153.69745042757066]
We propose Geometry Interaction knowledge graph Embeddings (GIE), which learns spatial structures interactively between the Euclidean, hyperbolic and hyperspherical spaces. Our proposed GIE can capture a richer set of relational information, model key inference patterns, and enable expressive semantic matching across entities.
arXiv Detail & Related papers (2022-06-24T08:33:43Z)
Towards Quantum Graph Neural Networks: An Ego-Graph Learning Approach [47.19265172105025]
We propose a novel hybrid quantum-classical algorithm for graph-structured data, which we refer to as the Ego-graph based Quantum Graph Neural Network (egoQGNN). egoQGNN implements the GNN theoretical framework using the tensor product and unity matrix representation, which greatly reduces the number of model parameters required. The architecture is based on a novel mapping from real-world data to Hilbert space.
arXiv Detail & Related papers (2022-01-13T16:35:45Z)
Inferring halo masses with Graph Neural Networks [0.5804487044220691]
We build a model that infers the mass of a halo given the positions, velocities, stellar masses, and radii of the galaxies it hosts. We use Graph Neural Networks (GNNs) that are designed to work with irregular and sparse data. Our model is able to constrain the masses of the halos with a $\sim$0.2 dex accuracy.
arXiv Detail & Related papers (2021-11-16T18:37:53Z)
Satellite galaxy abundance dependency on cosmology in Magneticum simulations [101.18253437732933]
We build an emulator of satellite abundance based on cosmological parameters. We find that $A$ and $\beta$ depend on cosmological parameters, even if weakly. We also show that the cosmology dependency of satellite abundance differs between full-physics (FP), dark-matter only (DMO), and non-radiative simulations.
arXiv Detail & Related papers (2021-10-11T18:00:02Z)
The CAMELS Multifield Dataset: Learning the Universe's Fundamental Parameters with Artificial Intelligence [13.72500304639404]
We present the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) Multifield dataset, CMD. CMD is a collection of hundreds of thousands of 2D maps and 3D grids containing many different properties of cosmic gas, dark matter, and stars from 2,000 distinct simulated universes at several cosmic times. The 2D maps and 3D grids represent cosmic regions that span $\sim$100 million light years and have been generated from thousands of state-of-the-art hydrodynamic and gravity-only N-body simulations from the CAMELS project.
arXiv Detail & Related papers (2021-09-22T18:00:01Z)
First Full-Event Reconstruction from Imaging Atmospheric Cherenkov Telescope Real Data with Deep Learning [55.41644538483948]
The Cherenkov Telescope Array is the future of ground-based gamma-ray astronomy. Its first prototype telescope built on-site, the Large Size Telescope 1, is currently under commissioning and taking its first scientific data. We present for the first time the development of a full-event reconstruction based on deep convolutional neural networks and its application to real data.
arXiv Detail & Related papers (2021-05-31T12:51:42Z)
Fast and Accurate Non-Linear Predictions of Universes with Deep Learning [21.218297581239664]
We build a V-Net based model that transforms fast linear predictions into fully nonlinear predictions from numerical simulations. Our NN model learns to emulate the simulations down to small scales and is both faster and more accurate than the current state-of-the-art approximate methods.
arXiv Detail & Related papers (2020-12-01T03:30:37Z)
Emulation of cosmological mass maps with conditional generative adversarial networks [0.0]
We propose a novel conditional GAN model that is able to generate mass maps for any pair of matter density $\Omega_m$ and matter clustering strength $\sigma_8$. Our results show that our conditional GAN can interpolate efficiently within the space of simulated cosmologies. This contribution is a step towards building emulators of mass maps directly, capturing both the cosmological signal and its variability.
arXiv Detail & Related papers (2020-04-17T09:34:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.