The CAMELS Multifield Dataset: Learning the Universe's Fundamental
Parameters with Artificial Intelligence
- URL: http://arxiv.org/abs/2109.10915v1
- Date: Wed, 22 Sep 2021 18:00:01 GMT
- Title: The CAMELS Multifield Dataset: Learning the Universe's Fundamental
Parameters with Artificial Intelligence
- Authors: Francisco Villaescusa-Navarro, Shy Genel, Daniel Angles-Alcazar,
Leander Thiele, Romeel Dave, Desika Narayanan, Andrina Nicola, Yin Li, Pablo
Villanueva-Domingo, Benjamin Wandelt, David N. Spergel, Rachel S. Somerville,
Jose Manuel Zorrilla Matilla, Faizan G. Mohammad, Sultan Hassan, Helen Shao,
Digvijay Wadekar, Michael Eickenberg, Kaze W.K. Wong, Gabriella Contardo,
Yongseok Jo, Emily Moser, Erwin T. Lau, Luis Fernando Machado Poletti Valle,
Lucia A. Perez, Daisuke Nagai, Nicholas Battaglia, Mark Vogelsberger
- Abstract summary: We present the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) Multifield dataset, CMD.
CMD is a collection of hundreds of thousands of 2D maps and 3D grids containing many different properties of cosmic gas, dark matter, and stars from 2,000 distinct simulated universes at several cosmic times.
The 2D maps and 3D grids represent cosmic regions that span $\sim$100 million light years and have been generated from thousands of state-of-the-art hydrodynamic and gravity-only N-body simulations from the CAMELS project.
- Score: 13.72500304639404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the Cosmology and Astrophysics with MachinE Learning Simulations
(CAMELS) Multifield Dataset, CMD, a collection of hundreds of thousands of 2D
maps and 3D grids containing many different properties of cosmic gas, dark
matter, and stars from 2,000 distinct simulated universes at several cosmic
times. The 2D maps and 3D grids represent cosmic regions that span $\sim$100
million light years and have been generated from thousands of state-of-the-art
hydrodynamic and gravity-only N-body simulations from the CAMELS project.
Designed to train machine learning models, CMD is the largest dataset of its
kind containing more than 70 Terabytes of data. In this paper we describe CMD
in detail and outline a few of its applications. We focus our attention on one
such task, parameter inference, formulating the problems we face as a challenge
to the community. We release all data and provide further technical details at
https://camels-multifield-dataset.readthedocs.io.
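To make the parameter-inference task posed above concrete, the sketch below shows one possible baseline: a small convolutional network that regresses cosmological parameters directly from 2D maps. The file names, array shapes, and architecture are illustrative assumptions, not the dataset's documented interface; the actual file conventions are described in the linked documentation.
```python
# A minimal sketch (not the CAMELS reference pipeline): regress two cosmological
# parameters, e.g. (Omega_m, sigma_8), from CMD-style 2D maps with a small CNN.
# File names and array shapes below are assumptions made for illustration; see
# https://camels-multifield-dataset.readthedocs.io for the actual data layout.
import numpy as np
import torch
import torch.nn as nn

maps = np.load("maps.npy")            # assumed shape: (N, 256, 256), one field per map
params = np.loadtxt("params.txt")     # assumed shape: (N, 2) -> (Omega_m, sigma_8)

# Log-scaling is a common preprocessing choice for density-like fields.
x = torch.tensor(np.log10(maps), dtype=torch.float32).unsqueeze(1)   # (N, 1, 256, 256)
y = torch.tensor(params, dtype=torch.float32)

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 2),                 # point estimates of the two parameters
)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(10):               # full-batch loop for brevity; use minibatches in practice
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    print(epoch, loss.item())
```
In practice, likelihood-free inference or networks that also predict uncertainties are more informative than the point estimates above, but the shape of the task is the same: maps or grids in, cosmological and astrophysical parameters out.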
Related papers
- How DREAMS are made: Emulating Satellite Galaxy and Subhalo Populations with Diffusion Models and Point Clouds [1.203672395409347]
We present NeHOD, a generative framework based on a variational diffusion model and a Transformer, for painting galaxies and subhalos on top of dark matter halos.
For each halo, NeHOD predicts the positions, velocities, masses, and concentrations of its central and satellite galaxies.
We show that our model captures the complex relationships between subhalo properties as a function of the simulation parameters.
arXiv Detail & Related papers (2024-09-04T18:00:00Z) - Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild [66.34146236875822]
The Nymeria dataset is a large-scale, diverse, richly annotated human motion dataset collected in the wild with multiple multimodal egocentric devices.
It contains 1200 recordings of 300 hours of daily activities from 264 participants across 50 locations, travelling a total of 399 km.
The motion-language descriptions provide 310.5K sentences in 8.64M words from a vocabulary size of 6545.
arXiv Detail & Related papers (2024-06-14T10:23:53Z) - DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity
Human-centric Rendering [126.00165445599764]
We present DNA-Rendering, a large-scale, high-fidelity repository of human performance data for neural actor rendering.
Our dataset contains over 1500 human subjects, 5000 motion sequences, and a data volume of 67.5M frames.
We construct a professional multi-view capture system with 60 synchronized cameras at up to 4096 x 3000 resolution and 15 fps, together with strict camera calibration steps.
arXiv Detail & Related papers (2023-07-19T17:58:03Z) - Objaverse-XL: A Universe of 10M+ 3D Objects [58.02773375519506]
We present Objaverse-XL, a dataset of over 10 million 3D objects.
We show that by training Zero123 on novel view synthesis, utilizing over 100 million multi-view rendered images, we achieve strong zero-shot generalization abilities.
arXiv Detail & Related papers (2023-07-11T17:57:40Z) - Argoverse 2: Next Generation Datasets for Self-Driving Perception and
Forecasting [64.7364925689825]
Argoverse 2 (AV2) is a collection of three datasets for perception and forecasting research in the self-driving domain.
The Lidar dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose.
The Motion Forecasting dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene.
arXiv Detail & Related papers (2023-01-02T00:36:22Z) - The CAMELS project: public data release [12.073735170721717]
The Cosmology and Astrophysics with MachinE Learning Simulations project was developed to combine cosmology with astrophysics.
CAMELS contains 4,233 cosmological simulations, 2,049 N-body and 2,184 state-of-the-art hydrodynamic simulations.
We release all the data, comprising more than 350 terabytes and containing 143,922 snapshots, millions of halos, galaxies and summary statistics.
arXiv Detail & Related papers (2022-01-04T19:00:00Z) - Inferring halo masses with Graph Neural Networks [0.5804487044220691]
We build a model that infers the mass of a halo given the positions, velocities, stellar masses, and radii of the galaxies it hosts.
We use Graph Neural Networks (GNNs) that are designed to work with irregular and sparse data.
Our model is able to constrain the masses of the halos with a $\sim$0.2 dex accuracy.
arXiv Detail & Related papers (2021-11-16T18:37:53Z) - Multifield Cosmology with Artificial Intelligence [13.031414468952313]
Astrophysical processes modify the properties of dark matter, gas, and galaxies in a poorly understood way.
We generate hundreds of thousands of 2-dimensional maps for 13 different fields.
We use these maps to train convolutional neural networks to extract the maximum amount of cosmological information.
arXiv Detail & Related papers (2021-09-20T18:00:01Z) - Megaverse: Simulating Embodied Agents at One Million Experiences per
Second [75.1191260838366]
We present Megaverse, a new 3D simulation platform for reinforcement learning and embodied AI research.
Megaverse is up to 70x faster than DeepMind Lab in fully-shaded 3D scenes with interactive objects.
We use Megaverse to build a new benchmark that consists of several single-agent and multi-agent tasks.
arXiv Detail & Related papers (2021-07-17T03:16:25Z) - A Spacecraft Dataset for Detection, Segmentation and Parts Recognition [42.27081423489484]
In this paper, we release a dataset for spacecraft detection, instance segmentation and part recognition.
The main contribution of this work is the development of the dataset using images of space stations and satellites.
We also provide evaluations with state-of-the-art methods in object detection and instance segmentation as a benchmark for the dataset.
arXiv Detail & Related papers (2021-06-15T14:36:56Z) - Sketch and Scale: Geo-distributed tSNE and UMAP [75.44887265789056]
Running machine learning analytics over geographically distributed datasets is a rapidly arising problem.
We introduce a novel framework: Sketch and Scale (SnS).
It leverages a Count Sketch data structure to compress the data on the edge nodes, aggregates the reduced size sketches on the master node, and runs vanilla tSNE or UMAP on the summary.
We show this technique to be fully parallel and to scale linearly in time and logarithmically in memory and communication, making it possible to analyze datasets with many millions, potentially billions, of data points spread across several data centers around the globe (a generic Count Sketch is illustrated below).
arXiv Detail & Related papers (2020-11-11T22:32:21Z)
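The compression step described in the Sketch and Scale entry above can be illustrated with a generic Count Sketch in a few lines of NumPy. This is a hedged sketch of the general technique, not the authors' implementation: each edge node hashes its local rows into a small fixed-size table with random signs, and because the sketch is linear, the master node can simply sum the tables before running tSNE or UMAP on the summary.
```python
# Generic Count Sketch of a data matrix (illustrative, not the SnS authors' code).
import numpy as np

def count_sketch(X, width, seed=0):
    """Compress rows of X, shape (n, d), into a (width, d) table via hashed, signed sums."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    buckets = rng.integers(0, width, size=n)          # hash each row to a bucket
    signs = rng.choice([-1.0, 1.0], size=n)           # random sign per row
    table = np.zeros((width, d))
    np.add.at(table, buckets, signs[:, None] * X)     # unbuffered scatter-add into buckets
    return table

# Each "edge node" sketches its own partition; the sketch is linear, so the
# per-node tables can be summed on the master node into one global summary.
data = np.random.default_rng(1).normal(size=(100_000, 50))
parts = np.array_split(data, 4)
node_tables = [count_sketch(part, width=1024, seed=i) for i, part in enumerate(parts)]
summary = np.sum(node_tables, axis=0)
print(summary.shape)    # (1024, 50) -- the summary that tSNE/UMAP would then embed
```
Summing per-node tables works precisely because the Count Sketch is a linear operator on the data, which is what keeps the geo-distributed aggregation cheap in both memory and communication.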
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.