GalaxiesML: a dataset of galaxy images, photometry, redshifts, and structural parameters for machine learning
- URL: http://arxiv.org/abs/2410.00271v1
- Date: Mon, 30 Sep 2024 22:46:44 GMT
- Title: GalaxiesML: a dataset of galaxy images, photometry, redshifts, and structural parameters for machine learning
- Authors: Tuan Do, Bernie Boscoe, Evan Jones, Yun Qi Li, Kevin Alfaro,
- Abstract summary: We present a dataset built for machine learning applications consisting of galaxy photometry, images, spectroscopic redshifts, and structural properties.
This dataset comprises 286,401 galaxy images and photometry from the Hyper-Suprime-Cam Survey PDR2 in five imaging filters.
We make this dataset public to help spur development of machine learning methods for the next generation of surveys such as Euclid and LSST.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a dataset built for machine learning applications consisting of galaxy photometry, images, spectroscopic redshifts, and structural properties. This dataset comprises 286,401 galaxy images and photometry from the Hyper-Suprime-Cam Survey PDR2 in five imaging filters ($g,r,i,z,y$) with spectroscopically confirmed redshifts as ground truth. Such a dataset is important for machine learning applications because it is uniform, consistent, and has minimal outliers but still contains a realistic range of signal-to-noise ratios. We make this dataset public to help spur development of machine learning methods for the next generation of surveys such as Euclid and LSST. The aim of GalaxiesML is to provide a robust dataset that can be used not only for astrophysics but also for machine learning, where image properties cannot be validated by the human eye and are instead governed by physical laws. We describe the challenges associated with putting together a dataset from publicly available archives, including outlier rejection, duplication, establishing ground truths, and sample selection. This is one of the largest public machine learning-ready training sets of its kind, with redshifts ranging from 0.01 to 4. The redshift distribution of this sample peaks at a redshift of 1.5 and falls off rapidly beyond redshift 2.5. We also include an example application of this dataset for redshift estimation, demonstrating that using images for redshift estimation produces more accurate results than using photometry alone. For example, the bias in the redshift estimate is a factor of 10 lower when using images for redshifts between 0.1 and 1.25 compared to photometry alone. Results from datasets such as this will help inform how to best make use of data from the next generation of galaxy surveys.
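The image-versus-photometry comparison in the abstract is expressed with standard photo-z summary statistics. As a minimal illustrative sketch (the function name, outlier threshold, and inputs below are assumptions, not the authors' code), these statistics are conventionally built from the scaled residual $\Delta z = (z_{phot} - z_{spec})/(1 + z_{spec})$:

```python
import numpy as np

# Hypothetical helper: conventional photo-z metrics from the scaled
# residual dz = (z_phot - z_spec) / (1 + z_spec). The 0.15 outlier
# threshold is a common community choice, assumed here for illustration.
def photoz_metrics(z_phot, z_spec, outlier_thresh=0.15):
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    dz = (z_phot - z_spec) / (1.0 + z_spec)
    bias = np.mean(dz)                                    # mean residual
    rms = np.sqrt(np.mean(dz ** 2))                       # RMS error
    outlier_rate = np.mean(np.abs(dz) > outlier_thresh)   # catastrophic outliers
    return bias, rms, outlier_rate

# Toy check: a perfect estimator gives zero bias, RMS, and outlier rate.
bias, rms, out = photoz_metrics([0.5, 1.0, 2.0], [0.5, 1.0, 2.0])
```

The $(1 + z_{spec})$ scaling makes the residual comparable across the full 0.01-4 redshift range of the sample.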
Related papers
- Mantis Shrimp: Exploring Photometric Band Utilization in Computer Vision Networks for Photometric Redshift Estimation [0.30924355683504173]
We present a model for photometric redshift estimation that fuses ultra-violet (GALEX), optical (PanSTARRS), and infrared (UnWISE) imagery.
Mantis Shrimp estimates the conditional density estimate of redshift using cutout images.
We study how the models learn to use information across bands, finding evidence that our models successfully incorporate information from all surveys.
arXiv Detail & Related papers (2025-01-15T19:46:23Z) - Determination of galaxy photometric redshifts using Conditional Generative Adversarial Networks (CGANs) [0.0]
We present a new algorithmic approach for determining photometric redshifts of galaxies using Conditional Generative Adversarial Networks (CGANs)
The proposed CGAN implementation approaches photometric redshift determination as probabilistic regression: instead of determining a single value for the estimated redshift of the galaxy, a full probability density is computed.
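The probabilistic-regression idea can be sketched independently of the CGAN itself (the binning, toy scores, and softmax head below are assumptions for illustration, not the paper's architecture): a model emits one score per redshift bin, which is normalized into a discrete density from which a point estimate and spread follow.

```python
import numpy as np

# Turn per-bin scores into a normalized discrete probability density p(z).
def scores_to_pdf(scores):
    e = np.exp(scores - np.max(scores))  # numerically stable softmax
    return e / e.sum()

z_bins = np.linspace(0.0, 4.0, 41)              # assumed bin centers over z = 0-4
scores = -0.5 * ((z_bins - 1.5) / 0.3) ** 2     # toy scores peaked at z = 1.5
pdf = scores_to_pdf(scores)

z_mean = np.sum(pdf * z_bins)                           # point estimate
z_std = np.sqrt(np.sum(pdf * (z_bins - z_mean) ** 2))   # spread of the density
```

The full density carries the uncertainty information that a single-value regression discards.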
arXiv Detail & Related papers (2025-01-11T12:42:07Z) - Using different sources of ground truths and transfer learning to improve the generalization of photometric redshift estimation [0.0]
We explore methods to improve galaxy redshift predictions by combining different ground truths.
We first train a base neural network on TransferZ and then refine it using transfer learning on a dataset of galaxies with more precise spectroscopic redshifts (GalaxiesML)
Both methods reduce bias by $\sim$5x, RMS error by $\sim$1.5x, and catastrophic outlier rates by 1.3x on GalaxiesML, compared to a baseline trained only on TransferZ data.
arXiv Detail & Related papers (2024-11-27T04:55:37Z) - XAMI -- A Benchmark Dataset for Artefact Detection in XMM-Newton Optical Images [0.0]
We present a dataset of images from the XMM-Newton space telescope Optical Monitoring camera showing different types of artefacts.
We hand-annotated a sample of 1000 images with artefacts which we use to train automated ML methods.
We adopt a hybrid approach, combining knowledge from both convolutional neural networks (CNNs) and transformer-based models.
arXiv Detail & Related papers (2024-06-25T07:14:15Z) - SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised Learning for Robust Infrared Small Target Detection [53.19618419772467]
Single-frame infrared small target (SIRST) detection aims to recognize small targets from clutter backgrounds.
With the development of Transformer, the scale of SIRST models is constantly increasing.
With a rich diversity of infrared small target data, our algorithm significantly improves the model performance and convergence speed.
arXiv Detail & Related papers (2024-03-08T16:14:54Z) - AstroCLIP: A Cross-Modal Foundation Model for Galaxies [40.43521617393482]
AstroCLIP embeds galaxy images and spectra separately by pretraining separate transformer-based image and spectrum encoders in self-supervised settings.
We find remarkable performance on all downstream tasks, even relative to supervised baselines.
Our approach represents the first cross-modal self-supervised model for galaxies, and the first self-supervised transformer-based architectures for galaxy images and spectra.
arXiv Detail & Related papers (2023-10-04T17:59:38Z) - Cosmology from Galaxy Redshift Surveys with PointNet [65.89809800010927]
In cosmology, galaxy redshift surveys resemble such a permutation invariant collection of positions in space.
We employ a PointNet-like neural network to regress the values of the cosmological parameters directly from point cloud data.
Our implementation of PointNets can analyse inputs of $\mathcal{O}(10^4)-\mathcal{O}(10^5)$ galaxies at a time, which improves upon earlier work for this application by roughly two orders of magnitude.
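The key property that lets a PointNet-style network consume a galaxy catalogue as an unordered point cloud is a symmetric aggregation: a shared per-point map followed by a permutation-invariant pool. A minimal numpy sketch (random toy weights, a single ReLU layer, purely illustrative rather than the paper's network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared per-point linear map (toy weights): every point is transformed
# by the same W, so no point's position in the list is privileged.
W = rng.normal(size=(3, 8))

def pointnet_features(points):
    h = np.maximum(points @ W, 0.0)  # shared per-point layer with ReLU
    return h.max(axis=0)             # symmetric max-pool over all points

points = rng.normal(size=(100, 3))   # 100 toy galaxy positions
shuffled = rng.permutation(points)   # same set of points, different order
```

Because the pooling is symmetric, `pointnet_features(points)` and `pointnet_features(shuffled)` are identical, matching the permutation invariance of a redshift survey catalogue.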
arXiv Detail & Related papers (2022-11-22T15:35:05Z) - CroCo: Cross-Modal Contrastive learning for localization of Earth Observation data [62.96337162094726]
It is of interest to localize a ground-based LiDAR point cloud on remote sensing imagery.
We propose a contrastive learning-based method that trains on DEM and high-resolution optical imagery.
In the best scenario, a Top-1 score of 0.71 and a Top-5 score of 0.81 are obtained.
arXiv Detail & Related papers (2022-04-14T15:55:00Z) - RGB-D Saliency Detection via Cascaded Mutual Information Minimization [122.8879596830581]
Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning.
We introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data.
arXiv Detail & Related papers (2021-09-15T12:31:27Z) - Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, which makes the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z) - DeepShadows: Separating Low Surface Brightness Galaxies from Artifacts using Deep Learning [70.80563014913676]
We investigate the use of convolutional neural networks (CNNs) for the problem of separating low-surface-brightness galaxies from artifacts in survey images.
We show that CNNs offer a very promising path in the quest to study the low-surface-brightness universe.
arXiv Detail & Related papers (2020-11-24T22:51:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.