Can AI Dream of Unseen Galaxies? Conditional Diffusion Model for Galaxy Morphology Augmentation
- URL: http://arxiv.org/abs/2506.16233v1
- Date: Thu, 19 Jun 2025 11:44:09 GMT
- Title: Can AI Dream of Unseen Galaxies? Conditional Diffusion Model for Galaxy Morphology Augmentation
- Authors: Chenrui Ma, Zechang Sun, Tao Jing, Zheng Cai, Yuan-Sen Ting, Song Huang, Mingyu Li,
- Abstract summary: We propose a conditional diffusion model to synthesize realistic galaxy images for augmenting machine learning data. We show that our model generates diverse, high-fidelity galaxy images that closely adhere to the specified morphological feature conditions. This model enables generative extrapolation to project well-annotated data into unseen domains and advance rare object detection.
- Score: 4.3933321767775135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Observational astronomy relies on identifying visual features to detect critical astrophysical phenomena. While machine learning (ML) increasingly automates this process, models often generalize poorly in large-scale surveys because labeled datasets, whether from simulations or human annotation, have limited representativeness; the challenge is especially pronounced for rare yet scientifically valuable objects. To address this, we propose a conditional diffusion model to synthesize realistic galaxy images for augmenting ML training data. Leveraging the Galaxy Zoo 2 (GZ2) dataset, which contains pairs of visual features and galaxy images from volunteer annotation, we demonstrate that our model generates diverse, high-fidelity galaxy images that closely adhere to the specified morphological feature conditions. Moreover, this model enables generative extrapolation to project well-annotated data into unseen domains and advance rare object detection. Integrating synthesized images into ML pipelines improves performance in standard morphology classification, boosting completeness and purity by up to 30% across key metrics. For rare object detection, using early-type galaxies with prominent dust lane features (~0.1% of the GZ2 dataset) as a test case, our approach more than doubled the number of detected instances, from 352 to 872, compared with previous studies based on visual inspection. This study highlights the power of generative models to bridge the gap between scarce labeled data and the vast, uncharted parameter space of observational astronomy, and offers insight for the development of future astrophysical foundation models. Our project homepage is available at https://galaxysd-webpage.streamlit.app/.
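The abstract does not describe the model's internals. As a rough orientation only, the sketch below shows ingredients common to conditional diffusion models in general: the DDPM forward-noising step x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps with the linear beta schedule from Ho et al. (2020), the noise-prediction training loss, and classifier-free guidance, which blends a label-conditioned and an unconditional noise prediction. Whether the paper uses this schedule, this guidance scheme, or any particular encoding of the GZ2 morphology labels is an assumption; all function names are hypothetical, and plain Python lists stand in for flattened image pixels and network outputs.

```python
import math
import random
from typing import List

# Hypothetical helpers illustrating a generic conditional DDPM; not the paper's code.


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """Return abar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars: List[float] = []
    running = 1.0
    for t in range(num_steps):
        # Interpolate beta linearly from beta_start (t = 0) to beta_end (t = T - 1).
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0: List[float], t: int, alpha_bars: List[float],
             noise: List[float]) -> List[float]:
    """Forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]


def noise_mse(eps_pred: List[float], eps_true: List[float]) -> float:
    """DDPM-style training objective: mean squared error on the added noise."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps_true)) / len(eps_true)


def guided_eps(eps_cond: List[float], eps_uncond: List[float],
               scale: float) -> List[float]:
    """Classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


if __name__ == "__main__":
    rng = random.Random(0)
    abars = linear_alpha_bars(1000)
    x0 = [0.5, -0.2, 0.1]                      # stand-in for flattened pixel values
    eps = [rng.gauss(0.0, 1.0) for _ in x0]    # Gaussian noise sample
    xt = q_sample(x0, t=500, alpha_bars=abars, noise=eps)
    print("noised sample:", xt)
    print("loss of a perfect predictor:", noise_mse(eps, eps))  # 0.0
```

In a trained model, eps_cond and eps_uncond would come from the same denoising network evaluated with and without the morphology condition. A guidance scale of 1 recovers the purely conditional prediction; larger values push samples harder toward the requested features, typically at some cost in sample diversity.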
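In galaxy classification, completeness and purity usually correspond to what the ML literature calls recall and precision for a given class: the fraction of true members that are recovered, and the fraction of selected objects that are true members. The abstract does not spell out how its figures were computed, so this minimal sketch simply assumes those standard definitions; the function name and toy labels are hypothetical.

```python
from typing import Sequence, Tuple


def completeness_purity(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float]:
    """Return (completeness, purity) for one target class.

    completeness = TP / (TP + FN)   (recall)
    purity       = TP / (TP + FP)   (precision)
    Both are set to 0.0 when their denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    completeness = tp / (tp + fn) if tp + fn else 0.0
    purity = tp / (tp + fp) if tp + fp else 0.0
    return completeness, purity


if __name__ == "__main__":
    # Four true members (label 1); the classifier recovers two and adds one false positive.
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    preds = [1, 1, 0, 0, 1, 0, 0, 0]
    print(completeness_purity(truth, preds))  # (0.5, 0.6666666666666666)
```

Because the two metrics trade off against each other, augmentation that improves both at once, as the abstract reports, is more informative than a gain in either alone.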
Related papers
- Category-based Galaxy Image Generation via Diffusion Models [0.39945675027960637]
We present GalCatDiff, the first framework in astronomy to leverage both galaxy image features and astrophysical properties in the network design of diffusion models. GalCatDiff incorporates an enhanced U-Net and a novel block entitled Astro-RAB (Residual Attention Block), which dynamically combines attention mechanisms with convolution operations to ensure global consistency and local feature fidelity. Our experimental results demonstrate that GalCatDiff significantly outperforms existing methods in terms of the consistency of sample color and size distributions, and the generated galaxies are both visually realistic and physically consistent.
Date: 2025-06-19T12:14:33Z
- A Versatile Framework for Analyzing Galaxy Image Data by Implanting Human-in-the-loop on a Large Vision Model [14.609681101463334]
We present a framework for general analysis of galaxy images based on a large vision model (LVM) plus downstream tasks (DST).
Considering the low signal-to-noise ratio of galaxy images, we have incorporated a Human-in-the-loop (HITL) module into our large vision model.
For object detection, trained on 1,000 data points, our DST built on the LVM achieves an accuracy of 96.7%, while ResNet50 plus Mask R-CNN gives an accuracy of 93.1%.
Date: 2024-05-17T16:29:27Z
- InstaGen: Enhancing Object Detection by Training on Synthetic Dataset [59.445498550159755]
We present a novel paradigm to enhance the ability of object detectors, e.g., expanding categories or improving detection performance.
We integrate an instance-level grounding head into a pre-trained generative diffusion model, giving it the ability to localise instances in the generated images.
We conduct thorough experiments to show that this enhanced version of the diffusion model, termed InstaGen, can serve as a data synthesizer.
Date: 2024-02-08T18:59:53Z
- GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning [50.7702397913573]
The rapid advancement of photorealistic generators has reached a critical juncture where authentic and manipulated images are increasingly indistinguishable.
Although a number of face forgery datasets are publicly available, the forged faces in them are mostly generated using GAN-based synthesis technology.
We propose GenFace, a large-scale, diverse, fine-grained, and high-fidelity dataset, to facilitate the advancement of deepfake detection.
Date: 2024-02-03T03:13:50Z
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
Date: 2023-12-12T14:45:45Z
- Discovering Galaxy Features via Dataset Distillation [7.121183597915665]
In many applications, neural networks (NNs) have classification performance on par with or even exceeding human capacity.
Here, we apply this idea to the notoriously difficult task of galaxy classification.
We present a novel way to summarize and visualize prototypical galaxy morphology through the lens of neural networks.
Date: 2023-11-29T12:39:31Z
- Semi-Supervised Domain Adaptation for Cross-Survey Galaxy Morphology Classification and Anomaly Detection [57.85347204640585]
We develop a Universal Domain Adaptation method, DeepAstroUDA.
It can be applied to datasets with different types of class overlap.
For the first time, we demonstrate the successful use of domain adaptation on two very different observational datasets.
Date: 2022-11-01T18:07:21Z
- Realistic galaxy image simulation via score-based generative models [0.0]
We show that a score-based generative model can be used to produce realistic yet fake images that mimic observations of galaxies.
Subjectively, the generated galaxies are highly realistic when compared with samples from the real dataset.
Date: 2021-11-02T16:27:08Z
- DeepShadows: Separating Low Surface Brightness Galaxies from Artifacts using Deep Learning [70.80563014913676]
We investigate the use of convolutional neural networks (CNNs) for the problem of separating low-surface-brightness galaxies from artifacts in survey images.
We show that CNNs offer a very promising path in the quest to study the low-surface-brightness universe.
Date: 2020-11-24T22:51:08Z
- DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679]
Current algorithms suffer from insufficient training samples and category imbalance within datasets.
We propose an efficient and effective data augmentation method called DecAug for HOI detection.
Experiments show that our method brings improvements of up to 3.3 mAP and 1.6 mAP on the V-COCO and HICO-DET datasets.
Date: 2020-10-02T13:59:05Z