Related papers: Self-supervised Synthetic Pretraining for Inference of Stellar Mass Embedded in Dense Gas

URL: http://arxiv.org/abs/2510.24159v1
Date: Tue, 28 Oct 2025 07:55:34 GMT
Title: Self-supervised Synthetic Pretraining for Inference of Stellar Mass Embedded in Dense Gas
Authors: Keiya Hirashima, Shingo Nozaki, Naoto Harada,
Abstract summary: Supervised machine learning could link complex structures to stellar mass, but it requires large, high-quality labeled datasets from high-resolution simulations. We address this by pretraining a vision transformer on one million synthetic fractal images using the self-supervised framework DINOv2. Our results demonstrate that synthetic pretraining improves frozen-feature regression stellar mass predictions, with the pretrained model performing slightly better than a supervised model trained on the same limited simulations.
Score: 0.1753733541634709
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Stellar mass is a fundamental quantity that determines the properties and evolution of stars. However, estimating stellar masses in star-forming regions is challenging because young stars are obscured by dense gas and the regions are highly inhomogeneous, making spherical dynamical estimates unreliable. Supervised machine learning could link such complex structures to stellar mass, but it requires large, high-quality labeled datasets from high-resolution magneto-hydrodynamical (MHD) simulations, which are computationally expensive. We address this by pretraining a vision transformer on one million synthetic fractal images using the self-supervised framework DINOv2, and then applying the frozen model to limited high-resolution MHD simulations. Our results demonstrate that synthetic pretraining improves frozen-feature regression stellar mass predictions, with the pretrained model performing slightly better than a supervised model trained on the same limited simulations. Principal component analysis of the extracted features further reveals semantically meaningful structures, suggesting that the model enables unsupervised segmentation of star-forming regions without the need for labeled data or fine-tuning.
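The frozen-feature regression and principal component analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the feature matrix below is random stand-in data rather than DINOv2 embeddings, the ridge regressor is an assumed choice of regression head, and the names `fit_ridge`, `pca_project`, and `log_mass` are hypothetical.

```python
import numpy as np


def fit_ridge(features, targets, alpha=1.0):
    """Closed-form ridge regression on fixed (frozen) features."""
    mean_x = features.mean(axis=0)
    mean_y = targets.mean()
    centered = features - mean_x
    gram = centered.T @ centered + alpha * np.eye(features.shape[1])
    weights = np.linalg.solve(gram, centered.T @ (targets - mean_y))
    bias = mean_y - mean_x @ weights
    return weights, bias


def pca_project(features, n_components=3):
    """Project features onto their leading principal components via SVD."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Assumption: random vectors stand in for embeddings from a frozen encoder.
rng = np.random.default_rng(0)
n_images, dim = 200, 32
features = rng.normal(size=(n_images, dim))

# Assumption: a synthetic target stands in for the (log) stellar mass label.
true_w = rng.normal(size=dim)
log_mass = features @ true_w + 0.1 * rng.normal(size=n_images)

# Only the regression head is fitted; the features are never updated.
train, test = slice(0, 150), slice(150, None)
weights, bias = fit_ridge(features[train], log_mass[train], alpha=1e-2)
pred = features[test] @ weights + bias
rmse = float(np.sqrt(np.mean((pred - log_mass[test]) ** 2)))

# Leading principal components of the features, e.g. for visual inspection.
components = pca_project(features, n_components=3)
```

In a real pipeline the `features` array would come from the pretrained encoder applied to simulation images, and only the small regression head would be trained on the limited labeled data.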

Related papers

Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics [51.85385061275941]
Molecular dynamics (MD) simulations remain the gold standard for studying protein dynamics. Recent generative models have shown promise in accelerating simulations, yet they struggle with long-horizon generation. We present STAR-MD, a scalable diffusion model that generates physically plausible protein trajectories over micro-scale timescales.
arXiv Detail & Related papers (2026-02-02T14:13:28Z)
SIGMA: Scalable Spectral Insights for LLM Collapse [51.863164847253366]
We introduce SIGMA (Spectral Inequalities for Gram Matrix Analysis), a unified framework for model collapse. By utilizing benchmarks that deriving and deterministic bounds on the matrix's spectrum, SIGMA provides a mathematically grounded metric to track the contraction of the representation space. We demonstrate that SIGMA effectively captures the transition towards states, offering both theoretical insights into the mechanics of collapse.
arXiv Detail & Related papers (2026-01-06T19:47:11Z)
UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass [83.7071371474926]
UniSH is a unified, feed-forward framework for joint metric-scale 3D scene and human reconstruction. Our framework bridges strong, disparate priors from scene reconstruction and HMR. Our model achieves state-of-the-art performance on human-centric scene reconstruction.
arXiv Detail & Related papers (2026-01-03T16:06:27Z)
Score Matching on Large Geometric Graphs for Cosmology Generation [14.637236070358588]
We introduce a score-based generative model with an equivariant graph neural network that simulates gravitational clustering of galaxies across cosmologies. The proposed equivariant score-based model successfully generates full-scale cosmological point clouds of up to 600,000 halos. This work advances by introducing a generative model designed to closely resemble the underlying gravitational clustering of structure formation.
arXiv Detail & Related papers (2025-08-23T11:08:06Z)
STAR: A Benchmark for Astronomical Star Fields Super-Resolution [52.895107920663236]
We propose STAR, a large-scale astronomical SR dataset containing 54,738 flux-consistent star field image pairs. We propose a Flux-Invariant Super Resolution (FISR) model that could accurately infer the flux-consistent high-resolution images from input photometry.
arXiv Detail & Related papers (2025-07-22T09:28:28Z)
A COMPASS to Model Comparison and Simulation-Based Inference in Galactic Chemical Evolution [0.0]
We present a novel simulation-based inference framework that combines score-based diffusion models with transformer architectures. Our results demonstrate that modern SBI methods can robustly constrain uncertain physics in astrophysical simulators.
arXiv Detail & Related papers (2025-07-07T14:45:41Z)
UniGenX: a unified generative foundation model that couples sequence, structure and function to accelerate scientific design across proteins, molecules and materials [62.72989417755985]
We present UniGenX, a unified generative model for function in natural systems. UniGenX represents heterogeneous inputs as a mixed stream of symbolic and numeric tokens. It achieves state-of-the-art or competitive performance for the function-aware generation across domains.
arXiv Detail & Related papers (2025-03-09T16:43:07Z)
GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects [55.02281855589641]
GausSim is a novel neural network-based simulator designed to capture the dynamic behaviors of real-world elastic objects represented through Gaussian kernels. We leverage continuum mechanics and treat each kernel as a Center of Mass System (CMS) that represents a continuous piece of matter. In addition, GausSim incorporates explicit physics constraints, such as mass and momentum conservation, ensuring interpretable results and robust, physically plausible simulations.
arXiv Detail & Related papers (2024-12-23T18:58:17Z)
ASURA-FDPS-ML: Star-by-star Galaxy Simulations Accelerated by Surrogate Modeling for Supernova Feedback [0.7324709841516586]
We introduce new high-resolution galaxy simulations accelerated by a surrogate model that reduces the computation cost by approximately 75 percent. Massive stars with a Zero Age Main Sequence mass of more than about 10 $\mathrm{M}_\odot$ explode as core-collapse supernovae (CCSNe). Our new approach achieves high-resolution fidelity while reducing computational costs, effectively bridging the physical scale gap and enabling multi-scale simulations.
arXiv Detail & Related papers (2024-10-30T18:00:02Z)
MambaDS: Near-Surface Meteorological Field Downscaling with Topography Constrained Selective State Space Modeling [68.69647625472464]
Downscaling, a crucial task in meteorological forecasting, enables the reconstruction of high-resolution meteorological states for target regions. Previous downscaling methods lacked tailored designs for meteorology and encountered structural limitations. We propose a novel model called MambaDS, which enhances the utilization of multivariable correlations and topography information.
arXiv Detail & Related papers (2024-08-20T13:45:49Z)
A conditional latent autoregressive recurrent model for generation and forecasting of beam dynamics in particle accelerators [46.348283638884425]
We propose a two-step unsupervised deep learning framework named the Conditional Latent Autoregressive Recurrent Model (CLARM) for learning the dynamics of charged particles in accelerators. CLARM can generate projections at various accelerator sampling modules by capturing and decoding the latent space representation. The results demonstrate that the generative and forecasting ability of the proposed approach is promising when tested against a variety of evaluation metrics.
arXiv Detail & Related papers (2024-03-19T22:05:17Z)
Predicting Localized Primordial Star Formation with Deep Convolutional Neural Networks [0.0]
We investigate applying 3D deep convolutional neural networks as fast surrogate models of the formation and feedback effects of primordial stars. We present the surrogate model to predict localized primordial star formation; the feedback model will be presented in a subsequent paper. To our knowledge, this is the first model that can predict primordial star forming regions that match highly-resolved cosmological simulations.
arXiv Detail & Related papers (2020-11-02T22:32:27Z)
Embedded-physics machine learning for coarse-graining and collective variable discovery without data [3.222802562733787]
We present a novel learning framework that consistently embeds underlying physics. We propose a novel objective based on reverse Kullback-Leibler divergence that fully incorporates the available physics in the form of the atomistic force field. We demonstrate the algorithmic advances in terms of predictive ability and the physical meaning of the revealed CVs for a bimodal potential energy function and the alanine dipeptide.
arXiv Detail & Related papers (2020-02-24T10:28:41Z)
Learning to Simulate Complex Physics with Graph Networks [68.43901833812448]
We present a machine learning framework and model implementation that can learn to simulate a wide variety of challenging physical domains. Our framework, which we term "Graph Network-based Simulators" (GNS), represents the state of a physical system with particles, expressed as nodes in a graph, and computes dynamics via learned message-passing. Our results show that our model can generalize from single-timestep predictions with thousands of particles during training, to different initial conditions, thousands of timesteps, and at least an order of magnitude more particles at test time.
arXiv Detail & Related papers (2020-02-21T16:44:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.