Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding
- URL: http://arxiv.org/abs/2503.18578v2
- Date: Wed, 30 Apr 2025 06:48:28 GMT
- Title: Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding
- Authors: Tianyu Chen, Xingcheng Fu, Yisen Gao, Haodong Qian, Yuecen Wei, Kun Yan, Haoyi Zhou, Jianxin Li
- Abstract summary: We introduce Galaxy-Walker, a geometry-aware vision-language model for universe-level vision understanding tasks. Galaxy-Walker achieves state-of-the-art performance in both galaxy property estimation and morphology classification tasks.
- Score: 19.49455523407794
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern vision-language models (VLMs) build their patch embeddings and convolutional backbones in vector spaces, typically Euclidean ones, from the ground up. Extending VLMs to galaxy scale to understand astronomical phenomena requires integrating spherical spaces (for planetary orbits) and hyperbolic spaces (for black holes), which raises two formidable challenges: a) current pre-trained models are confined to Euclidean space rather than a comprehensive geometric embedding, and b) the predominant architectures lack backbones suited to anisotropic physical geometries. In this paper, we introduce Galaxy-Walker, a geometry-aware VLM for universe-level vision understanding tasks. We propose a geometry prompt that generates geometry tokens through random walks across diverse spaces on a multi-scale physical graph, along with a geometry adapter that compresses and reshapes the space anisotropy in a mixture-of-experts manner. Extensive experiments demonstrate the effectiveness of our approach: Galaxy-Walker achieves state-of-the-art performance in both galaxy property estimation ($R^2$ scores up to $0.91$) and morphology classification ($+0.17$ F1 improvement at most on challenging features), significantly outperforming both domain-specific models and general-purpose VLMs.
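The abstract does not spell out how the geometry prompt turns random walks into tokens. As a rough illustration only, the sketch below runs a uniform random walk on a toy graph and summarizes each step by its length under three geometries: Euclidean, spherical (great-circle angle between the two directions), and hyperbolic (Poincaré ball of curvature -1). The per-step tuple format, the helper names (`random_walk`, `geometry_tokens`), and the single-scale toy graph are hypothetical assumptions, not Galaxy-Walker's actual construction; the learned embedding that would feed such walks into the VLM is omitted.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]


def euclidean_dist(x: Vec, y: Vec) -> float:
    return math.dist(x, y)


def spherical_dist(x: Vec, y: Vec) -> float:
    """Great-circle distance after projecting both (nonzero) points onto the unit sphere."""
    cos = sum(a * b for a, b in zip(x, y)) / (math.hypot(*x) * math.hypot(*y))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp guards against rounding error


def poincare_dist(x: Vec, y: Vec) -> float:
    """Distance in the Poincare ball of curvature -1; requires ||x||, ||y|| < 1."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    nx = sum(a * a for a in x)
    ny = sum(b * b for b in y)
    return math.acosh(1 + 2 * sq / ((1 - nx) * (1 - ny)))


def random_walk(adj: Dict[int, List[int]], start: int, length: int,
                rng: random.Random) -> List[int]:
    """Uniform random walk of up to `length` steps; stops early at a dead end."""
    walk = [start]
    for _ in range(length):
        neighbors = adj.get(walk[-1], [])
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


def geometry_tokens(walk: List[int], coords: Dict[int, Vec]
                    ) -> List[Tuple[int, float, float, float]]:
    """Hypothetical token per step: (node, Euclidean, spherical, hyperbolic step length)."""
    tokens = []
    for u, v in zip(walk, walk[1:]):
        x, y = coords[u], coords[v]
        tokens.append((v, euclidean_dist(x, y), spherical_dist(x, y), poincare_dist(x, y)))
    return tokens


if __name__ == "__main__":
    # Toy 3-node graph; coordinates are nonzero and lie inside the unit ball.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    coords = {0: (0.1, 0.0), 1: (0.0, 0.2), 2: (-0.3, 0.1)}
    walk = random_walk(adj, start=0, length=4, rng=random.Random(0))
    for token in geometry_tokens(walk, coords):
        print(token)
```

Reporting the same step under three metrics is one simple way to expose how distances stretch differently in each space; a downstream adapter, such as the mixture-of-experts geometry adapter the abstract mentions, is not sketched here.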
Related papers
- HoLa: B-Rep Generation using a Holistic Latent Representation [51.07878285790399]
We introduce a novel representation for learning and generating Computer-Aided Design (CAD) models in the form of boundary representations (B-Reps).
Our representation unifies the continuous geometric properties of B-Rep primitives in different orders.
Our method significantly reduces ambiguities, redundancies, and incoherences among the generated B-Rep primitives.
arXiv (2025-04-19T10:34:24Z)
- GS-SDF: LiDAR-Augmented Gaussian Splatting and Neural SDF for Geometrically Consistent Rendering and Reconstruction [12.293953058837653]
We propose a unified LiDAR-visual system that synergizes Gaussian splatting with a neural signed distance field.
Experiments demonstrate superior reconstruction accuracy and rendering quality across diverse trajectories.
arXiv (2025-03-13T08:53:38Z)
- Effective Fine-Tuning of Vision-Language Models for Accurate Galaxy Morphology Analysis [3.379005517804234]
GalaxAlign is a novel method that fine-tunes pre-trained foundation models to achieve high accuracy on astronomical tasks.
Our method extends a contrastive learning architecture to align three types of data in fine-tuning.
arXiv (2024-11-29T05:10:47Z)
- Geometric deep learning for galaxy-halo connection: a case study for galaxy intrinsic alignments [1.2231689895452238]
We propose a Deep Generative Model trained on the IllustrisTNG-100 simulation to sample 3D galaxy shapes and orientations.
The model is able to learn and predict features such as galaxy orientations that are statistically consistent with the reference simulation.
arXiv (2024-09-27T13:55:10Z)
- GeoMFormer: A General Architecture for Geometric Molecular Representation Learning [84.02083170392764]
We introduce GeoMFormer, a novel Transformer-based molecular model for geometric molecular representation learning.
We show that GeoMFormer achieves strong performance on both invariant and equivariant tasks of different types and scales.
arXiv (2024-06-24T17:58:13Z)
- Understanding and Mitigating Hyperbolic Dimensional Collapse in Graph Contrastive Learning [70.0681902472251]
We propose a novel contrastive learning framework to learn high-quality graph embeddings in hyperbolic space.
Specifically, we design an alignment metric that effectively captures hierarchical data-invariant information.
We show that in hyperbolic space one has to address leaf- and height-level uniformity, which relate to properties of trees.
arXiv (2023-10-27T15:31:42Z)
- Dynamic Hyperbolic Attention Network for Fine Hand-object Reconstruction [76.5549647815413]
We propose the first precise hand-object reconstruction method in hyperbolic space, namely Dynamic Hyperbolic Attention Network (DHANet).
Our method learns mesh features with rich geometry-image multi-modal information and better models hand-object interaction.
arXiv (2023-09-06T13:00:10Z)
- Knowledge-based Multiple Adaptive Spaces Fusion for Recommendation [35.20583774988951]
We propose a knowledge-based multiple adaptive spaces fusion method for recommendation, namely MCKG.
Unlike existing methods that adopt only a single manifold, we introduce a unified space that is compatible with hyperbolic, Euclidean, and spherical spaces.
In addition, we propose a geometry-aware optimization strategy that enables the pull and push processes to benefit from both hyperbolic and spherical spaces.
arXiv (2023-08-29T12:11:16Z)
- Multi-scale Geometry-aware Transformer for 3D Point Cloud Classification [17.836838702265332]
This paper proposes Multi-scale Geometry-aware Transformer (MGT), a self-attention plug-in module, along with its variants.
MGT processes point cloud data with multi-scale local and global geometric information, addressing three aspects.
Experimental results demonstrate that the MGT vastly increases the capability of capturing multi-scale geometry using the self-attention mechanism.
arXiv (2023-04-12T08:34:56Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv (2023-04-06T14:33:05Z)
- Geometry Interaction Knowledge Graph Embeddings [153.69745042757066]
We propose Geometry Interaction knowledge graph Embeddings (GIE), which learns spatial structures interactively between the Euclidean, hyperbolic and hyperspherical spaces.
Our proposed GIE can capture a richer set of relational information, model key inference patterns, and enable expressive semantic matching across entities.
arXiv (2022-06-24T08:33:43Z)
- Concentric Spherical GNN for 3D Representation Learning [53.45704095146161]
We propose a novel multi-resolution convolutional architecture for learning over concentric spherical feature maps.
Our hierarchical architecture is based on alternately learning to incorporate both intra-sphere and inter-sphere information.
We demonstrate the effectiveness of our approach in improving state-of-the-art performance on 3D classification tasks with rotated data.
arXiv (2021-03-18T19:05:04Z)
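Several entries above combine Euclidean, spherical, and hyperbolic geometry (for example MCKG's unified space and GIE's interaction between the three spaces), as does Galaxy-Walker itself. One standard way to express all three with a single formula is the κ-stereographic model, where the sign of the curvature κ selects the geometry. The sketch below computes geodesic distance in that model; it is a generic illustration under that assumption, not the construction used by any of the listed papers, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _dot(x: Vec, y: Vec) -> float:
    return sum(a * b for a, b in zip(x, y))


def mobius_add(x: Vec, y: Vec, kappa: float) -> List[float]:
    """Mobius addition x (+)_kappa y in the kappa-stereographic model."""
    xy, xx, yy = _dot(x, y), _dot(x, x), _dot(y, y)
    coef_x = 1 - 2 * kappa * xy - kappa * yy
    coef_y = 1 + kappa * xx
    denom = 1 - 2 * kappa * xy + kappa ** 2 * xx * yy
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]


def tan_kappa_inv(u: float, kappa: float) -> float:
    """Curvature-dependent inverse tangent: artanh for kappa < 0, identity at 0, arctan for kappa > 0."""
    if kappa < 0:
        s = math.sqrt(-kappa)
        return math.atanh(s * u) / s
    if kappa > 0:
        s = math.sqrt(kappa)
        return math.atan(s * u) / s
    return u


def kappa_dist(x: Vec, y: Vec, kappa: float) -> float:
    """Geodesic distance d_kappa(x, y) = 2 * tan_kappa^{-1}(||(-x) (+)_kappa y||).

    kappa < 0: Poincare ball of radius 1/sqrt(-kappa); points must lie inside it.
    kappa = 0: Euclidean space; the result is 2 * ||x - y|| under this convention.
    kappa > 0: stereographic projection of a sphere of radius 1/sqrt(kappa).
    """
    neg_x = [-a for a in x]
    diff = mobius_add(neg_x, y, kappa)
    return 2 * tan_kappa_inv(math.sqrt(_dot(diff, diff)), kappa)
```

Setting κ = -1 recovers the usual Poincaré-ball distance, and letting κ shrink toward 0 approaches the Euclidean limit (twice the Euclidean distance under this scaling convention), which is what makes a single curvature parameter attractive for mixed-geometry models.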