SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion
- URL: http://arxiv.org/abs/2503.16825v2
- Date: Thu, 10 Apr 2025 08:47:41 GMT
- Title: SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion
- Authors: Xiyue Guo, Jiarui Hu, Junjie Hu, Hujun Bao, Guofeng Zhang,
- Abstract summary: This paper presents the first satellite-ground cooperative SSC framework, i.e., SGFormer.<n>We propose a dual-branch architecture that encodes satellite and ground views in parallel, unifying them into a common domain.<n>We develop an adaptive weighting strategy that balances contributions from satellite and ground views.
- Score: 38.85690940616852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, camera-based solutions have been extensively explored for scene semantic completion (SSC). Despite their success in visible areas, existing methods struggle to capture complete scene semantics due to frequent visual occlusions. To address this limitation, this paper presents the first satellite-ground cooperative SSC framework, i.e., SGFormer, exploring the potential of satellite-ground image pairs in the SSC task. Specifically, we propose a dual-branch architecture that encodes orthogonal satellite and ground views in parallel, unifying them into a common domain. Additionally, we design a ground-view guidance strategy that corrects satellite image biases during feature encoding, addressing misalignment between satellite and ground views. Moreover, we develop an adaptive weighting strategy that balances contributions from satellite and ground views. Experiments demonstrate that SGFormer outperforms the state of the art on SemanticKITTI and SSCBench-KITTI-360 datasets. Our code is available on https://github.com/gxytcrc/SGFormer.
Related papers
- EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation [50.433911327489554]
We introduce EarthMapper, a novel framework for controllable satellite-map translation.
We also contribute CNSatMap, a large-scale dataset comprising 302,132 precisely aligned satellite-map pairs across 38 Chinese cities.
experiments on CNSatMap and the New York dataset demonstrate EarthMapper's superior performance.
arXiv Detail & Related papers (2025-04-28T02:41:12Z) - Satellite to GroundScape -- Large-scale Consistent Ground View Generation from Satellite Views [5.146618378243241]
We propose a novel cross-view synthesis approach designed to ensure consistency across ground-view images generated from satellite views.
Our method, based on a fixed latent diffusion model, introduces two conditioning modules: satellite-guided denoising and satellite-temporal denoising.
We contribute a large-scale satellite-ground dataset containing over 100,000 perspective pairs to facilitate extensive ground scene or video generation.
arXiv Detail & Related papers (2025-04-22T10:58:42Z) - SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World [19.190830406660826]
We propose SA-Occ, the first Satellite-Assisted 3D occupancy prediction model.<n>It integrates historical yet readily available satellite imagery into real-time applications.<n>It achieves state-of-the-art performance, especially among single-frame methods.
arXiv Detail & Related papers (2025-03-20T17:54:29Z) - Weakly-supervised Camera Localization by Ground-to-satellite Image Registration [52.54992898069471]
We propose a weakly supervised learning strategy for ground-to-satellite image registration.
It derives positive and negative satellite images for each ground image.
We also propose a self-supervision strategy for cross-view image relative rotation estimation.
arXiv Detail & Related papers (2024-09-10T12:57:16Z) - Reconstructing Satellites in 3D from Amateur Telescope Images [44.20773507571372]
We propose a novel computational imaging framework that overcomes obstacles by integrating a hybrid image pre-processing pipeline.
We validate our approach on both synthetic satellite datasets and on-sky observations of China's Tiangong Space Station and the International Space Station.
Our framework enables high-fidelity 3D satellite monitoring from Earth, offering a cost-effective alternative for space situational awareness.
arXiv Detail & Related papers (2024-04-29T03:13:09Z) - A Semantic Segmentation-guided Approach for Ground-to-Aerial Image Matching [30.324252605889356]
This work addresses the problem of matching a query ground-view image with the corresponding satellite image without GPS data.
This is done by comparing the features from a ground-view image and a satellite one, innovatively leveraging the corresponding latter's segmentation mask through a three-stream Siamese-like network.
The novelty lies in the fusion of satellite images in combination with their semantic segmentation masks, aimed at ensuring that the model can extract useful features and focus on the significant parts of the images.
arXiv Detail & Related papers (2024-04-17T12:13:18Z) - SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving [87.8761593366609]
SSCBench is a benchmark that integrates scenes from widely used automotive datasets.
We benchmark models using monocular, trinocular, and cloud input to assess the performance gap.
We have unified semantic labels across diverse datasets to simplify cross-domain generalization testing.
arXiv Detail & Related papers (2023-06-15T09:56:33Z) - Sat2Density: Faithful Density Learning from Satellite-Ground Image Pairs [32.4349978810128]
This paper aims to develop an accurate 3D geometry representation of satellite images using satellite-ground image pairs.
We draw inspiration from the density field representation used in volumetric neural rendering and propose a new approach, called Sat2Density.
Our method utilizes the properties of ground-view panoramas for the sky and non-sky regions to learn faithful density fields of 3D scenes in a geometric perspective.
arXiv Detail & Related papers (2023-03-26T10:15:33Z) - CVLNet: Cross-View Semantic Correspondence Learning for Video-based
Camera Localization [89.69214577915959]
This paper tackles the problem of Cross-view Video-based camera localization.
We propose estimating the query camera's relative displacement to a satellite image before similarity matching.
Experiments have demonstrated the effectiveness of video-based localization over single image-based localization.
arXiv Detail & Related papers (2022-08-07T07:35:17Z) - Satellite Image Based Cross-view Localization for Autonomous Vehicle [59.72040418584396]
This paper shows that by using an off-the-shelf high-definition satellite image as a ready-to-use map, we are able to achieve cross-view vehicle localization up to a satisfactory accuracy.
Our method is validated on KITTI and Ford Multi-AV Seasonal datasets as ground view and Google Maps as the satellite view.
arXiv Detail & Related papers (2022-07-27T13:16:39Z) - Coming Down to Earth: Satellite-to-Street View Synthesis for
Geo-Localization [9.333087475006003]
Cross-view image based geo-localization is notoriously challenging due to drastic viewpoint and appearance differences between the two domains.
We show that we can address this discrepancy explicitly by learning to synthesize realistic street views from satellite inputs.
We propose a novel multi-task architecture in which image synthesis and retrieval are considered jointly.
arXiv Detail & Related papers (2021-03-11T17:40:59Z) - Geometry-Guided Street-View Panorama Synthesis from Satellite Imagery [80.6282101835164]
We present a new approach for synthesizing a novel street-view panorama given an overhead satellite image.
Our method generates a Google's omnidirectional street-view type panorama, as if it is captured from the same geographical location as the center of the satellite patch.
arXiv Detail & Related papers (2021-03-02T10:27:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.