SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World
- URL: http://arxiv.org/abs/2503.16399v1
- Date: Thu, 20 Mar 2025 17:54:29 GMT
- Title: SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World
- Authors: Chen Chen, Zhirui Wang, Taowei Sheng, Yi Jiang, Yundu Li, Peirui Cheng, Luning Zhang, Kaiqiang Chen, Yanfeng Hu, Xue Yang, Xian Sun
- Abstract summary: We propose SA-Occ, the first Satellite-Assisted 3D occupancy prediction model. It integrates historical yet readily available satellite imagery into real-time applications. It achieves state-of-the-art performance, especially among single-frame methods.
- Score: 19.190830406660826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing vision-based 3D occupancy prediction methods are inherently limited in accuracy due to their exclusive reliance on street-view imagery, neglecting the potential benefits of incorporating satellite views. We propose SA-Occ, the first Satellite-Assisted 3D occupancy prediction model, which leverages GPS & IMU to integrate historical yet readily available satellite imagery into real-time applications, effectively mitigating limitations of ego-vehicle perceptions, involving occlusions and degraded performance in distant regions. To address the core challenges of cross-view perception, we propose: 1) Dynamic-Decoupling Fusion, which resolves inconsistencies in dynamic regions caused by the temporal asynchrony between satellite and street views; 2) 3D-Proj Guidance, a module that enhances 3D feature extraction from inherently 2D satellite imagery; and 3) Uniform Sampling Alignment, which aligns the sampling density between street and satellite views. Evaluated on Occ3D-nuScenes, SA-Occ achieves state-of-the-art performance, especially among single-frame methods, with a 39.05% mIoU (a 6.97% improvement), while incurring only 6.93 ms of additional latency per frame. Our code and newly curated dataset are available at https://github.com/chenchen235/SA-Occ.
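The mIoU figure quoted in the abstract is a mean intersection-over-union across semantic classes. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code: the function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are all assumptions made for illustration, and any benchmark-specific handling used by Occ3D-nuScenes is omitted.

```python
from collections import Counter


def mean_iou(pred, target, num_classes):
    """Return (mIoU, per-class IoU list) for two equal-length label sequences.

    Hypothetical helper for illustration; labels are integers in [0, num_classes).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = Counter()  # voxels where prediction and ground truth agree
    pred_count = Counter()    # voxels predicted as each class
    target_count = Counter()  # voxels labeled as each class
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        # Assumption: a class missing from both inputs is skipped, not scored as 0.
        ious.append(intersection[c] / union if union > 0 else None)

    valid = [v for v in ious if v is not None]
    miou = sum(valid) / len(valid) if valid else 0.0
    return miou, ious


# Toy example with three classes and six voxels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
miou, per_class = mean_iou(pred, target, num_classes=3)
print(round(miou, 4), per_class)
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. A reported value such as 39.05% would be this mean expressed as a percentage over all evaluated voxels.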
Related papers
- EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation [50.433911327489554]
We introduce EarthMapper, a novel framework for controllable satellite-map translation.
We also contribute CNSatMap, a large-scale dataset comprising 302,132 precisely aligned satellite-map pairs across 38 Chinese cities.
Experiments on CNSatMap and the New York dataset demonstrate EarthMapper's superior performance.
arXiv Detail & Related papers (2025-04-28T02:41:12Z)
- S3MOT: Monocular 3D Object Tracking with Selective State Space Model [3.5047603107971397]
Multi-object tracking in 3D space is essential for advancing robotics and computer applications.
It remains a significant challenge in monocular setups due to the difficulty of mining 3D associations from 2D video streams.
We present three innovative techniques to enhance the fusion of heterogeneous cues for monocular 3D MOT.
arXiv Detail & Related papers (2025-04-25T04:45:35Z)
- EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization.
We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z)
- SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion [38.85690940616852]
This paper presents the first satellite-ground cooperative SSC framework, i.e., SGFormer.
We propose a dual-branch architecture that encodes satellite and ground views in parallel, unifying them into a common domain.
We develop an adaptive weighting strategy that balances contributions from satellite and ground views.
arXiv Detail & Related papers (2025-03-21T03:37:08Z)
- Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark [15.405137983083875]
Aerial-ground cooperation offers a promising solution by integrating UAVs' aerial views with ground vehicles' local observations. This paper presents a comprehensive solution for aerial-ground cooperative 3D perception through three key contributions.
arXiv Detail & Related papers (2025-03-10T07:00:07Z)
- SatSplatYOLO: 3D Gaussian Splatting-based Virtual Object Detection Ensembles for Satellite Feature Recognition [0.0]
We present an approach for mapping geometries and high-confidence detection of components of unknown, non-cooperative satellites on orbit.
We implement accelerated 3D Gaussian splatting to learn a 3D representation of the satellite, render virtual views of the target, and ensemble the YOLOv5 object detector over the virtual views.
arXiv Detail & Related papers (2024-06-04T17:54:20Z)
- Reconstructing Satellites in 3D from Amateur Telescope Images [44.20773507571372]
We propose a novel computational imaging framework that overcomes obstacles by integrating a hybrid image pre-processing pipeline.
We validate our approach on both synthetic satellite datasets and on-sky observations of China's Tiangong Space Station and the International Space Station.
Our framework enables high-fidelity 3D satellite monitoring from Earth, offering a cost-effective alternative for space situational awareness.
arXiv Detail & Related papers (2024-04-29T03:13:09Z)
- Advancing Applications of Satellite Photogrammetry: Novel Approaches for Built-up Area Modeling and Natural Environment Monitoring using Stereo/Multi-view Satellite Image-derived 3D Data [0.0]
This dissertation explores several novel approaches based on stereo and multi-view satellite image-derived 3D geospatial data.
It introduces four parts of novel approaches that deal with the spatial and temporal challenges with satellite-derived 3D data.
Overall, this dissertation demonstrates the extensive potential of satellite photogrammetry applications in addressing urban and environmental challenges.
arXiv Detail & Related papers (2024-04-18T20:02:52Z)
- Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion [77.34078223594686]
We propose a novel architecture for direct 3D scene generation by introducing diffusion models into 3D sparse representations and combining them with neural rendering techniques.
Specifically, our approach generates texture colors at the point level for a given geometry using a 3D diffusion model first, which is then transformed into a scene representation in a feed-forward manner.
Experiments in two city-scale datasets show that our model demonstrates proficiency in generating photo-realistic street-view image sequences and cross-view urban scenes from satellite imagery.
arXiv Detail & Related papers (2024-01-19T16:15:37Z)
- Characterizing Satellite Geometry via Accelerated 3D Gaussian Splatting [0.0]
We present an approach for mapping of satellites on orbit based on 3D Gaussian Splatting.
We demonstrate model training and 3D rendering performance on a hardware-in-the-loop satellite mock-up.
Our model is shown to be capable of training on-board and rendering higher quality novel views of an unknown satellite nearly 2 orders of magnitude faster than previous NeRF-based algorithms.
arXiv Detail & Related papers (2024-01-05T00:49:56Z)
- Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting [79.34357055254239]
Hand trajectory forecasting is crucial for enabling a prompt understanding of human intentions when interacting with AR/VR systems.
Existing methods handle this problem in a 2D image space which is inadequate for 3D real-world applications.
We set up an egocentric 3D hand trajectory forecasting task that aims to predict hand trajectories in a 3D space from early observed RGB videos in a first-person view.
arXiv Detail & Related papers (2023-07-17T04:55:02Z)
- Semantic Scene Completion with Cleaner Self [93.99441599791275]
Semantic Scene Completion (SSC) transforms an image of single-view depth and/or RGB 2D pixels into 3D voxels, each of whose semantic labels are predicted.
SSC is a well-known ill-posed problem, as the prediction model has to "imagine" what is behind the visible surface, which is usually represented by a Truncated Signed Distance Function (TSDF).
We use the ground-truth 3D voxels to generate a perfect visible surface, called TSDF-CAD, and then train a "cleaner" SSC model.
As the model is noise-free, it is expected to
arXiv Detail & Related papers (2023-03-17T13:50:18Z)
- Satellite Image Based Cross-view Localization for Autonomous Vehicle [59.72040418584396]
This paper shows that by using an off-the-shelf high-definition satellite image as a ready-to-use map, we are able to achieve cross-view vehicle localization up to a satisfactory accuracy.
Our method is validated on KITTI and Ford Multi-AV Seasonal datasets as ground view and Google Maps as the satellite view.
arXiv Detail & Related papers (2022-07-27T13:16:39Z)
- Mosaic Zonotope Shadow Matching for Risk-Aware Autonomous Localization in Harsh Urban Environments [0.966840768820136]
Risk-aware urban localization with the Global Navigation Satellite System (GNSS) remains an unsolved problem.
We propose Mosaic Zonotope Shadow Matching (MZSM) that employs a classifier-agnostic polytope mosaic architecture.
We perform high-fidelity simulations using a 3D building map of San Francisco to validate our algorithm's risk-aware improvements.
arXiv Detail & Related papers (2022-04-30T21:01:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.