ST-GS: Vision-Based 3D Semantic Occupancy Prediction with Spatial-Temporal Gaussian Splatting
- URL: http://arxiv.org/abs/2509.16552v1
- Date: Sat, 20 Sep 2025 06:36:30 GMT
- Title: ST-GS: Vision-Based 3D Semantic Occupancy Prediction with Spatial-Temporal Gaussian Splatting
- Authors: Xiaoyang Yan, Muleilan Pei, Shaojie Shen
- Abstract summary: 3D occupancy prediction is critical for comprehensive scene understanding in vision-centric autonomous driving. Recent advances have explored utilizing 3D semantic Gaussians to model occupancy while reducing computational overhead. We propose a novel Spatial-Temporal Gaussian Splatting (ST-GS) framework to enhance both spatial and temporal modeling.
- Score: 21.87807066521776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D occupancy prediction is critical for comprehensive scene understanding in vision-centric autonomous driving. Recent advances have explored utilizing 3D semantic Gaussians to model occupancy while reducing computational overhead, but they remain constrained by insufficient multi-view spatial interaction and limited multi-frame temporal consistency. To overcome these issues, in this paper, we propose a novel Spatial-Temporal Gaussian Splatting (ST-GS) framework to enhance both spatial and temporal modeling in existing Gaussian-based pipelines. Specifically, we develop a guidance-informed spatial aggregation strategy within a dual-mode attention mechanism to strengthen spatial interaction in Gaussian representations. Furthermore, we introduce a geometry-aware temporal fusion scheme that effectively leverages historical context to improve temporal continuity in scene completion. Extensive experiments on the large-scale nuScenes occupancy prediction benchmark showcase that our proposed approach not only achieves state-of-the-art performance but also delivers markedly better temporal consistency compared to existing Gaussian-based methods.
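The core idea of Gaussian-based occupancy pipelines like the one described above can be illustrated with a minimal, hypothetical sketch (not the authors' implementation): each semantic Gaussian carries a center, an isotropic scale (a simplification of the full 3D covariance), and per-class logits; the logits are splatted into a dense voxel grid with Gaussian weights, and a per-voxel argmax yields the semantic occupancy map.

```python
import math

def splat_gaussians_to_occupancy(gaussians, grid_shape, voxel_size):
    """Accumulate per-Gaussian semantic logits into a dense voxel grid.

    gaussians: list of (mean_xyz, sigma, logits), with isotropic sigma,
               a simplification of the full 3D covariance used in practice.
    Returns the argmax semantic class per voxel as a nested list.
    """
    X, Y, Z = grid_shape
    num_classes = len(gaussians[0][2])
    occ = [[[0 for _ in range(Z)] for _ in range(Y)] for _ in range(X)]
    for i in range(X):
        for j in range(Y):
            for k in range(Z):
                # Metric coordinates of this voxel's center
                cx = (i + 0.5) * voxel_size
                cy = (j + 0.5) * voxel_size
                cz = (k + 0.5) * voxel_size
                acc = [0.0] * num_classes
                for (mx, my, mz), sigma, logits in gaussians:
                    d2 = (cx - mx) ** 2 + (cy - my) ** 2 + (cz - mz) ** 2
                    w = math.exp(-0.5 * d2 / sigma ** 2)  # Gaussian weight
                    for c in range(num_classes):
                        acc[c] += w * logits[c]
                occ[i][j][k] = max(range(num_classes), key=lambda c: acc[c])
    return occ

# Two Gaussians with different dominant classes at opposite grid corners
occ = splat_gaussians_to_occupancy(
    gaussians=[((0.5, 0.5, 0.5), 0.3, [5.0, 0.0]),
               ((1.5, 1.5, 1.5), 0.3, [0.0, 5.0])],
    grid_shape=(4, 4, 4),
    voxel_size=0.5,
)
print(occ[0][0][0], occ[3][3][3])  # → 0 1 (each corner takes the nearest Gaussian's class)
```

In a real pipeline the Gaussians' means, covariances, and logits are predicted by a network from multi-view images, and the splatting is done with anisotropic covariances on the GPU; this sketch only shows the representational idea.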
Related papers
- Learning Multi-Modal Mobility Dynamics for Generalized Next Location Recommendation [51.00494428978262]
We leverage multi-modal spatial-temporal knowledge to characterize mobility dynamics for the location recommendation task. First, we construct a unified spatial-temporal relational graph (STRG) for multi-modal representation. Second, we design a gating mechanism to fuse spatial-temporal graph representations of different modalities.
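A gating mechanism for fusing two modality representations, as mentioned in the summary above, commonly takes the form g = sigmoid(W[x_a; x_b] + b), out = g * x_a + (1 - g) * x_b. The following is a generic, hypothetical sketch of this pattern (the weights and dimensions are illustrative, not from the paper):

```python
import math

def gated_fusion(x_a, x_b, w_gate, b_gate):
    """Fuse two modality embeddings with a learned elementwise gate:
    g = sigmoid(W [x_a; x_b] + b);  out = g * x_a + (1 - g) * x_b.

    x_a, x_b: embeddings of equal dimension d
    w_gate:   (d, 2d) gate weights; b_gate: (d,) gate bias
    """
    concat = list(x_a) + list(x_b)  # [x_a; x_b]
    out = []
    for d in range(len(x_a)):
        z = sum(w * v for w, v in zip(w_gate[d], concat)) + b_gate[d]
        g = 1.0 / (1.0 + math.exp(-z))  # sigmoid gate in (0, 1)
        out.append(g * x_a[d] + (1.0 - g) * x_b[d])
    return out

# Extreme biases make the gate saturate: dim 0 follows x_a, dim 1 follows x_b
fused = gated_fusion([1.0, 2.0], [3.0, 4.0],
                     [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
                     [10.0, -10.0])
print(fused)  # ≈ [1.0, 4.0]
```

In practice the gate weights are learned jointly with the rest of the model, so the network decides per dimension how much each modality contributes.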
arXiv Detail & Related papers (2025-12-27T14:23:04Z) - RainDiff: End-to-end Precipitation Nowcasting Via Token-wise Attention Diffusion [64.49056527678606]
We propose a Token-wise Attention integrated into not only the U-Net diffusion model but also the radar-temporal encoder. Unlike prior approaches, our method integrates attention into the architecture without incurring the high resource cost typical of pixel-space diffusion. Our experiments and evaluations demonstrate that the proposed method significantly outperforms state-of-the-art approaches in robustness, local fidelity, and generalization in complex precipitation forecasting scenarios.
arXiv Detail & Related papers (2025-10-16T17:59:13Z) - A Retrieval Augmented Spatio-Temporal Framework for Traffic Prediction [33.28893562327803]
RAST achieves superior performance while maintaining efficiency on large-scale datasets. Our framework consists of key designs including: 1) a Decoupled Query Retriever that captures decoupled temporal features and constructs residual fusion via Retrieval-Augmented Generation (RAG); and 2) a Universal Backbone Predictor that accommodates pre-trained ST-GNNs or simple predictors.
arXiv Detail & Related papers (2025-08-14T10:11:39Z) - Transformer with Koopman-Enhanced Graph Convolutional Network for Spatiotemporal Dynamics Forecasting [12.301897782320967]
TK-GCN is a two-stage framework that integrates geometry-aware spatial encoding with long-range temporal modeling. We show that TK-GCN consistently delivers superior predictive accuracy across a range of forecast horizons.
arXiv Detail & Related papers (2025-07-05T01:26:03Z) - STDR: Spatio-Temporal Decoupling for Real-Time Dynamic Scene Rendering [15.873329633980015]
Existing 3DGS-based methods for dynamic reconstruction often suffer from entangled spatio-temporal representations. We propose STDR (Spatio-Temporal Decoupling for Real-time rendering), a plug-and-play module that learns spatio-temporal probability distributions for each scene.
arXiv Detail & Related papers (2025-05-28T14:26:41Z) - Geometry-aware Active Learning of Spatiotemporal Dynamic Systems [4.251030047034566]
This paper proposes a geometry-aware active learning framework for modeling dynamic systems. We develop an adaptive active learning strategy to strategically identify spatial locations for data collection and further maximize the prediction accuracy.
arXiv Detail & Related papers (2025-04-26T19:56:38Z) - Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction [62.69089767730514]
We present GDFusion, a temporal fusion method for vision-based 3D semantic occupancy prediction (VisionOcc). It opens up the underexplored aspects of temporal fusion within the VisionOcc framework, focusing on both temporal cues and fusion strategies.
arXiv Detail & Related papers (2025-04-17T14:05:33Z) - Sequential Gaussian Avatars with Hierarchical Motion Context [7.6736633105043515]
SMPL-driven 3DGS human avatars struggle to capture fine appearance details due to the complex mapping from pose to appearance during fitting. We propose SeqAvatar, which exploits the explicit 3DGS representation to better model human avatars based on a hierarchical motion context. Our method significantly outperforms 3DGS-based approaches and renders human avatars orders of magnitude faster than the latest NeRF-based models.
arXiv Detail & Related papers (2024-11-25T04:05:19Z) - DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes [71.61083731844282]
We present DeSiRe-GS, a self-supervised Gaussian splatting representation. It enables effective static-dynamic decomposition and high-fidelity surface reconstruction in complex driving scenarios.
arXiv Detail & Related papers (2024-11-18T05:49:16Z) - STGFormer: Spatio-Temporal GraphFormer for 3D Human Pose Estimation in Video [7.345621536750547]
This paper presents the Spatio-Temporal GraphFormer (STGFormer) framework for 3D human pose estimation in videos. First, we introduce an STG attention mechanism designed to more effectively leverage the inherent graph distributions of the human body. Next, we present a Modulated Hop-wise Regular GCN to independently process temporal and spatial dimensions in parallel. Finally, we demonstrate that our method achieves state-of-the-art performance on the Human3.6M and MPI-INF-3DHP datasets.
arXiv Detail & Related papers (2024-07-14T06:45:27Z) - Triplet Attention Transformer for Spatiotemporal Predictive Learning [9.059462850026216]
We propose an innovative triplet attention transformer designed to capture both inter-frame dynamics and intra-frame static features.
The model incorporates the Triplet Attention Module (TAM), which replaces traditional recurrent units by exploring self-attention mechanisms in temporal, spatial, and channel dimensions.
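The idea of attending separately over temporal, spatial, and channel dimensions can be sketched with a toy, parameter-free stand-in for the TAM block (this is an illustrative simplification, not the paper's module): apply dot-product self-attention along each axis of a (T, S, C) tensor and average the three results.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(v - m) for v in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(tokens):
    """Parameter-free scaled dot-product self-attention: Q = K = V = tokens."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                          for k in tokens])
        out.append([sum(w * v[i] for w, v in zip(scores, tokens))
                    for i in range(d)])
    return out

def triplet_attention(x):
    """Average of self-attention applied along the temporal, spatial, and
    channel axes of a (T, S, C) tensor; a toy stand-in for the TAM block."""
    T, S, C = len(x), len(x[0]), len(x[0][0])
    # Temporal axis: T tokens, each of dimension S*C
    t_out = self_attention(
        [[x[t][s][c] for s in range(S) for c in range(C)] for t in range(T)])
    # Spatial axis: S tokens, each of dimension T*C
    s_out = self_attention(
        [[x[t][s][c] for t in range(T) for c in range(C)] for s in range(S)])
    # Channel axis: C tokens, each of dimension T*S
    c_out = self_attention(
        [[x[t][s][c] for t in range(T) for s in range(S)] for c in range(C)])
    # Scatter each axis's output back to (T, S, C) and average the three
    return [[[(t_out[t][s * C + c]
             + s_out[s][t * C + c]
             + c_out[c][t * S + s]) / 3.0
             for c in range(C)] for s in range(S)] for t in range(T)]

# Sanity check: attention over identical tokens is an identity map,
# so a constant tensor passes through unchanged.
x = [[[1.0 for _ in range(2)] for _ in range(2)] for _ in range(2)]
y = triplet_attention(x)
print(y[0][0][0])  # → 1.0
```

The real module adds learned projections, normalization, and residual connections; the sketch only shows how one tensor can be viewed as three different token sequences.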
arXiv Detail & Related papers (2023-10-28T12:49:33Z) - Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z) - A Spatial-Temporal Attentive Network with Spatial Continuity for Trajectory Prediction [74.00750936752418]
We propose a novel model named the spatial-temporal attentive network with spatial continuity (STAN-SC).
First, a spatial-temporal attention mechanism is presented to extract the most useful and important information.
Second, we construct a joint feature sequence from the sequence and instant state information so that the generated trajectories maintain spatial continuity.
arXiv Detail & Related papers (2020-03-13T04:35:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.