4DSegStreamer: Streaming 4D Panoptic Segmentation via Dual Threads
- URL: http://arxiv.org/abs/2510.17664v1
- Date: Mon, 20 Oct 2025 15:37:49 GMT
- Title: 4DSegStreamer: Streaming 4D Panoptic Segmentation via Dual Threads
- Authors: Ling Liu, Jun Tian, Li Yi
- Abstract summary: We introduce 4DSegStreamer, a novel framework that employs a Dual-Thread System to efficiently process streaming frames. The framework is general and can be seamlessly integrated into existing 3D and 4D segmentation methods to enable real-time capability. It also demonstrates superior robustness compared to existing streaming perception approaches, particularly under high FPS conditions.
- Score: 17.413013509299933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 4D panoptic segmentation in a streaming setting is critical for highly dynamic environments, such as evacuating dense crowds and autonomous driving in complex scenarios, where real-time, fine-grained perception within a constrained time budget is essential. In this paper, we introduce 4DSegStreamer, a novel framework that employs a Dual-Thread System to efficiently process streaming frames. The framework is general and can be seamlessly integrated into existing 3D and 4D segmentation methods to enable real-time capability. It also demonstrates superior robustness compared to existing streaming perception approaches, particularly under high FPS conditions. The system consists of a predictive thread and an inference thread. The predictive thread leverages historical motion and geometric information to extract features and forecast future dynamics. The inference thread ensures timely prediction for incoming frames by aligning with the latest memory and compensating for ego-motion and dynamic object movements. We evaluate 4DSegStreamer on the indoor HOI4D dataset and the outdoor SemanticKITTI and nuScenes datasets. Comprehensive experiments demonstrate the effectiveness of our approach, particularly in accurately predicting dynamic objects in complex scenes.
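The abstract describes the Dual-Thread System only at a high level. Below is a minimal Python sketch of how such a predictive/inference split could be organized; the class `DualThreadSegmenter`, the callables `forecast_fn` and `segment_fn`, and the queue-based hand-off are illustrative assumptions, not the authors' implementation, and the ego-motion and dynamic-object compensation mentioned in the abstract is only indicated as a comment.

```python
import threading
import queue


class DualThreadSegmenter:
    """Minimal sketch of a dual-thread streaming setup (assumed structure, not the paper's code).

    A predictive thread keeps a shared memory up to date from buffered frames,
    while the inference path answers each incoming frame against the latest
    memory so a result is ready within the per-frame time budget.
    """

    def __init__(self, forecast_fn, segment_fn):
        self.forecast_fn = forecast_fn   # hypothetical: frame history -> forecasted features
        self.segment_fn = segment_fn     # hypothetical: (frame, memory) -> panoptic labels
        self.memory = None               # latest forecast shared between threads
        self.lock = threading.Lock()
        self.frames = queue.Queue()      # incoming point-cloud frames
        self.history = []

    def predictive_loop(self):
        """Continuously extract features from history and forecast future dynamics."""
        while True:
            frame = self.frames.get()
            if frame is None:            # sentinel: shut down
                break
            self.history.append(frame)
            forecast = self.forecast_fn(self.history)
            with self.lock:
                self.memory = forecast

    def infer(self, frame):
        """Answer an incoming frame using the latest available memory."""
        self.frames.put(frame)           # hand the frame to the predictive thread
        with self.lock:
            memory = self.memory
        # a real inference thread would also compensate for ego-motion and
        # dynamic object movements before segmenting; omitted in this sketch
        return self.segment_fn(frame, memory)


# usage sketch with placeholder callables
seg = DualThreadSegmenter(forecast_fn=lambda h: h[-1], segment_fn=lambda f, m: (f, m))
worker = threading.Thread(target=seg.predictive_loop, daemon=True)
worker.start()
print(seg.infer("frame_0"))
seg.frames.put(None)  # stop the predictive thread
```

The point of the split is that the expensive forecasting work stays off the critical path, so the inference call only needs to align the newest frame with whatever memory is already available, which is what lets the system stay within a fixed time budget at high FPS.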
Related papers
- Tracking-Guided 4D Generation: Foundation-Tracker Motion Priors for 3D Model Animation [21.075786141331974]
We present Track4DGen, a framework for generating dynamic 4D objects from sparse inputs. In Stage One, we enforce dense, feature-level point correspondences inside the diffusion generator. In Stage Two, we reconstruct a dynamic 4D-GS using a hybrid motion encoding.
arXiv Detail & Related papers (2025-12-05T21:13:04Z)
- PointNet4D: A Lightweight 4D Point Cloud Video Backbone for Online and Offline Perception in Robotic Applications [17.120778989036012]
We propose PointNet4D, a lightweight 4D backbone optimized for both online and offline settings. At its core is a Hybrid Mamba-Transformer temporal fusion block, which integrates the efficient state-space modeling of Mamba with the bidirectional modeling power of Transformers. To enhance temporal understanding, we introduce 4DMAP, a frame-wise masked auto-regressive pretraining strategy that captures motion cues across frames.
arXiv Detail & Related papers (2025-12-01T07:58:01Z)
- Streaming 4D Visual Geometry Transformer [63.99937807085461]
We propose a streaming 4D visual geometry transformer to process the input sequence in an online manner. We use temporal causal attention and cache the historical keys and values as implicit memory to enable efficient streaming long-term 4D reconstruction. Experiments on various 4D geometry perception benchmarks demonstrate that our model increases the inference speed in online scenarios.
arXiv Detail & Related papers (2025-07-15T17:59:57Z)
- $I^{2}$-World: Intra-Inter Tokenization for Efficient Dynamic 4D Scene Forecasting [2.722128680610171]
$I^{2}$-World is an efficient framework for 4D occupancy forecasting. Our method decouples scene tokenization into intra-scene and inter-scene tokenizers. $I^{2}$-World achieves state-of-the-art performance, outperforming existing methods by 25.1% in mIoU and 36.9% in IoU for 4D occupancy forecasting.
arXiv Detail & Related papers (2025-07-12T05:14:39Z)
- Easi3R: Estimating Disentangled Motion from DUSt3R Without Training [69.51086319339662]
We introduce Easi3R, a simple yet efficient training-free method for 4D reconstruction. Our approach applies attention adaptation during inference, eliminating the need for from-scratch pre-training or network fine-tuning. Our experiments on real-world dynamic videos demonstrate that our lightweight attention adaptation significantly outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2025-03-31T17:59:58Z)
- DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation [50.01520547454224]
Current generative models struggle to synthesize 4D driving scenes that simultaneously support temporal extrapolation and spatial novel view synthesis (NVS). We propose DiST-4D, which disentangles the problem into two diffusion processes: DiST-T, which predicts future metric depth and multi-view RGB sequences directly from past observations, and DiST-S, which enables spatial NVS by training only on existing viewpoints while enforcing cycle consistency. Experiments demonstrate that DiST-4D achieves state-of-the-art performance in both temporal prediction and NVS tasks, while also delivering competitive performance in planning-related evaluations.
arXiv Detail & Related papers (2025-03-19T13:49:48Z)
- Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving [116.10577967146762]
We propose Driv3R, a framework that directly regresses per-frame point maps from multi-view image sequences. We employ a 4D flow predictor to identify moving objects within the scene, directing our network to focus on reconstructing these dynamic regions. Driv3R outperforms previous frameworks in 4D dynamic scene reconstruction, achieving 15x faster inference speed.
arXiv Detail & Related papers (2024-12-09T18:58:03Z)
- 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
We present 4DGen, a novel framework for grounded 4D content creation. Our pipeline facilitates controllable 4D generation, enabling users to specify the motion via monocular video or adopt image-to-video generation. Compared to existing video-to-4D baselines, our approach yields superior results in faithfully reconstructing input signals.
arXiv Detail & Related papers (2023-12-28T18:53:39Z)
- NeRFPlayer: A Streamable Dynamic Scene Representation with Decomposed Neural Radiance Fields [99.57774680640581]
We present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering.
We propose to decompose the 4D space according to temporal characteristics. Points in the 4D space are associated with probabilities of belonging to three categories: static, deforming, and new areas.
arXiv Detail & Related papers (2022-10-28T07:11:05Z)