Uncertainty-aware State Space Transformer for Egocentric 3D Hand
Trajectory Forecasting
- URL: http://arxiv.org/abs/2307.08243v2
- Date: Sun, 17 Sep 2023 02:40:01 GMT
- Title: Uncertainty-aware State Space Transformer for Egocentric 3D Hand
Trajectory Forecasting
- Authors: Wentao Bao, Lele Chen, Libing Zeng, Zhong Li, Yi Xu, Junsong Yuan, Yu
Kong
- Abstract summary: Hand trajectory forecasting is crucial for enabling a prompt understanding of human intentions when interacting with AR/VR systems.
Existing methods handle this problem in a 2D image space which is inadequate for 3D real-world applications.
We set up an egocentric 3D hand trajectory forecasting task that aims to predict hand trajectories in a 3D space from early observed RGB videos in a first-person view.
- Score: 79.34357055254239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hand trajectory forecasting from egocentric views is crucial for enabling a
prompt understanding of human intentions when interacting with AR/VR systems.
However, existing methods handle this problem in a 2D image space which is
inadequate for 3D real-world applications. In this paper, we set up an
egocentric 3D hand trajectory forecasting task that aims to predict hand
trajectories in a 3D space from early observed RGB videos in a first-person
view. To fulfill this goal, we propose an uncertainty-aware state space
Transformer (USST) that combines the merits of the attention mechanism and
aleatoric uncertainty within the framework of the classical state-space model.
The model can be further enhanced by the velocity constraint and visual prompt
tuning (VPT) on large vision transformers. Moreover, we develop an annotation
workflow to collect high-quality 3D hand trajectories. Experimental
results on the H2O and EgoPAT3D datasets demonstrate the superiority of USST for
both 2D and 3D trajectory forecasting. The code and datasets are publicly
released: https://actionlab-cv.github.io/EgoHandTrajPred.
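The abstract names two training-time ingredients that can be made concrete in code: an aleatoric-uncertainty head over future 3D hand positions and a velocity constraint on the predicted trajectory. The snippet below is a minimal sketch of those two ideas only; it is not the authors' released implementation (see the project page above), and the module names, feature dimensions, and the 0.1 loss weight are illustrative assumptions.

```python
# Minimal sketch (not the authors' released code) of an aleatoric-uncertainty
# head and a velocity constraint for 3D hand trajectory forecasting.
# Names, dimensions, and loss weights are illustrative assumptions.
import torch
import torch.nn as nn


class UncertainTrajectoryHead(nn.Module):
    """Predicts per-step 3D position means and aleatoric (data) variances."""

    def __init__(self, d_model: int = 256):
        super().__init__()
        self.mean = nn.Linear(d_model, 3)      # future 3D hand position
        self.log_var = nn.Linear(d_model, 3)   # log sigma^2 for numerical stability

    def forward(self, h: torch.Tensor):
        # h: (batch, T_future, d_model) temporal features, e.g. from a Transformer decoder
        return self.mean(h), self.log_var(h)


def aleatoric_nll(mu, log_var, target):
    # Heteroscedastic Gaussian negative log-likelihood: large predicted variance
    # down-weights the squared error but is itself penalized by the log term.
    return (0.5 * torch.exp(-log_var) * (target - mu) ** 2 + 0.5 * log_var).mean()


def velocity_constraint(mu, target):
    # Penalize the gap between predicted and ground-truth frame-to-frame
    # velocities so the forecast is temporally smooth, not just pointwise close.
    pred_vel = mu[:, 1:] - mu[:, :-1]
    gt_vel = target[:, 1:] - target[:, :-1]
    return (pred_vel - gt_vel).abs().mean()


if __name__ == "__main__":
    head = UncertainTrajectoryHead()
    feats = torch.randn(4, 15, 256)   # 4 clips, 15 future steps of decoder features
    gt = torch.randn(4, 15, 3)        # ground-truth 3D hand trajectories
    mu, log_var = head(feats)
    loss = aleatoric_nll(mu, log_var, gt) + 0.1 * velocity_constraint(mu, gt)
    loss.backward()
    print(float(loss))
```

Predicting log-variance rather than variance keeps the loss numerically stable and lets the model attenuate supervision on ambiguous frames, which is the standard motivation for aleatoric uncertainty in regression.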
Related papers
- Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving [22.832008530490167]
We propose a semi-supervised vision-centric 3D occupancy world model, PreWorld, to leverage the potential of 2D labels.
PreWorld achieves competitive performance across 3D occupancy prediction, 4D occupancy forecasting and motion planning tasks.
arXiv Detail & Related papers (2025-02-11T07:12:26Z)
- GaussRender: Learning 3D Occupancy with Gaussian Rendering [84.60008381280286]
GaussRender is a plug-and-play 3D-to-2D reprojection loss that enhances voxel-based supervision.
Our method projects 3D voxel representations into arbitrary 2D perspectives and leverages Gaussian splatting as an efficient, differentiable rendering proxy of voxels.
arXiv Detail & Related papers (2025-02-07T16:07:51Z)
- GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction [67.81475355852997]
3D occupancy prediction is important for autonomous driving due to its comprehensive perception of the surroundings.
We propose a world-model-based framework to exploit the scene evolution for perception.
Our framework improves the performance of the single-frame counterpart by over 2% in mIoU without introducing additional computations.
arXiv Detail & Related papers (2024-12-13T18:59:54Z)
- 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation [83.98251722144195]
Previous methods on controllable video generation primarily leverage 2D control signals to manipulate object motions.
We introduce 3DTrajMaster, a robust controller that regulates multi-entity dynamics in 3D space.
We show that 3DTrajMaster sets a new state-of-the-art in both accuracy and generalization for controlling multi-entity 3D motions.
arXiv Detail & Related papers (2024-12-10T18:55:13Z)
- Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation [30.744137117668643]
Lift3D is a framework that enhances 2D foundation models with implicit and explicit 3D robotic representations to construct a robust 3D manipulation policy.
In experiments, Lift3D consistently outperforms previous state-of-the-art methods across several simulation benchmarks and real-world scenarios.
arXiv Detail & Related papers (2024-11-27T18:59:52Z)
- Robust 3D Semantic Occupancy Prediction with Calibration-free Spatial Transformation [32.50849425431012]
For autonomous cars equipped with multi-camera and LiDAR, it is critical to aggregate multi-sensor information into a unified 3D space for accurate and robust predictions.
Recent methods are mainly built on the 2D-to-3D transformation that relies on sensor calibration to project the 2D image information into the 3D space.
In this work, we propose a calibration-free spatial transformation based on vanilla attention to implicitly model the spatial correspondence.
arXiv Detail & Related papers (2024-11-19T02:40:42Z)
- WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild [53.288327629960364]
We present a data-driven pipeline for efficient multi-hand reconstruction in the wild.
The proposed pipeline is composed of two components: a real-time fully convolutional hand localization network and a high-fidelity transformer-based 3D hand reconstruction model.
Our approach outperforms previous methods in both efficiency and accuracy on popular 2D and 3D benchmarks.
arXiv Detail & Related papers (2024-09-18T18:46:51Z)
- A Spatiotemporal Approach to Tri-Perspective Representation for 3D Semantic Occupancy Prediction [6.527178779672975]
Vision-based 3D semantic occupancy prediction is increasingly overlooked in favor of LiDAR-based approaches.
This study introduces S2TPVFormer, a transformer architecture designed to predict temporally coherent 3D semantic occupancy.
arXiv Detail & Related papers (2024-01-24T20:06:59Z)
- Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z)