OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic
3D Reconstruction
- URL: http://arxiv.org/abs/2203.07977v1
- Date: Tue, 15 Mar 2022 15:09:01 GMT
- Title: OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic
3D Reconstruction
- Authors: Wenbin Lin, Chengwei Zheng, Jun-Hai Yong, Feng Xu
- Abstract summary: RGBD-based real-time dynamic 3D reconstruction suffers from inaccurate inter-frame motion estimation.
We propose OcclusionFusion, a novel method to calculate occlusion-aware 3D motion to guide the reconstruction.
Our technique outperforms existing single-view-based real-time methods by a large margin.
- Score: 14.130915525776055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RGBD-based real-time dynamic 3D reconstruction suffers from inaccurate
inter-frame motion estimation as errors may accumulate with online tracking.
This problem is even more severe for single-view-based systems due to strong
occlusions. Based on these observations, we propose OcclusionFusion, a novel
method to calculate occlusion-aware 3D motion to guide the reconstruction. In
our technique, the motion of visible regions is first estimated and combined
with temporal information to infer the motion of the occluded regions through
an LSTM-involved graph neural network. Furthermore, our method computes the
confidence of the estimated motion by modeling the network output with a
probabilistic model, which alleviates untrustworthy motions and enables robust
tracking. Experimental results on public datasets and our own recorded data
show that our technique outperforms existing single-view-based real-time
methods by a large margin. With the reduction of the motion errors, the
proposed technique can handle long and challenging motion sequences. Please
check out the project page for sequence results:
https://wenbin-lin.github.io/OcclusionFusion.
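The abstract's core idea, estimating visible-region motion first and then inferring occluded-region motion from graph neighbours plus temporal history, can be sketched as follows. This is a minimal illustrative stand-in, not the paper's LSTM-involved graph neural network: `propagate_motion`, its node-list graph encoding, and the blending weight `alpha` are all assumptions for the sketch.

```python
def propagate_motion(motions, visible, neighbors, prev_motions, alpha=0.5):
    """Illustrative occlusion-aware propagation: each occluded node takes the
    average motion of its visible graph neighbours, blended with its own
    motion from the previous frame (the temporal cue the paper feeds to an
    LSTM). `neighbors[i]` lists the graph neighbours of node i."""
    out = [list(m) for m in motions]
    for i, vis in enumerate(visible):
        if vis:
            continue                      # visible nodes keep their estimated motion
        vis_nbrs = [j for j in neighbors[i] if visible[j]]
        if vis_nbrs:
            spatial = [sum(motions[j][k] for j in vis_nbrs) / len(vis_nbrs)
                       for k in range(3)]
        else:
            spatial = prev_motions[i]     # no visible neighbour: rely on history alone
        out[i] = [alpha * s + (1 - alpha) * p
                  for s, p in zip(spatial, prev_motions[i])]
    return out
```

In the paper the spatial and temporal terms are fused by a learned network and each output motion carries a probabilistic confidence; here the fixed `alpha` plays both roles.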
Related papers
- Occlusion-Aware 3D Motion Interpretation for Abnormal Behavior Detection [10.782354892545651]
We present OAD2D, which detects motion abnormalities by reconstructing the 3D coordinates of mesh vertices and human joints from monocular videos.
We reformulate abnormal posture estimation by coupling it with a Motion-to-Text (M2T) model, in which a VQ-VAE is employed to quantize motion features.
Our approach demonstrates the robustness of abnormal behavior detection against severe and self-occlusions, as it reconstructs human motion trajectories in global coordinates.
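The VQ-VAE quantization step mentioned above reduces to a nearest-neighbour codebook lookup. The sketch below shows only that lookup; the function name `vq_quantize` and the toy codebook are assumptions, and OAD2D's actual codebook, training, and straight-through gradients are not shown.

```python
def vq_quantize(features, codebook):
    """Nearest-neighbour codebook lookup, the core quantization step of a
    VQ-VAE: map each feature vector to the closest codebook entry."""
    indices, quantized = [], []
    for f in features:
        # squared Euclidean distance to every codebook entry
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        i = min(range(len(codebook)), key=dists.__getitem__)
        indices.append(i)
        quantized.append(codebook[i])
    return quantized, indices
```

The discrete indices are what an M2T-style model can then treat as tokens.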
arXiv Detail & Related papers (2024-07-23T18:41:16Z) - RoHM: Robust Human Motion Reconstruction via Diffusion [58.63706638272891]
RoHM is an approach for robust 3D human motion reconstruction from monocular RGB(-D) videos.
Conditioned on noisy and occluded input data, it reconstructs complete, plausible motions in consistent global coordinates.
Our method outperforms state-of-the-art approaches qualitatively and quantitatively, while being faster at test time.
arXiv Detail & Related papers (2024-01-16T18:57:50Z) - NIKI: Neural Inverse Kinematics with Invertible Neural Networks for 3D
Human Pose and Shape Estimation [53.25973084799954]
We present NIKI (Neural Inverse Kinematics with Invertible Neural Network), which models bi-directional errors.
NIKI can learn from both the forward and inverse processes with invertible networks.
arXiv Detail & Related papers (2023-05-15T12:13:24Z) - Occlusion Robust 3D Human Pose Estimation with StridedPoseGraphFormer
and Data Augmentation [69.49430149980789]
We show that our proposed method compares favorably with the state-of-the-art (SoA).
Our experimental results also reveal that in the absence of any occlusion handling mechanism, the performance of SoA 3D HPE methods degrades significantly when they encounter occlusion.
arXiv Detail & Related papers (2023-04-24T13:05:13Z) - Scene Synthesis via Uncertainty-Driven Attribute Synchronization [52.31834816911887]
This paper introduces a novel neural scene synthesis approach that can capture diverse feature patterns of 3D scenes.
Our method combines the strength of both neural network-based and conventional scene synthesis approaches.
arXiv Detail & Related papers (2021-08-30T19:45:07Z) - SCFusion: Real-time Incremental Scene Reconstruction with Semantic
Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
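Incremental fusion of new observations into a global voxel model is, at its simplest, a per-voxel weighted running average. The sketch below shows that standard step only; the function name `fuse_voxel` and the weight cap `max_wt` are assumptions, and SCFusion's semantic-completion network and voxel-state logic are not shown.

```python
def fuse_voxel(val, wt, new_val, new_wt, max_wt=100.0):
    """Weighted running average for one voxel: blend the stored value with a
    new observation in proportion to their accumulated weights, capping the
    weight so old frames do not dominate forever."""
    w = wt + new_wt
    if w == 0:
        return val, 0.0                   # no observations yet
    fused = (val * wt + new_val * new_wt) / w
    return fused, min(w, max_wt)
```

Applied over every voxel each frame, this keeps the global model consistent as new depth maps arrive.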
arXiv Detail & Related papers (2020-10-26T15:31:52Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
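The refinement scheme above, changing one box parameter per step, can be illustrated with a greedy loop. This is a stand-in for the learned RL policy: the function `refine_box`, the fixed step size, and the direct use of a target (in place of a delayed reward) are all assumptions of the sketch.

```python
def refine_box(box, target, step=0.1, iters=100):
    """Greedy single-parameter refinement: at each step, adjust the one box
    parameter (by +/- step) whose change most reduces the squared error to
    the target; stop when no single change helps."""
    box = list(box)
    for _ in range(iters):
        base = sum((b - t) ** 2 for b, t in zip(box, target))
        best = None
        for i in range(len(box)):
            for d in (step, -step):
                cand = box[:]
                cand[i] += d              # exactly one parameter changed
                err = sum((b - t) ** 2 for b, t in zip(cand, target))
                if err < base and (best is None or err < best[0]):
                    best = (err, cand)
        if best is None:
            break                         # converged: no move improves
        box = best[1]
    return box
```

The paper replaces this greedy rule with a policy trained by reinforcement learning, which receives its reward only after several steps.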
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - 3D Pose Detection in Videos: Focusing on Occlusion [0.4588028371034406]
We build upon existing methods for occlusion-aware 3D pose detection in videos.
We implement a two-stage architecture that uses a stacked hourglass network to produce 2D pose predictions.
To facilitate prediction on poses with occluded joints, we introduce an intuitive generalization of the cylinder man model.
arXiv Detail & Related papers (2020-06-24T07:01:17Z) - Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking [34.40019455462043]
We propose a joint spatial-temporal optimization-based stereo 3D object tracking method.
From the network, we detect corresponding 2D bounding boxes on adjacent images and regress an initial 3D bounding box.
Dense object cues associated with the object centroid are then predicted using a region-based network.
arXiv Detail & Related papers (2020-04-20T13:59:46Z) - A Graph Attention Spatio-temporal Convolutional Network for 3D Human
Pose Estimation in Video [7.647599484103065]
We improve the learning of constraints in the human skeleton by modeling local and global spatial information via attention mechanisms.
Our approach effectively mitigates depth ambiguity and self-occlusion, generalizes to half upper body estimation, and achieves competitive performance on 2D-to-3D video pose estimation.
arXiv Detail & Related papers (2020-03-11T14:54:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.