GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and
Scene-aware Supervision
- URL: http://arxiv.org/abs/2007.13124v1
- Date: Sun, 26 Jul 2020 13:05:55 GMT
- Title: GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and
Scene-aware Supervision
- Authors: Lei Ke, Shichao Li, Yanan Sun, Yu-Wing Tai, Chi-Keung Tang
- Abstract summary: We present a novel end-to-end framework named GSNet (Geometric and Scene-aware Network).
It jointly estimates 6DoF poses and reconstructs detailed 3D car shapes from a single urban street view.
We evaluate GSNet on the largest multi-task ApolloCar3D benchmark and achieve state-of-the-art performance both quantitatively and qualitatively.
- Score: 65.13980934546957
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel end-to-end framework named GSNet (Geometric and
Scene-aware Network), which jointly estimates 6DoF poses and reconstructs
detailed 3D car shapes from a single urban street view. GSNet utilizes a unique
four-way feature extraction and fusion scheme and directly regresses 6DoF poses
and shapes in a single forward pass. Extensive experiments show that our
diverse feature extraction and fusion scheme can greatly improve model
performance. Based on a divide-and-conquer 3D shape representation strategy,
GSNet reconstructs 3D vehicle shape with great detail (1352 vertices and 2700
faces). This dense mesh representation further leads us to consider geometrical
consistency and scene context, and inspires a new multi-objective loss function
to regularize network training, which in turn improves the accuracy of 6D pose
estimation and validates the merit of jointly performing both tasks. We
evaluate GSNet on the largest multi-task ApolloCar3D benchmark and achieve
state-of-the-art performance both quantitatively and qualitatively. Project
page is available at https://lkeab.github.io/gsnet/.
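For readers who want a concrete picture of the joint formulation, the sketch below shows how a GSNet-style head could fuse several per-vehicle feature branches and directly regress a 6DoF pose together with a dense 1352-vertex mesh in a single forward pass. Everything except the vertex count is an assumption made for illustration (PyTorch, the number and size of the branches, the quaternion pose parameterization, and the linear shape basis); it is not the paper's actual implementation.

```python
# Minimal illustrative sketch only (assumptions: PyTorch, four generic per-vehicle
# feature branches fused by concatenation, a quaternion pose parameterization, and
# a linear shape basis decoded to the 1352-vertex mesh mentioned in the abstract).
import torch
import torch.nn as nn

NUM_VERTICES = 1352  # dense mesh resolution reported in the abstract
SHAPE_DIM = 10       # hypothetical number of shape-basis coefficients

class GSNetStyleHead(nn.Module):
    def __init__(self, feat_dims=(256, 256, 256, 256), hidden=512):
        super().__init__()
        # Four-way feature extraction: one small encoder per feature source.
        self.branches = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for d in feat_dims]
        )
        fused = hidden * len(feat_dims)
        # Direct regression in one forward pass: 6DoF pose and shape coefficients.
        self.pose_head = nn.Linear(fused, 7)           # translation (3) + quaternion (4)
        self.shape_head = nn.Linear(fused, SHAPE_DIM)  # low-dimensional shape code
        # Hypothetical mean shape and linear basis mapping codes to vertex offsets.
        self.register_buffer("mean_shape", torch.zeros(NUM_VERTICES, 3))
        self.register_buffer("shape_basis", torch.zeros(SHAPE_DIM, NUM_VERTICES, 3))

    def forward(self, feats):
        # feats: list of four per-vehicle feature vectors, one per branch.
        fused = torch.cat([b(f) for b, f in zip(self.branches, feats)], dim=-1)
        pose = self.pose_head(fused)    # (B, 7)
        code = self.shape_head(fused)   # (B, SHAPE_DIM)
        # Decode the dense mesh (1352 vertices) from the low-dimensional code.
        vertices = self.mean_shape + torch.einsum("bk,kvc->bvc", code, self.shape_basis)
        return pose, vertices

# Usage with dummy features for a batch of two detected vehicles.
feats = [torch.randn(2, 256) for _ in range(4)]
pose, vertices = GSNetStyleHead()(feats)
print(pose.shape, vertices.shape)  # torch.Size([2, 7]) torch.Size([2, 1352, 3])
```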
Related papers
- DiMeR: Disentangled Mesh Reconstruction Model [24.07380724530745]
We introduce DiMeR, a novel disentangled dual-stream feed-forward model for sparse-view mesh reconstruction.
We demonstrate robust capabilities across various tasks, including sparse-view reconstruction, single-image-to-3D, and text-to-3D.
arXiv Detail & Related papers (2025-04-24T15:39:20Z)
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles Single-view 3D Mesh Reconstruction, to study the model generalization on unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
- Towards 3D Scene Reconstruction from Locally Scale-Aligned Monocular Video Depth [90.33296913575818]
In some video-based scenarios such as video depth estimation and 3D scene reconstruction from a video, the unknown scale and shift residing in per-frame prediction may cause depth inconsistency.
We propose a locally weighted linear regression method to recover the scale and shift with very sparse anchor points (see the sketch after this list).
Our method can boost the performance of existing state-of-the-art approaches by up to 50% over several zero-shot benchmarks.
arXiv Detail & Related papers (2022-02-03T08:52:54Z)
- Multi-initialization Optimization Network for Accurate 3D Human Pose and Shape Estimation [75.44912541912252]
We propose a three-stage framework named Multi-Initialization Optimization Network (MION).
In the first stage, we strategically select different coarse 3D reconstruction candidates which are compatible with the 2D keypoints of the input sample.
In the second stage, we design a mesh refinement transformer (MRT) to refine each coarse reconstruction result via a self-attention mechanism.
Finally, a Consistency Estimation Network (CEN) is proposed to find the best result from multiple candidates by evaluating whether the visual evidence in the RGB image matches a given 3D reconstruction.
arXiv Detail & Related papers (2021-12-24T02:43:58Z)
- Category-Level 6D Object Pose Estimation via Cascaded Relation and Recurrent Reconstruction Networks [22.627704070200863]
Category-level 6D pose estimation is fundamental to many scenarios such as robotic manipulation and augmented reality.
We achieve accurate category-level 6D pose estimation via cascaded relation and recurrent reconstruction networks.
Our method exceeds the latest state-of-the-art SPD by 4.9% and 17.7% on the CAMERA25 dataset.
arXiv Detail & Related papers (2021-08-19T15:46:52Z)
- Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training [3.8073142980733]
We propose a novel framework for monocular 3D object detection using only RGB images, called KM3D-Net.
We design a fully convolutional model to predict object keypoints, dimensions, and orientation, and then combine these estimations with perspective geometry constraints to compute the position attribute.
arXiv Detail & Related papers (2020-09-02T00:51:51Z)
- Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation [62.618227434286]
We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image.
We propose a deep network to reconstruct the 3D object model by explicitly modeling the deformation from a pre-learned categorical shape prior.
arXiv Detail & Related papers (2020-07-16T16:45:05Z)
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
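As a side note on the "Towards 3D Scene Reconstruction from Locally Scale-Aligned Monocular Video Depth" entry above, the locally weighted linear regression idea can be sketched roughly as follows. The Gaussian distance weighting, the per-query affine model s * d + t, and all names below are illustrative assumptions, not that paper's actual implementation.

```python
# Rough illustrative sketch (assumptions: NumPy, a Gaussian spatial kernel, and a
# per-query affine model metric_depth ~ s * pred_depth + t).
import numpy as np

def recover_scale_shift(pred_depth, anchor_xy, anchor_depth, query_xy, sigma=50.0):
    """Fit a scale s and shift t around one query pixel using sparse metric
    anchors, weighting each anchor by its spatial distance to the query."""
    # Spatial weights: anchors closer to the query pixel count more.
    d2 = np.sum((anchor_xy - query_xy) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    # Weighted least squares on [pred, 1] @ [s, t]^T ~ anchor_depth.
    pred_at_anchors = pred_depth[anchor_xy[:, 1], anchor_xy[:, 0]]
    A = np.stack([pred_at_anchors, np.ones_like(pred_at_anchors)], axis=1)
    sw = np.sqrt(w)
    (s, t), *_ = np.linalg.lstsq(A * sw[:, None], anchor_depth * sw, rcond=None)
    return s, t

# Usage with toy data: a 100x100 relative depth map and five sparse anchors.
pred = np.random.rand(100, 100) + 0.5
xy = np.array([[10, 20], [40, 60], [70, 15], [55, 80], [90, 90]])  # (x, y) pixels
gt = 2.0 * pred[xy[:, 1], xy[:, 0]] + 1.0                          # synthetic metric depth
s, t = recover_scale_shift(pred, xy, gt, query_xy=np.array([50, 50]))
print(round(s, 3), round(t, 3))  # approximately 2.0 and 1.0
```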