Learning to Predict 3D Lane Shape and Camera Pose from a Single Image
via Geometry Constraints
- URL: http://arxiv.org/abs/2112.15351v1
- Date: Fri, 31 Dec 2021 08:59:27 GMT
- Title: Learning to Predict 3D Lane Shape and Camera Pose from a Single Image
via Geometry Constraints
- Authors: Ruijin Liu, Dapeng Chen, Tie Liu, Zhiliang Xiong, Zejian Yuan
- Abstract summary: We propose to predict 3D lanes by estimating camera pose from a single image with a two-stage framework.
The first stage addresses the camera-pose task from perspective-view images.
The second stage targets the 3D lane task. It uses the previously estimated pose to generate top-view images containing distance-invariant lane appearances.
- Score: 25.7441545608721
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting 3D lanes from a camera is an emerging problem for
autonomous vehicles. In this task, an accurate camera pose is the key to
generating correct lanes, because it allows transforming an image from the
perspective view to the top view. With this transformation, we can remove
perspective effects so that 3D lanes look similar and can be accurately
fitted by low-order polynomials. However, mainstream 3D lane detectors rely
on perfect camera poses provided by other sensors, which is expensive and
suffers from multi-sensor calibration issues. To overcome this problem, we
propose to predict 3D lanes by estimating the camera pose from a single
image with a two-stage framework. The first stage addresses the camera-pose
task from perspective-view images. To improve pose estimation, we introduce
an auxiliary 3D lane task and geometry constraints to benefit from
multi-task learning, which enhances consistency between 3D and 2D as well
as compatibility between the two tasks. The second stage targets the 3D
lane task. It uses the previously estimated pose to generate top-view
images containing distance-invariant lane appearances for predicting
accurate 3D lanes. Experiments demonstrate that, without ground-truth
camera pose, our method outperforms state-of-the-art methods that rely on
perfect camera poses, while having the fewest parameters and computations.
Code is available at https://github.com/liuruijin17/CLGo.
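The geometric idea behind the second stage, warping the perspective view into a top view with the estimated pose so that lanes become distance-invariant and fit low-order polynomials, can be illustrated with a minimal sketch. The flat-ground pinhole setup, the sample intrinsics `K`, and all helper names below are illustrative assumptions for this digest, not the authors' implementation (see the linked repository for that):

```python
import numpy as np

def ground_to_image_homography(K, pitch, height):
    """Homography mapping ground-plane coords [X_lateral, Y_forward, 1] to pixels.

    Assumes a pinhole camera `height` metres above a flat ground plane,
    pitched down by `pitch` radians, with no roll or yaw (an illustrative
    setup; the paper estimates the pose from the image itself).
    """
    s, c = np.sin(pitch), np.cos(pitch)
    # Columns are r1, r2 (first two columns of the world-to-camera rotation R)
    # and t = -R @ C for camera centre C = (0, 0, height).
    E = np.array([[1.0, 0.0,        0.0],
                  [0.0,  -s, height * c],
                  [0.0,   c, height * s]])
    return K @ E

def image_to_ground(H, pts_uv):
    """Back-project pixel coordinates of lane points onto the ground plane."""
    Hinv = np.linalg.inv(H)
    uv1 = np.column_stack([pts_uv, np.ones(len(pts_uv))])
    g = uv1 @ Hinv.T
    return g[:, :2] / g[:, 2:3]  # (X_lateral, Y_forward) in metres

# Hypothetical intrinsics, 1.5 m camera height, 3-degree downward pitch.
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])
H = ground_to_image_homography(K, np.radians(3.0), 1.5)

# Detected lane pixels (u, v); in the top view, perspective foreshortening
# is gone, so a low-order polynomial x = f(y) fits the lane well.
lane_uv = np.array([[700.0, 700.0], [690.0, 600.0], [685.0, 500.0], [683.0, 430.0]])
lane_xy = image_to_ground(H, lane_uv)
coeffs = np.polyfit(lane_xy[:, 1], lane_xy[:, 0], deg=2)  # quadratic in forward distance
```

In ground-plane coordinates a quadratic in the forward distance is typically enough to describe a lane; the paper's contribution is that the pitch and height fed into this transformation are estimated from the single input image rather than taken from extra sensors.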
Related papers
- MPL: Lifting 3D Human Pose from Multi-view 2D Poses [75.26416079541723]
We propose combining 2D pose estimation, for which large and rich training datasets exist, and 2D-to-3D pose lifting, using a transformer-based network.
Our experiments demonstrate reductions of up to 45% in MPJPE compared to 3D poses obtained by triangulating the 2D poses (a sketch of this triangulation baseline follows the list below).
arXiv Detail & Related papers (2024-08-20T12:55:14Z) - EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans [5.047302480095444]
Monocular Human Pose Estimation aims at determining the 3D positions of human joints from a single 2D image captured by a camera.
In this study, instead of relying on approximations, we advocate for utilizing the full perspective camera model.
We introduce the EPOCH framework, comprising two main components: the pose lifter network (LiftNet) and the pose regressor network (RegNet).
arXiv Detail & Related papers (2024-06-28T08:16:54Z)
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z) - VirtualPose: Learning Generalizable 3D Human Pose Models from Virtual
Data [69.64723752430244]
We introduce VirtualPose, a two-stage learning framework to exploit the hidden "free lunch" specific to this task.
The first stage transforms images to abstract geometry representations (AGR), and then the second maps them to 3D poses.
It addresses the generalization issue from two aspects: (1) the first stage can be trained on diverse 2D datasets to reduce the risk of over-fitting to limited appearance; (2) the second stage can be trained on diverse AGR synthesized from a large number of virtual cameras and poses.
arXiv Detail & Related papers (2022-07-20T14:47:28Z) - PersFormer: 3D Lane Detection via Perspective Transformer and the
OpenLane Benchmark [109.03773439461615]
PersFormer is an end-to-end monocular 3D lane detector with a novel Transformer-based spatial feature transformation module.
We release one of the first large-scale real-world 3D lane datasets, called OpenLane, with high-quality annotation and scenario diversity.
arXiv Detail & Related papers (2022-03-21T16:12:53Z) - MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z) - CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the
Wild [31.334715988245748]
We propose a self-supervised approach that learns a single image 3D pose estimator from unlabeled multi-view data.
In contrast to most existing methods, we do not require calibrated cameras and can therefore learn from moving cameras.
Key to the success are new, unbiased reconstruction objectives that mix information across views and training samples.
arXiv Detail & Related papers (2020-11-30T10:42:27Z) - Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled
Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z) - Learning Precise 3D Manipulation from Multiple Uncalibrated Cameras [13.24490469380487]
We present an effective multi-view approach to end-to-end learning of precise manipulation tasks that are 3D in nature.
Our method learns to accomplish these tasks using multiple statically placed but uncalibrated RGB camera views, without building an explicit 3D representation such as a point cloud or voxel grid.
arXiv Detail & Related papers (2020-02-21T03:28:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site makes no guarantee as to the quality of its content (including all information) and is not responsible for any consequences of its use.