PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective
Crop Layers
- URL: http://arxiv.org/abs/2011.13607v2
- Date: Thu, 15 Apr 2021 17:39:07 GMT
- Title: PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective
Crop Layers
- Authors: Frank Yu, Mathieu Salzmann, Pascal Fua, Helge Rhodin
- Abstract summary: We introduce Perspective Crop Layers (PCLs) - a form of perspective crop of the region of interest based on the camera geometry.
PCLs deterministically remove the location-dependent perspective effects while leaving end-to-end training and the number of parameters of the underlying neural network unchanged.
PCL offers an easy way to improve the accuracy of existing 3D reconstruction networks by making them geometry aware.
- Score: 111.55817466296402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Local processing is an essential feature of CNNs and other neural network
architectures - it is one of the reasons why they work so well on images where
relevant information is, to a large extent, local. However, perspective effects
stemming from the projection in a conventional camera vary for different global
positions in the image. We introduce Perspective Crop Layers (PCLs) - a form of
perspective crop of the region of interest based on the camera geometry - and
show that accounting for the perspective consistently improves the accuracy of
state-of-the-art 3D pose reconstruction methods. PCLs are modular neural
network layers, which, when inserted into existing CNN and MLP architectures,
deterministically remove the location-dependent perspective effects while
leaving end-to-end training and the number of parameters of the underlying
neural network unchanged. We demonstrate that PCL leads to improved 3D human
pose reconstruction accuracy for CNN architectures that use cropping
operations, such as spatial transformer networks (STN), and, somewhat
surprisingly, MLPs used for 2D-to-3D keypoint lifting. Our conclusion is that
it is important to utilize camera calibration information when available, for
classical and deep-learning-based computer vision alike. PCL offers an easy way
to improve the accuracy of existing 3D reconstruction networks by making them
geometry aware. Our code is publicly available at
github.com/yu-frank/PerspectiveCropLayers.
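To make the mechanism concrete, below is a minimal PyTorch sketch of a perspective crop in the spirit of PCL: rather than cutting out a rectangular window, it rotates a virtual camera so its optical axis passes through the crop centre and resamples the image through the induced homography H = K R K_virt^(-1). This is an illustrative sketch based only on the abstract, not the authors' implementation; all function and parameter names (perspective_crop, f_virt, and so on) are invented here, and the official code at github.com/yu-frank/PerspectiveCropLayers is the authoritative reference.

    import torch
    import torch.nn.functional as F

    def look_at_rotation(ray):
        """Return R whose columns are the virtual camera axes (expressed in
        the original camera frame), with the z-axis along the unit viewing
        ray. Degenerate if the ray is parallel to the x-axis; acceptable
        for a sketch."""
        z = ray / ray.norm()
        y = torch.linalg.cross(z, torch.tensor([1.0, 0.0, 0.0]))
        y = y / y.norm()
        x = torch.linalg.cross(y, z)
        return torch.stack([x, y, z], dim=1)

    def perspective_crop(image, K, center_px, out_size=64, f_virt=200.0):
        """Resample `image` (1,C,H,W) with a virtual camera that looks
        straight at pixel `center_px`; `K` is the 3x3 intrinsic matrix.
        Differentiable and parameter-free."""
        # Viewing ray through the crop centre, in camera coordinates.
        c = torch.tensor([float(center_px[0]), float(center_px[1]), 1.0])
        R = look_at_rotation(torch.linalg.solve(K, c))

        # Virtual intrinsics: principal point at the centre of the crop.
        s = out_size / 2.0
        K_virt = torch.tensor([[f_virt, 0.0, s],
                               [0.0, f_virt, s],
                               [0.0, 0.0, 1.0]])

        # Homography taking virtual-camera pixels back to source pixels.
        H = K @ R @ torch.linalg.inv(K_virt)

        # Sampling grid: map every output pixel to its source location.
        ys, xs = torch.meshgrid(torch.arange(out_size, dtype=torch.float32),
                                torch.arange(out_size, dtype=torch.float32),
                                indexing="ij")
        pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1)  # (S,S,3)
        src = pix @ H.T
        src = src[..., :2] / src[..., 2:3]  # dehomogenise

        # Normalise to [-1,1] as grid_sample expects.
        _, _, h, w = image.shape
        grid = torch.stack([src[..., 0] / (w - 1) * 2 - 1,
                            src[..., 1] / (h - 1) * 2 - 1], dim=-1)
        return F.grid_sample(image, grid.unsqueeze(0), align_corners=True)

Because the warp is expressed entirely through grid_sample, gradients flow through the layer, so it can be inserted in front of an existing CNN without adding parameters. For the 2D-to-3D keypoint-lifting case, the same virtual rotation R can presumably be applied to the keypoint rays K^(-1)p instead of to pixels, with the inverse rotation applied to the predicted 3D pose to return to the original camera frame.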
Related papers
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z)
- Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration [107.61458720202984]
This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes.
We propose the learnable transformation alignment to bridge the domain gap between image and point cloud data.
We establish dense 2D-3D correspondences to estimate the rigid pose.
arXiv Detail & Related papers (2024-01-23T02:41:06Z)
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- SGL: Structure Guidance Learning for Camera Localization [7.094881396940598]
In this work, we focus on scene prediction methods and propose a network architecture named Structure Guidance Learning (SGL), which utilizes a receptive branch and a structure branch to extract both high-level and low-level features.
arXiv Detail & Related papers (2023-04-12T02:20:29Z)
- Decomposing 3D Neuroimaging into 2+1D Processing for Schizophrenia Recognition [25.80846093248797]
We propose to process the 3D data with a 2+1D framework so that we can exploit powerful deep 2D Convolutional Neural Networks (CNNs) pre-trained on the large ImageNet dataset for 3D neuroimaging recognition.
Specifically, 3D volumes of Magnetic Resonance Imaging (MRI) metrics are decomposed into 2D slices according to neighboring voxel positions.
Global pooling is applied to remove redundant information as the activation patterns are sparsely distributed over feature maps.
Channel-wise and slice-wise convolutions are proposed to aggregate the contextual information in the third dimension unprocessed by the 2D CNN model.
arXiv Detail & Related papers (2022-11-21T15:22:59Z)
- PIG-Net: Inception based Deep Learning Architecture for 3D Point Cloud Segmentation [0.9137554315375922]
We propose an inception-based deep network architecture called PIG-Net that effectively characterizes the local and global geometric details of point clouds.
We perform an exhaustive experimental analysis of the PIG-Net architecture on two state-of-the-art datasets.
arXiv Detail & Related papers (2021-01-28T13:27:55Z)
- Towards Dense People Detection with Deep Learning and Depth images [9.376814409561726]
This paper proposes a DNN-based system that detects multiple people from a single depth image.
Our neural network processes a depth image and outputs a likelihood map in image coordinates.
We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training.
arXiv Detail & Related papers (2020-07-14T16:43:02Z)
- Learning Local Neighboring Structure for Robust 3D Shape Representation [143.15904669246697]
Representation learning for 3D meshes is important in many computer vision and graphics applications.
We propose a local structure-aware anisotropic convolutional operation (LSA-Conv).
Our model produces significant improvement in 3D shape reconstruction compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-04-21T13:40:03Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.