Depth-Adapted CNN for RGB-D cameras
- URL: http://arxiv.org/abs/2009.09976v2
- Date: Wed, 23 Sep 2020 09:45:21 GMT
- Title: Depth-Adapted CNN for RGB-D cameras
- Authors: Zongwei Wu, Guillaume Allibert, Christophe Stolz, Cedric Demonceaux
- Abstract summary: Conventional 2D Convolutional Neural Networks (CNNs) extract features from an input image by applying linear filters.
We tackle the problem of improving the classical RGB CNN methods by using the depth information provided by the RGB-D cameras.
This paper proposes a novel and generic procedure to articulate both photometric and geometric information in CNN architecture.
- Score: 0.3727773051465455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional 2D Convolutional Neural Networks (CNNs) extract features from an
input image by applying linear filters. These filters compute the spatial
coherence by weighting the photometric information on a fixed neighborhood
without taking into account the geometric information. We tackle the problem of
improving the classical RGB CNN methods by using the depth information provided
by the RGB-D cameras. State-of-the-art approaches use depth as an additional
channel or image (HHA) or pass from 2D CNN to 3D CNN. This paper proposes a
novel and generic procedure to articulate both photometric and geometric
information in CNN architecture. The depth data is represented as a 2D offset
to adapt spatial sampling locations. The new model presented is invariant to
scale and rotation around the X and the Y axis of the camera coordinate system.
Moreover, when depth data is constant, our model is equivalent to a regular
CNN. Experiments on benchmarks validate the effectiveness of our model.
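The core mechanism described above can be sketched in a few lines: turn depth into a per-pixel dilation of the convolution's 3x3 sampling grid, normalised so that a constant depth map produces zero offsets and the layer degenerates to a regular convolution, as the abstract states. This is only a minimal illustration of the idea, not the paper's exact construction; the mean-depth normalisation and the fixed 3x3 grid are assumptions made here for simplicity.

```python
import numpy as np

def depth_adapted_offsets(depth):
    """Sketch: dilate a 3x3 sampling grid per pixel in inverse proportion
    to depth, normalised so a constant depth map yields zero offsets
    (i.e. the regular convolution grid). Not the paper's exact formula."""
    scale = np.mean(depth) / depth                      # closer pixels -> larger grid
    dy, dx = np.meshgrid([-1, 0, 1], [-1, 0, 1], indexing="ij")
    base = np.stack([dy, dx], axis=-1).astype(float)    # (3, 3, 2) regular grid
    # Offset = depth-scaled grid minus the regular grid, per pixel.
    offsets = (scale[..., None, None, None] - 1.0) * base
    return offsets                                      # shape (H, W, 3, 3, 2)
```

In practice such an offset tensor would feed a deformable-convolution operator, which samples features at the shifted locations; the constant-depth check below mirrors the invariance property claimed in the abstract.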
Related papers
- GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolution neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves the state-of-the-art performance, especially when compared in the case of using only a few propagation steps.
arXiv Detail & Related papers (2022-10-19T17:56:03Z)
- Detecting Humans in RGB-D Data with CNNs [14.283154024458739]
We propose a novel fusion approach based on the characteristics of depth images.
We also present a new depth-encoding scheme, which not only encodes depth images into three channels but also enhances the information for classification.
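The paper's specific three-channel scheme is not given in this summary, so the snippet below is a generic illustration of the idea it describes: packing complementary depth cues into three channels so that a standard RGB-style CNN can consume them. The particular cues chosen here (normalised depth, normalised disparity, gradient magnitude) are assumptions for illustration only.

```python
import numpy as np

def encode_depth_3ch(depth, eps=1e-6):
    """Illustrative three-channel depth encoding (not the paper's scheme):
    normalised depth, normalised inverse depth, and depth-edge strength."""
    d = depth.astype(float)
    norm = (d - d.min()) / (np.ptp(d) + eps)            # channel 0: normalised depth
    disp = 1.0 / (d + eps)                              # inverse depth (disparity-like)
    disp = (disp - disp.min()) / (np.ptp(disp) + eps)   # channel 1: normalised disparity
    gy, gx = np.gradient(d)
    grad = np.hypot(gx, gy)
    grad = grad / (grad.max() + eps)                    # channel 2: depth-edge strength
    return np.stack([norm, disp, grad], axis=-1)        # shape (H, W, 3)
```

Each channel lies in [0, 1], matching the value range a pretrained RGB backbone expects after standard normalisation.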
arXiv Detail & Related papers (2022-07-17T03:17:09Z)
- Depth-Adapted CNNs for RGB-D Semantic Segmentation [2.341385717236931]
We propose a novel framework to incorporate the depth information in the RGB convolutional neural network (CNN)
Specifically, our Z-ACN generates a 2D depth-adapted offset which is fully constrained by low-level features to guide the feature extraction on RGB images.
With the generated offset, we introduce two intuitive and effective operations to replace basic CNN operators.
arXiv Detail & Related papers (2022-06-08T14:59:40Z)
- Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation [21.424035166174352]
State-of-the-art approaches typically use different backbones to extract features for RGB and depth images.
We find that the essential reason for using two independent backbones is the "projection breakdown" problem.
We propose a simple yet effective method denoted as Uni6D that explicitly takes the extra UV data along with RGB-D images as input.
arXiv Detail & Related papers (2022-03-28T07:05:27Z)
- GCNDepth: Self-supervised Monocular Depth Estimation based on Graph Convolutional Network [11.332580333969302]
This work brings a new solution with a set of improvements, which increase the quantitative and qualitative understanding of depth maps.
A graph convolutional network (GCN) can handle the convolution on non-Euclidean data and it can be applied to irregular image regions within a topological structure.
Our method provides comparable and promising results, with a high prediction accuracy of 89% on the public KITTI and Make3D datasets.
arXiv Detail & Related papers (2021-12-13T16:46:25Z)
- Keypoint Message Passing for Video-based Person Re-Identification [106.41022426556776]
Video-based person re-identification (re-ID) is an important technique in visual surveillance systems which aims to match video snippets of people captured by different cameras.
Existing methods are mostly based on convolutional neural networks (CNNs), whose building blocks either process local neighbor pixels at a time, or, when 3D convolutions are used to model temporal information, suffer from the misalignment problem caused by person movement.
In this paper, we propose to overcome the limitations of normal convolutions with a human-oriented graph method. Specifically, features located at person joint keypoints are extracted and connected as a spatial-temporal graph
arXiv Detail & Related papers (2021-11-16T08:01:16Z)
- OSLO: On-the-Sphere Learning for Omnidirectional images and its application to 360-degree image compression [59.58879331876508]
We study the learning of representation models for omnidirectional images and propose to use the properties of HEALPix uniform sampling of the sphere to redefine the mathematical tools used in deep learning models for omnidirectional images.
Our proposed on-the-sphere solution yields a better compression gain, saving 13.7% of the bit rate compared to similar learned models applied to equirectangular images.
arXiv Detail & Related papers (2021-07-19T22:14:30Z) - DeepI2P: Image-to-Point Cloud Registration via Deep Classification [71.3121124994105]
DeepI2P is a novel approach for cross-modality registration between an image and a point cloud.
Our method estimates the relative rigid transformation between the coordinate frames of the camera and Lidar.
We circumvent the difficulty by converting the registration problem into a classification and inverse camera projection optimization problem.
arXiv Detail & Related papers (2021-04-08T04:27:32Z) - PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective
Crop Layers [111.55817466296402]
We introduce Perspective Crop Layers (PCLs) - a form of perspective crop of the region of interest based on the camera geometry.
PCLs deterministically remove the location-dependent perspective effects while leaving the end-to-end training and the number of parameters of the underlying neural network unchanged.
PCL offers an easy way to improve the accuracy of existing 3D reconstruction networks by making them geometry aware.
arXiv Detail & Related papers (2020-11-27T08:48:43Z) - How semantic and geometric information mutually reinforce each other in
ToF object localization [19.47618043504105]
We propose a novel approach to localize a 3D object from the intensity and depth information images provided by a Time-of-Flight (ToF) sensor.
Our proposed two-step approach improves segmentation and localization accuracy by a significant margin compared to a conventional CNN architecture.
arXiv Detail & Related papers (2020-08-27T09:13:26Z) - Local Grid Rendering Networks for 3D Object Detection in Point Clouds [98.02655863113154]
CNNs are powerful, but directly applying convolutions after voxelizing the entire point cloud to a dense regular 3D grid would be computationally costly.
We propose a novel and principled Local Grid Rendering (LGR) operation to render the small neighborhood of a subset of input points into a low-resolution 3D grid independently.
We validate LGR-Net for 3D object detection on the challenging ScanNet and SUN RGB-D datasets.
arXiv Detail & Related papers (2020-07-04T13:57:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.