Cylindrical Convolutional Networks for Joint Object Detection and
Viewpoint Estimation
- URL: http://arxiv.org/abs/2003.11303v1
- Date: Wed, 25 Mar 2020 10:24:58 GMT
- Title: Cylindrical Convolutional Networks for Joint Object Detection and
Viewpoint Estimation
- Authors: Sunghun Joung, Seungryong Kim, Hanjae Kim, Minsu Kim, Ig-Jae Kim,
Junghyun Cho, Kwanghoon Sohn
- Abstract summary: We introduce a learnable module, cylindrical convolutional networks (CCNs), that exploit a cylindrical representation of a convolutional kernel defined in the 3D space.
CCNs extract a view-specific feature through a view-specific convolutional kernel to predict object category scores at each viewpoint.
Our experiments demonstrate the effectiveness of the cylindrical convolutional networks on joint object detection and viewpoint estimation.
- Score: 76.21696417873311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing techniques to encode spatial invariance within deep convolutional
neural networks only model 2D transformation fields. This does not account for
the fact that objects in a 2D space are a projection of 3D ones, and thus they
have limited ability to handle severe object viewpoint changes. To overcome this
limitation, we introduce a learnable module, cylindrical convolutional networks
(CCNs), that exploit a cylindrical representation of a convolutional kernel
defined in the 3D space. CCNs extract a view-specific feature through a
view-specific convolutional kernel to predict object category scores at each
viewpoint. With the view-specific feature, we simultaneously determine
object category and viewpoints using the proposed sinusoidal soft-argmax
module. Our experiments demonstrate the effectiveness of the cylindrical
convolutional networks on joint object detection and viewpoint estimation.
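The abstract names a sinusoidal soft-argmax module but does not define it. The following is a minimal sketch of one plausible reading, not the authors' implementation: per-viewpoint scores are turned into softmax weights, the sine and cosine of each viewpoint bin's angle are averaged under those weights, and the angle is recovered with atan2. The function name `sinusoidal_soft_argmax`, the `temperature` parameter, and the assumption of uniformly spaced bins starting at angle 0 are all hypothetical choices made for illustration.

```python
import math


def sinusoidal_soft_argmax(scores, temperature=1.0):
    """Return a continuous angle in [0, 2*pi) from per-bin viewpoint scores.

    Assumption (not from the source): bin k is centered at 2*pi*k/len(scores).
    """
    n = len(scores)
    # Numerically stable softmax over the viewpoint scores.
    peak = max(scores)
    weights = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Average sin and cos separately so the estimate respects the
    # wrap-around at 0 / 2*pi, which a plain soft-argmax over bin
    # indices would not.
    angles = [2.0 * math.pi * k / n for k in range(n)]
    sin_mean = sum(p * math.sin(a) for p, a in zip(probs, angles))
    cos_mean = sum(p * math.cos(a) for p, a in zip(probs, angles))
    return math.atan2(sin_mean, cos_mean) % (2.0 * math.pi)
```

Under these assumptions, two equally strong peaks in the first and last of 8 bins (0 and 315 degrees) yield an estimate of 337.5 degrees, between them across the wrap-around. A plain weighted average of bin indices would instead land near the opposite side of the circle.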
Related papers
- PMPNet: Pixel Movement Prediction Network for Monocular Depth Estimation in Dynamic Scenes [7.736445799116692]
We propose a novel method for monocular depth estimation in dynamic scenes.
We first theoretically explore the arbitrariness of objects' movement trajectories in dynamic scenes.
To overcome the depth inconsistency problem around the edges, we propose a deformable support window module.
arXiv Detail & Related papers (2024-11-04T03:42:29Z)
- Spherical Frustum Sparse Convolution Network for LiDAR Point Cloud Semantic Segmentation [62.258256483231484]
LiDAR point cloud semantic segmentation enables the robots to obtain fine-grained semantic information of the surrounding environment.
Many works project the point cloud onto the 2D image and adopt the 2D Convolutional Neural Networks (CNNs) or vision transformer for LiDAR point cloud semantic segmentation.
In this paper, we propose a novel spherical frustum structure to avoid quantized information loss.
arXiv Detail & Related papers (2023-11-29T09:55:13Z)
- PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic
Occupancy Prediction [72.75478398447396]
We propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively.
Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system.
We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane.
arXiv Detail & Related papers (2023-08-31T17:57:17Z)
- Parametric Depth Based Feature Representation Learning for Object
Detection and Segmentation in Bird's Eye View [44.78243406441798]
This paper focuses on leveraging geometry information, such as depth, to model such feature transformation.
We first lift the 2D image features to the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view.
We then aggregate the 3D feature volume based on the 3D space occupancy derived from depth to the BEV frame.
arXiv Detail & Related papers (2023-07-09T06:07:22Z)
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based
Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize the 3D voxelization and 3D convolution network.
We propose a new framework for the outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
arXiv Detail & Related papers (2021-09-12T06:25:11Z)
- Spatially Invariant Unsupervised 3D Object Segmentation with Graph
Neural Networks [23.729853358582506]
We propose a framework, SPAIR3D, to model a point cloud as a spatial mixture model.
We jointly learn the multiple-object representation and segmentation in 3D via Variational Autoencoders (VAE).
Experimental results demonstrate that SPAIR3D is capable of detecting and segmenting a variable number of objects without appearance information.
arXiv Detail & Related papers (2021-06-10T09:20:16Z)
- Spherical Transformer: Adapting Spherical Signal to CNNs [53.18482213611481]
Spherical Transformer can transform spherical signals into vectors that can be directly processed by standard CNNs.
We evaluate our approach on the tasks of spherical MNIST recognition, 3D object classification and omnidirectional image semantic segmentation.
arXiv Detail & Related papers (2021-01-11T12:33:16Z)
- Exploring Deep 3D Spatial Encodings for Large-Scale 3D Scene
Understanding [19.134536179555102]
We propose an alternative approach to overcome the limitations of CNN-based approaches by encoding the spatial features of raw 3D point clouds into undirected graph models.
The proposed method achieves accuracy on par with the state of the art, with improved training time and model stability, indicating strong potential for further research.
arXiv Detail & Related papers (2020-11-29T12:56:19Z)
- Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic
Segmentation [87.54570024320354]
State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space.
A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space.
We develop a framework based on 3D cylinder partition and 3D cylinder convolution, termed Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds.
arXiv Detail & Related papers (2020-08-04T13:56:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.