Ellipse R-CNN: Learning to Infer Elliptical Object from Clustering and
Occlusion
- URL: http://arxiv.org/abs/2001.11584v2
- Date: Sat, 14 Nov 2020 21:05:09 GMT
- Title: Ellipse R-CNN: Learning to Infer Elliptical Object from Clustering and
Occlusion
- Authors: Wenbo Dong, Pravakar Roy, Cheng Peng, Volkan Isler
- Abstract summary: We introduce the first CNN-based ellipse detector, called Ellipse R-CNN, to represent and infer occluded objects as ellipses.
We first propose a robust and compact ellipse regression based on the Mask R-CNN architecture for elliptical object detection.
- Score: 31.237782332036552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Images of heavily occluded objects in cluttered scenes, such as fruit
clusters in trees, are hard to segment. To further retrieve the 3D size and 6D
pose of each individual object in such cases, bounding boxes are not reliable
from multiple views since only a small portion of the object's geometry is
captured. We introduce the first CNN-based ellipse detector, called Ellipse
R-CNN, to represent and infer occluded objects as ellipses. We first propose a
robust and compact ellipse regression based on the Mask R-CNN architecture for
elliptical object detection. Our method can infer the parameters of multiple
elliptical objects even when they are occluded by other neighboring objects. For
better occlusion handling, we exploit refined feature regions for the
regression stage, and integrate the U-Net structure for learning different
occlusion patterns to compute the final detection score. The correctness of
ellipse regression is validated through experiments performed on synthetic data
of clustered ellipses. We further quantitatively and qualitatively demonstrate
that our approach outperforms the state-of-the-art model (i.e., Mask R-CNN
followed by ellipse fitting) and its three variants on both synthetic and real
datasets of occluded and clustered elliptical objects.
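The baseline named in the abstract (a segmentation mask followed by ellipse fitting) can be illustrated with a minimal sketch. The abstract does not say which fitting method the authors use, so the moment-based fit below is an assumption, and the names `rasterize_ellipse` and `fit_ellipse_from_mask` are hypothetical helpers introduced only for this illustration. It recovers the parameters (cx, cy, a, b, theta) from a binary mask and shows how the fit changes when half of the object is hidden.

```python
import numpy as np


def rasterize_ellipse(shape, cx, cy, a, b, theta):
    """Boolean mask of a filled ellipse (hypothetical helper for the demo)."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    dx, dy = xx - cx, yy - cy
    # Rotate pixel offsets into the ellipse's own axes.
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0


def fit_ellipse_from_mask(mask):
    """Moment-based fit; returns (cx, cy, a, b, theta) with a >= b."""
    ys, xs = np.nonzero(mask)
    if xs.size < 3:
        raise ValueError("mask has too few foreground pixels")
    pts = np.stack([xs, ys], axis=1).astype(float)
    center = pts.mean(axis=0)
    cov = np.cov((pts - center).T)
    # For a uniformly filled ellipse, the variance along a semi-axis of
    # length r is r**2 / 4, so r = 2 * sqrt(eigenvalue).
    evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    evals = np.clip(evals, 0.0, None)   # guard against tiny negative values
    b, a = 2.0 * np.sqrt(evals)
    major = evecs[:, 1]
    theta = np.arctan2(major[1], major[0]) % np.pi  # orientation in [0, pi)
    return center[0], center[1], a, b, theta


# Synthetic ellipse, then the same ellipse with its right half hidden.
full = rasterize_ellipse((200, 200), cx=100, cy=80, a=40, b=20, theta=np.pi / 6)
occluded = full.copy()
occluded[:, 100:] = False  # hide everything from the center column rightwards

fit_full = fit_ellipse_from_mask(full)
fit_occluded = fit_ellipse_from_mask(occluded)
```

On the unoccluded mask the fit returns values close to the parameters used for rasterization. On the half-hidden mask the center shifts and the major semi-axis shrinks noticeably, because a mask-based fit only describes the visible region. That is the limitation the abstract points to when it argues for regressing the full ellipse directly.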
Related papers
- KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation [87.23575166061413]
KP-RED is a unified KeyPoint-driven REtrieval and Deformation framework.
It takes object scans as input and jointly retrieves and deforms the most geometrically similar CAD models.
arXiv Detail & Related papers (2024-03-15T08:44:56Z)
- Level Set-Based Camera Pose Estimation From Multiple 2D/3D Ellipse-Ellipsoid Correspondences [2.016317500787292]
We show that the definition of a cost function characterizing the projection of a 3D object onto a 2D object detection is not straightforward.
We develop an ellipse-ellipse cost based on level sets sampling, demonstrate its nice properties for handling partially visible objects and compare its performance with other common metrics.
arXiv Detail & Related papers (2022-07-16T14:09:54Z)
- Topologically Persistent Features-based Object Recognition in Cluttered Indoor Environments [1.2691047660244335]
Recognition of occluded objects in unseen indoor environments is a challenging problem for mobile robots.
This work proposes a new slicing-based topological descriptor that captures the 3D shape of object point clouds.
It yields similarities between the descriptors of the occluded and the corresponding unoccluded objects, enabling object unity-based recognition.
arXiv Detail & Related papers (2022-05-16T07:01:16Z)
- Approximate Convex Decomposition for 3D Meshes with Collision-Aware Concavity and Tree Search [23.52274863244624]
Approximate convex decomposition aims to decompose a 3D shape into a set of almost convex components.
It has been widely used in game engines, physics simulations, and animation.
We propose a novel method that addresses the limitations of existing approaches from three perspectives.
arXiv Detail & Related papers (2022-05-05T23:40:15Z)
- Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation [12.61753274984776]
We present SCAN, a novel sparse cross-scale attention network to align multi-scale sparse features with global voxel-encoded attention to capture the long-range relationship of instance context.
For the surface-aggregated points, SCAN adopts a novel sparse class-agnostic representation of instance centroids, which not only maintains the sparsity of aligned features but also reduces the network's computation through sparse convolution.
arXiv Detail & Related papers (2022-01-16T05:34:54Z)
- Ellipse Regression with Predicted Uncertainties for Accurate Multi-View 3D Object Estimation [26.930403135038475]
This work considers objects whose three-dimensional models can be represented as ellipsoids.
We present a variant of Mask R-CNN for estimating the parameters of ellipsoidal objects by segmenting each object and accurately regressing the parameters of projection ellipses.
arXiv Detail & Related papers (2020-12-27T19:52:58Z)
- From Points to Multi-Object 3D Reconstruction [71.17445805257196]
We propose a method to detect and reconstruct multiple 3D objects from a single RGB image.
A keypoint detector localizes objects as center points and directly predicts all object properties, including 9-DoF bounding boxes and 3D shapes.
The presented approach performs lightweight reconstruction in a single stage; it is real-time capable, fully differentiable, and end-to-end trainable.
arXiv Detail & Related papers (2020-12-21T18:52:21Z)
- Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction [79.98689027127855]
We propose a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects.
Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings.
It achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds.
arXiv Detail & Related papers (2020-08-28T15:44:05Z)
- Geometry Constrained Weakly Supervised Object Localization [55.17224813345206]
We propose a geometry constrained network, termed GC-Net, for weakly supervised object localization.
The detector predicts the object location defined by a set of coefficients describing a geometric shape.
The generator takes the resulting masked images as input and performs two complementary classification tasks for the object and background.
In contrast to previous approaches, GC-Net is trained end-to-end and predicts object location without any post-processing.
arXiv Detail & Related papers (2020-07-19T17:33:42Z)
- Cylindrical Convolutional Networks for Joint Object Detection and Viewpoint Estimation [76.21696417873311]
We introduce a learnable module, cylindrical convolutional networks (CCNs), that exploit a cylindrical representation of a convolutional kernel defined in 3D space.
CCNs extract a view-specific feature through a view-specific convolutional kernel to predict object category scores at each viewpoint.
Our experiments demonstrate the effectiveness of the cylindrical convolutional networks on joint object detection and viewpoint estimation.
arXiv Detail & Related papers (2020-03-25T10:24:58Z)
- Learning Nonparametric Human Mesh Reconstruction from a Single Image without Ground Truth Meshes [56.27436157101251]
We propose a novel approach to learn human mesh reconstruction without any ground truth meshes.
This is made possible by introducing two new terms into the loss function of a graph convolutional neural network (Graph CNN).
arXiv Detail & Related papers (2020-02-28T20:30:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.