CASAPose: Class-Adaptive and Semantic-Aware Multi-Object Pose Estimation
- URL: http://arxiv.org/abs/2210.05318v1
- Date: Tue, 11 Oct 2022 10:20:01 GMT
- Title: CASAPose: Class-Adaptive and Semantic-Aware Multi-Object Pose Estimation
- Authors: Niklas Gard, Anna Hilsmann, Peter Eisert
- Abstract summary: We present a new single-stage architecture called CASAPose.
It determines 2D-3D correspondences for pose estimation of multiple different objects in RGB images in one pass.
It is fast and memory efficient, and achieves high accuracy for multiple objects.
- Score: 2.861848675707602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Applications in the field of augmented reality or robotics often require
joint localisation and 6D pose estimation of multiple objects. However, most
algorithms need one network per object class to be trained in order to provide
the best results. Analysing all visible objects demands multiple inferences,
which is memory-intensive and time-consuming. We present a new single-stage architecture
called CASAPose that determines 2D-3D correspondences for pose estimation of
multiple different objects in RGB images in one pass. It is fast and memory
efficient, and achieves high accuracy for multiple objects by exploiting the
output of a semantic segmentation decoder as control input to a keypoint
recognition decoder via local class-adaptive normalisation. Our new
differentiable regression of keypoint locations significantly contributes to a
faster closing of the domain gap between real test and synthetic training data.
We apply segmentation-aware convolutions and upsampling operations to increase
the focus inside the object mask and to reduce mutual interference of occluding
objects. For each inserted object, the network grows by only one output
segmentation map and a negligible number of parameters. We outperform
state-of-the-art approaches in challenging multi-object scenes with
inter-object occlusion and synthetic training.
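The abstract describes using the segmentation decoder's output as a control input to the keypoint decoder via local class-adaptive normalisation. The NumPy sketch below illustrates one way such a layer could behave: features are normalised, then each pixel is scaled and shifted with parameters looked up from its predicted class. The function name `class_adaptive_norm`, the parameter tables `gamma` and `beta`, and the choice of channel-wise normalisation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def class_adaptive_norm(features, seg_labels, gamma, beta, eps=1e-5):
    """Hypothetical sketch of a class-adaptive normalisation step.

    features:   (H, W, C) feature map
    seg_labels: (H, W) integer class index per pixel (from a segmentation map)
    gamma,beta: (K, C) per-class scale and shift tables (assumed learnable)
    """
    # Parameter-free normalisation of each channel over the spatial dimensions.
    mean = features.mean(axis=(0, 1), keepdims=True)
    var = features.var(axis=(0, 1), keepdims=True)
    normed = (features - mean) / np.sqrt(var + eps)

    # Per-pixel lookup: every pixel receives the scale/shift of its class.
    scale = gamma[seg_labels]  # (H, W, C)
    shift = beta[seg_labels]   # (H, W, C)
    return normed * scale + shift

# Small usage example with two classes (0 = background, 1 = one object).
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 4, 3))
seg_labels = np.array([[0, 0, 1, 1]] * 4)
gamma = np.array([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])
beta = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
out = class_adaptive_norm(features, seg_labels, gamma, beta)
print(out.shape)
```

In this sketch, adding an object class only adds one row to `gamma` and one to `beta`, which is in the spirit of the abstract's remark that the network grows by a negligible number of parameters per inserted object.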
Related papers
- SEMPose: A Single End-to-end Network for Multi-object Pose Estimation [13.131534219937533]
SEMPose is an end-to-end multi-object pose estimation network.
It can perform inference at 32 FPS without requiring inputs other than the RGB image.
It can accurately estimate the poses of multiple objects in real time, with inference time unaffected by the number of target objects.
arXiv Detail & Related papers (2024-11-21T10:37:54Z)
- CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation [3.5379836919221566]
Estimating rigid objects' poses is one of the fundamental problems in computer vision.
This paper presents a novel approach, CVAM-Pose, for multi-object monocular pose estimation.
arXiv Detail & Related papers (2024-10-11T17:26:27Z)
- AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical one among which lies in foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while running as fast as mainstream methods.
arXiv Detail & Related papers (2022-02-18T10:14:45Z)
- Multi-patch Feature Pyramid Network for Weakly Supervised Object Detection in Optical Remote Sensing Images [39.25541709228373]
We propose a new architecture for object detection with a multiple patch feature pyramid network (MPFP-Net).
MPFP-Net differs from current models, which pursue only the most discriminative patches during training.
We introduce an effective method to regularize the residual values and make the fusion transition layers strictly norm-preserving.
arXiv Detail & Related papers (2021-08-18T09:25:39Z)
- Associating Objects with Transformers for Video Object Segmentation [74.51719591192787]
We propose an Associating Objects with Transformers (AOT) approach to match and decode multiple objects uniformly.
AOT employs an identification mechanism to associate multiple targets into the same high-dimensional embedding space.
We ranked 1st in the 3rd Large-scale Video Object Segmentation Challenge.
arXiv Detail & Related papers (2021-06-04T17:59:57Z)
- Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
- Robust Instance Segmentation through Reasoning about Multi-Object Occlusion [9.536947328412198]
We propose a deep network for multi-object instance segmentation that is robust to occlusion.
Our work builds on Compositional Networks, which learn a generative model of neural feature activations to locate occluders.
In particular, we obtain feed-forward predictions of the object classes and their instance and occluder segmentations.
arXiv Detail & Related papers (2020-12-03T17:41:55Z)
- Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation [67.88276573341734]
We propose a new method for unseen object instance segmentation by learning RGB-D feature embeddings from synthetic data.
A metric learning loss function is utilized to learn to produce pixel-wise feature embeddings.
We further improve the segmentation accuracy with a new two-stage clustering algorithm.
arXiv Detail & Related papers (2020-07-30T00:23:07Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
- Spatial Semantic Embedding Network: Fast 3D Instance Segmentation with Deep Metric Learning [5.699350798684963]
We propose a simple, yet efficient algorithm for 3D instance segmentation using deep metric learning.
For high-level intelligent tasks in a large-scale scene, 3D instance segmentation recognizes individual instances of objects.
We demonstrate the state-of-the-art performance of our algorithm in the ScanNet 3D instance segmentation benchmark on AP score.
arXiv Detail & Related papers (2020-07-07T02:17:44Z)
- Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.