Related papers: CASAPose: Class-Adaptive and Semantic-Aware Multi-Object Pose Estimation

CASAPose: Class-Adaptive and Semantic-Aware Multi-Object Pose Estimation

URL: http://arxiv.org/abs/2210.05318v1
Date: Tue, 11 Oct 2022 10:20:01 GMT
Title: CASAPose: Class-Adaptive and Semantic-Aware Multi-Object Pose Estimation
Authors: Niklas Gard, Anna Hilsmann, Peter Eisert
Abstract summary: We present a new single-stage architecture called CASAPose. It determines 2D-3D correspondences for pose estimation of multiple different objects in RGB images in one pass. It is fast and memory efficient, and achieves high accuracy for multiple objects.
Score: 2.861848675707602
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Applications in the field of augmented reality or robotics often require joint localisation and 6d pose estimation of multiple objects. However, most algorithms need one network per object class to be trained in order to provide the best results. Analysing all visible objects demands multiple inferences, which is memory and time-consuming. We present a new single-stage architecture called CASAPose that determines 2D-3D correspondences for pose estimation of multiple different objects in RGB images in one pass. It is fast and memory efficient, and achieves high accuracy for multiple objects by exploiting the output of a semantic segmentation decoder as control input to a keypoint recognition decoder via local class-adaptive normalisation. Our new differentiable regression of keypoint locations significantly contributes to a faster closing of the domain gap between real test and synthetic training data. We apply segmentation-aware convolutions and upsampling operations to increase the focus inside the object mask and to reduce mutual interference of occluding objects. For each inserted object, the network grows by only one output segmentation map and a negligible number of parameters. We outperform state-of-the-art approaches in challenging multi-object scenes with inter-object occlusion and synthetic training.

Related papers

Finding NeMO: A Geometry-Aware Representation of Template Views for Few-Shot Perception [9.145558382187524]
We present a novel object-centric representation that can be used to detect, segment and estimate the 6DoF pose of objects unseen during training using RGB images.<n>Our method consists of an encoder that requires only a few RGB template views depicting an object to generate a sparse object-like point cloud.<n>Next, a decoder takes the object encoding together with a query image to generate a variety of dense predictions.
arXiv Detail & Related papers (2026-02-04T09:12:05Z)
SEMPose: A Single End-to-end Network for Multi-object Pose Estimation [13.131534219937533]
SEMPose is an end-to-end multi-object pose estimation network. It can perform inference at 32 FPS without requiring inputs other than the RGB image. It can accurately estimate the poses of multiple objects in real time, with inference time unaffected by the number of target objects.
arXiv Detail & Related papers (2024-11-21T10:37:54Z)
CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation [3.5379836919221566]
Estimating rigid objects' poses is one of the fundamental problems in computer vision. This paper presents a novel approach, CVAM-Pose, for multi-object monocular pose estimation.
arXiv Detail & Related papers (2024-10-11T17:26:27Z)
AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical one among which lies in foreground-background imbalance. We propose Adaptive Focus Framework (AF$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations. AF$ has significantly improved the accuracy on three widely used aerial benchmarks, as fast as the mainstream method.
arXiv Detail & Related papers (2022-02-18T10:14:45Z)
Multi-patch Feature Pyramid Network for Weakly Supervised Object Detection in Optical Remote Sensing Images [39.25541709228373]
We propose a new architecture for object detection with a multiple patch feature pyramid network (MPFP-Net) MPFP-Net is different from the current models that during training only pursue the most discriminative patches. We introduce an effective method to regularize the residual values and make the fusion transition layers strictly norm-preserving.
arXiv Detail & Related papers (2021-08-18T09:25:39Z)
Associating Objects with Transformers for Video Object Segmentation [74.51719591192787]
We propose an Associating Objects with Transformers (AOT) approach to match and decode multiple objects uniformly. AOT employs an identification mechanism to associate multiple targets into the same high-dimensional embedding space. We ranked 1st in the 3rd Large-scale Video Object Challenge.
arXiv Detail & Related papers (2021-06-04T17:59:57Z)
Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network. There are two major challenges in the current one-step approaches. We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
Robust Instance Segmentation through Reasoning about Multi-Object Occlusion [9.536947328412198]
We propose a deep network for multi-object instance segmentation that is robust to occlusion. Our work builds on Compositional Networks, which learn a generative model of neural feature activations to locate occluders. In particular, we obtain feed-forward predictions of the object classes and their instance and occluder segmentations.
arXiv Detail & Related papers (2020-12-03T17:41:55Z)
Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation [67.88276573341734]
We propose a new method for unseen object instance segmentation by learning RGB-D feature embeddings from synthetic data. A metric learning loss function is utilized to learn to produce pixel-wise feature embeddings. We further improve the segmentation accuracy with a new two-stage clustering algorithm.
arXiv Detail & Related papers (2020-07-30T00:23:07Z)
Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels. To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit. Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
Spatial Semantic Embedding Network: Fast 3D Instance Segmentation with Deep Metric Learning [5.699350798684963]
We propose a simple, yet efficient algorithm for 3D instance segmentation using deep metric learning. For high-level intelligent tasks from a large scale scene, 3D instance segmentation recognizes individual instances of objects. We demonstrate the state-of-the-art performance of our algorithm in the ScanNet 3D instance segmentation benchmark on AP score.
arXiv Detail & Related papers (2020-07-07T02:17:44Z)
Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection. The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.