A Visual Navigation Perspective for Category-Level Object Pose
Estimation
- URL: http://arxiv.org/abs/2203.13572v1
- Date: Fri, 25 Mar 2022 10:57:37 GMT
- Title: A Visual Navigation Perspective for Category-Level Object Pose
Estimation
- Authors: Jiaxin Guo, Fangxun Zhong, Rong Xiong, Yunhui Liu, Yue Wang, Yiyi Liao
- Abstract summary: This paper studies category-level object pose estimation based on a single monocular image.
Recent advances in pose-aware generative models have paved the way for addressing this challenging task using analysis-by-synthesis.
- Score: 41.60364392204057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies category-level object pose estimation based on a single
monocular image. Recent advances in pose-aware generative models have paved the
way for addressing this challenging task using analysis-by-synthesis. The idea
is to sequentially update a set of latent variables, e.g., pose, shape, and
appearance, of the generative model until the generated image best agrees with
the observation. However, convergence and efficiency are two challenges of this
inference procedure. In this paper, we take a deeper look at the inference of
analysis-by-synthesis from the perspective of visual navigation, and
investigate what is a good navigation policy for this specific task. We
evaluate three different strategies, including gradient descent, reinforcement
learning and imitation learning, via thorough comparisons in terms of
convergence, robustness and efficiency. Moreover, we show that a simple hybrid
approach leads to an effective and efficient solution. We further compare these
strategies to state-of-the-art methods, and demonstrate superior performance on
synthetic and real-world datasets leveraging off-the-shelf pose-aware
generative models.
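As a reading aid, the inference procedure described in the abstract can be sketched as plain gradient descent over the generator's latent variables. The sketch below is only illustrative: the `PoseAwareGenerator` stand-in, the latent dimensions, and the photometric MSE objective are assumptions for exposition, not the authors' actual interface or losses.

```python
# Minimal sketch of gradient-descent analysis-by-synthesis inference.
# An off-the-shelf pose-aware generative model would replace the toy
# generator defined here; all names and dimensions are illustrative.
import torch
import torch.nn as nn

class PoseAwareGenerator(nn.Module):
    """Placeholder pose-aware generator: latents -> image (stand-in only)."""
    def __init__(self, shape_dim=64, app_dim=64, img_size=32):
        super().__init__()
        self.img_size = img_size
        self.decode = nn.Sequential(
            nn.Linear(shape_dim + app_dim + 6, 256), nn.ReLU(),
            nn.Linear(256, 3 * img_size * img_size), nn.Sigmoid(),
        )

    def forward(self, pose, shape, appearance):
        z = torch.cat([pose, shape, appearance], dim=-1)
        return self.decode(z).view(-1, 3, self.img_size, self.img_size)

def infer_latents(generator, observation, steps=200, lr=1e-2):
    """Sequentially update pose/shape/appearance until the rendering agrees with the observation."""
    pose = torch.zeros(1, 6, requires_grad=True)    # e.g. rotation + translation parameters
    shape = torch.zeros(1, 64, requires_grad=True)
    appearance = torch.zeros(1, 64, requires_grad=True)
    optim = torch.optim.Adam([pose, shape, appearance], lr=lr)
    for _ in range(steps):
        optim.zero_grad()
        rendered = generator(pose, shape, appearance)
        loss = torch.nn.functional.mse_loss(rendered, observation)  # photometric agreement
        loss.backward()
        optim.step()
    return pose.detach(), shape.detach(), appearance.detach()

if __name__ == "__main__":
    gen = PoseAwareGenerator()
    target = torch.rand(1, 3, 32, 32)  # stands in for the observed monocular image
    est_pose, est_shape, est_app = infer_latents(gen, target)
    print("estimated pose parameters:", est_pose)
```

In the paper's visual-navigation framing, each optimizer step plays the role of an action in latent space; the reinforcement- and imitation-learning policies it compares would replace the Adam update with a learned, predicted increment, and the hybrid solution combines learned and gradient-based updates.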
Related papers
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z) - Zero-Shot Image Harmonization with Generative Model Prior [22.984119094424056]
We propose a zero-shot approach to image harmonization, aiming to overcome the reliance on large amounts of synthetic composite images.
We introduce a fully modularized framework inspired by human behavior.
We present compelling visual results across diverse scenes and objects, along with a user study validating our approach.
arXiv Detail & Related papers (2023-07-17T00:56:21Z) - IRGen: Generative Modeling for Image Retrieval [82.62022344988993]
In this paper, we present a novel methodology, reframing image retrieval as a variant of generative modeling.
We develop our model, dubbed IRGen, to address the technical challenge of converting an image into a concise sequence of semantic units.
Our model achieves state-of-the-art performance on three widely-used image retrieval benchmarks and two million-scale datasets.
arXiv Detail & Related papers (2023-03-17T17:07:36Z) - CroCo v2: Improved Cross-view Completion Pre-training for Stereo
Matching and Optical Flow [22.161967080759993]
Self-supervised pre-training methods have not yet delivered on dense geometric vision tasks such as stereo matching or optical flow.
We build on the recent cross-view completion framework, a variation of masked image modeling that leverages a second view from the same scene.
We show for the first time that state-of-the-art results on stereo matching and optical flow can be reached without using any classical task-specific techniques.
arXiv Detail & Related papers (2022-11-18T18:18:53Z) - Robust Single Image Dehazing Based on Consistent and Contrast-Assisted
Reconstruction [95.5735805072852]
We propose a novel density-variational learning framework to improve the robustness of the image dehazing model.
Specifically, the dehazing network is optimized under the consistency-regularized framework.
Our method significantly surpasses the state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T08:11:04Z) - Fusing Local Similarities for Retrieval-based 3D Orientation Estimation
of Unseen Objects [70.49392581592089]
We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images.
We follow a retrieval-based strategy and prevent the network from learning object-specific features.
Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works.
arXiv Detail & Related papers (2022-03-16T08:53:00Z) - CoSformer: Detecting Co-Salient Object with Transformers [2.3148470932285665]
Co-Salient Object Detection (CoSOD) aims at simulating the human visual system to discover the common and salient objects from a group of relevant images.
We propose the Co-Salient Object Detection Transformer (CoSformer) network to capture both salient and common visual patterns from multiple images.
arXiv Detail & Related papers (2021-04-30T02:39:12Z) - Deep Graph Contrastive Representation Learning [23.37786673825192]
We propose a novel framework for unsupervised graph representation learning by leveraging a contrastive objective at the node level.
Specifically, we generate two graph views by corruption and learn node representations by maximizing the agreement between the representations of each node in these two views (a minimal sketch of this objective appears after this list).
We perform empirical experiments on both transductive and inductive learning tasks using a variety of real-world datasets.
arXiv Detail & Related papers (2020-06-07T11:50:45Z) - Neural Topological SLAM for Visual Navigation [112.73876869904]
We design topological representations for space that leverage semantics and afford approximate geometric reasoning.
We describe supervised learning-based algorithms that can build, maintain and use such representations under noisy actuation.
arXiv Detail & Related papers (2020-05-25T17:56:29Z)
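For the Deep Graph Contrastive Representation Learning entry above, the node-level objective it describes (two corrupted views, agreement maximized per node) can be sketched as follows. The one-layer GCN-style encoder, the corruption scheme, and the temperature are illustrative assumptions rather than that paper's exact design.

```python
# Minimal sketch of a node-level contrastive objective over two corrupted
# graph views; encoder, corruption, and hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def corrupt(x, adj, drop_feat=0.2, drop_edge=0.2):
    """Create one graph view by randomly masking features and dropping edges."""
    feat_mask = (torch.rand_like(x) > drop_feat).float()
    edge_mask = (torch.rand_like(adj) > drop_edge).float()
    return x * feat_mask, adj * edge_mask

def encode(x, adj, w):
    """One-layer GCN-style encoder (stand-in for the real encoder)."""
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
    return F.relu(((adj @ x) / deg) @ w)

def nt_xent(z1, z2, tau=0.5):
    """Maximize agreement between the same node's embeddings in the two views."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = (z1 @ z2.t()) / tau              # cross-view cosine similarities
    targets = torch.arange(z1.size(0))     # positive pair: same node index
    return F.cross_entropy(sim, targets)

if __name__ == "__main__":
    n, d, h = 8, 16, 32
    x = torch.rand(n, d)                       # node features
    adj = (torch.rand(n, n) > 0.7).float()     # toy adjacency matrix
    w = torch.randn(d, h, requires_grad=True)  # encoder weights
    x1, a1 = corrupt(x, adj)
    x2, a2 = corrupt(x, adj)
    loss = nt_xent(encode(x1, a1, w), encode(x2, a2, w))
    loss.backward()
    print("contrastive loss:", float(loss))
```

Training would simply repeat this corrupt-encode-loss step while optimizing the encoder parameters.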
This list is automatically generated from the titles and abstracts of the papers on this site.