Related papers: You Only Look at One: Category-Level Object Representations for Pose Estimation From a Single Example

URL: http://arxiv.org/abs/2305.12626v1
Date: Mon, 22 May 2023 01:32:24 GMT
Title: You Only Look at One: Category-Level Object Representations for Pose Estimation From a Single Example
Authors: Walter Goodwin, Ioannis Havoutis, Ingmar Posner
Abstract summary: We present a method for achieving category-level pose estimation by inspection of just a single object from a desired category. We demonstrate that our method runs in real-time, enabling a robot manipulator equipped with an RGBD sensor to perform online 6D pose estimation for novel objects.
Score: 26.866356430469757
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: In order to meaningfully interact with the world, robot manipulators must be able to interpret objects they encounter. A critical aspect of this interpretation is pose estimation: inferring quantities that describe the position and orientation of an object in 3D space. Most existing approaches to pose estimation make limiting assumptions, often working only for specific, known object instances, or at best generalising to an object category using large pose-labelled datasets. In this work, we present a method for achieving category-level pose estimation by inspection of just a single object from a desired category. We show that we can subsequently perform accurate pose estimation for unseen objects from an inspected category, and considerably outperform prior work by exploiting multi-view correspondences. We demonstrate that our method runs in real-time, enabling a robot manipulator equipped with an RGBD sensor to perform online 6D pose estimation for novel objects. Finally, we showcase our method in a continual learning setting, with a robot able to determine whether objects belong to known categories, and if not, use active perception to produce a one-shot category representation for subsequent pose estimation.
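As a point of reference for the "6D pose" mentioned in the abstract: a 6D pose is commonly written as a 3D rotation plus a 3D translation that maps points from the object frame into the camera frame. The sketch below is only an illustration of that representation, not the authors' method. The helper names (`rotation_z`, `apply_pose`, `invert_pose`) and the numeric values are hypothetical, and the rotation is restricted to a yaw about the z-axis to keep the example short.

```python
import math

def rotation_z(yaw):
    """3x3 rotation matrix for a rotation of `yaw` radians about the z-axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_pose(R, t, p):
    """Map point p through the rigid transform (R, t): returns R @ p + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def invert_pose(R, t):
    """Inverse of a rigid transform: rotation R^T, translation -R^T @ t."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return Rt, t_inv

# Hypothetical pose: object rotated 90 degrees about z, offset from the camera.
R = rotation_z(math.pi / 2)
t = [0.5, 0.0, 0.2]

p_obj = [0.1, 0.0, 0.0]            # a point expressed in the object frame
p_cam = apply_pose(R, t, p_obj)    # the same point in the camera frame

# Applying the inverse pose recovers the original object-frame point.
R_inv, t_inv = invert_pose(R, t)
p_back = apply_pose(R_inv, t_inv, p_cam)
```

A full pose estimator would recover `R` and `t` from sensor data; here they are fixed by hand only to show what the estimated quantities represent.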

Related papers

Category-Level and Open-Set Object Pose Estimation for Robotics [7.9471205712560264]
This paper compares datasets, accuracy metrics, and algorithms for solving 6D pose estimation at the category level. We analyze how to bridge category-level and open-set object pose estimation to reach generalization and provide actionable recommendations.
arXiv Detail & Related papers (2025-04-28T08:31:33Z)
PickScan: Object discovery and reconstruction from handheld interactions [99.99566882133179]
We develop an interaction-guided and class-agnostic method to reconstruct 3D representations of scenes. Our main contribution is a novel approach to detecting user-object interactions and extracting the masks of manipulated objects. Compared to Co-Fusion, the only comparable interaction-based and class-agnostic baseline, our method reduces chamfer distance by 73%.
arXiv Detail & Related papers (2024-11-17T23:09:08Z)
ICGNet: A Unified Approach for Instance-Centric Grasping [42.92991092305974]
We introduce an end-to-end architecture for object-centric grasping. We show the effectiveness of the proposed method by extensively evaluating it against state-of-the-art methods on synthetic datasets.
arXiv Detail & Related papers (2024-01-18T12:41:41Z)
LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping [9.690844449175948]
We focus on object pose estimation. Our approach relies on three pieces of information: multiple views of the object, the camera's parameters at those viewpoints, and 3D CAD models of objects. We show that the estimated object pose results in 99.65% grasp accuracy with the ground truth grasp candidates.
arXiv Detail & Related papers (2023-11-14T14:27:53Z)
Object-centric Video Representation for Long-term Action Anticipation [33.115854386196126]
The key motivation is that objects provide important cues to recognize and predict human-object interactions. We propose to build object-centric video representations by leveraging visual-language pretrained models. To recognize and predict human-object interactions, we use a Transformer-based neural architecture.
arXiv Detail & Related papers (2023-10-31T22:54:31Z)
3D-Aware Hypothesis & Verification for Generalizable Relative Object Pose Estimation [69.73691477825079]
We present a new hypothesis-and-verification framework to tackle the problem of generalizable object pose estimation. To measure reliability, we introduce a 3D-aware verification that explicitly applies 3D transformations to the 3D object representations learned from the two input images.
arXiv Detail & Related papers (2023-10-05T13:34:07Z)
ShapeShift: Superquadric-based Object Pose Estimation for Robotic Grasping [85.38689479346276]
Current techniques heavily rely on a reference 3D object, limiting their generalizability and making it expensive to expand to new object categories. This paper proposes ShapeShift, a superquadric-based framework for object pose estimation that predicts the object's pose relative to a primitive shape which is fitted to the object.
arXiv Detail & Related papers (2023-04-10T20:55:41Z)
NOPE: Novel Object Pose Estimation from a Single Image [67.11073133072527]
We propose an approach that takes a single image of a new object as input and predicts the relative pose of this object in new images without prior knowledge of the object's 3D model. We achieve this by training a model to directly predict discriminative embeddings for viewpoints surrounding the object. This prediction is done using a simple U-Net architecture with attention and conditioned on the desired pose, which yields extremely fast inference.
arXiv Detail & Related papers (2023-03-23T18:55:43Z)
Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges. We propose an approach that explores the environment in search of target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible. Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
arXiv Detail & Related papers (2022-03-15T17:59:01Z)
Continuous close-range 3D object pose estimation [1.4502611532302039]
Vision-based 3D pose estimation is necessary for accurately handling objects that might not be placed at fixed positions. In this paper, we present a 3D pose estimation method based on a gradient-ascent particle filter. As a result, we can apply this method online during task execution to save valuable cycle time.
arXiv Detail & Related papers (2020-10-02T07:48:17Z)
Single View Metrology in the Wild [94.7005246862618]
We present a novel approach to single view metrology that can recover the absolute scale of a scene represented by 3D heights of objects or camera height above the ground. Our method relies on data-driven priors learned by a deep network specifically designed to imbibe weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights. We demonstrate state-of-the-art qualitative and quantitative results on several datasets as well as applications including virtual object insertion.
arXiv Detail & Related papers (2020-07-18T22:31:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.