MoreFusion: Multi-object Reasoning for 6D Pose Estimation from
Volumetric Fusion
- URL: http://arxiv.org/abs/2004.04336v1
- Date: Thu, 9 Apr 2020 02:29:30 GMT
- Title: MoreFusion: Multi-object Reasoning for 6D Pose Estimation from
Volumetric Fusion
- Authors: Kentaro Wada, Edgar Sucar, Stephen James, Daniel Lenton, Andrew J.
Davison
- Abstract summary: We present a system which can estimate the accurate poses of multiple known objects in contact and occlusion from real-time, embodied multi-view vision.
Our approach makes 3D object pose proposals from single RGB-D views, accumulates pose estimates and non-parametric occupancy information from multiple views as the camera moves.
We verify the accuracy and robustness of our approach experimentally on 2 object datasets: YCB-Video, and our own challenging Cluttered YCB-Video.
- Score: 19.034317851914725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robots and other smart devices need efficient object-based scene
representations from their on-board vision systems to reason about contact,
physics and occlusion. Recognized precise object models will play an important
role alongside non-parametric reconstructions of unrecognized structures. We
present a system which can estimate the accurate poses of multiple known
objects in contact and occlusion from real-time, embodied multi-view vision.
Our approach makes 3D object pose proposals from single RGB-D views,
accumulates pose estimates and non-parametric occupancy information from
multiple views as the camera moves, and performs joint optimization to estimate
consistent, non-intersecting poses for multiple objects in contact.
We verify the accuracy and robustness of our approach experimentally on 2
object datasets: YCB-Video, and our own challenging Cluttered YCB-Video. We
demonstrate a real-time robotics application where a robot arm precisely and
orderly disassembles complicated piles of objects, using only on-board RGB-D
vision.
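The joint optimization the abstract describes can be illustrated with a toy sketch. This is not the paper's actual volumetric-occupancy formulation: objects are approximated as spheres, and `refine_translations`, the learning rate, and the overlap weight are all hypothetical choices made for illustration. The idea shown is the same, though: pull each object toward its per-view pose proposal while a quadratic intersection penalty pushes overlapping objects apart, yielding consistent, non-intersecting poses.

```python
# Toy sketch of collision-aware joint pose refinement (illustrative only;
# the paper optimizes full 6D poses against volumetric occupancy, not spheres).
import numpy as np

def refine_translations(proposals, radii, lr=0.01, steps=1000, w_overlap=10.0):
    """Gradient descent on proposal-consistency + pairwise-overlap penalties."""
    t = proposals.copy()
    for _ in range(steps):
        grad = 2.0 * (t - proposals)  # pull each object toward its proposal
        for i in range(len(t)):
            for j in range(len(t)):
                if i == j:
                    continue
                d = t[i] - t[j]
                dist = np.linalg.norm(d)
                overlap = radii[i] + radii[j] - dist
                if overlap > 0 and dist > 1e-9:
                    # gradient of w * overlap^2: push objects apart
                    # along the line joining their centers
                    grad[i] -= w_overlap * 2.0 * overlap * d / dist
        t -= lr * grad
    return t

# Two unit spheres proposed only 1.0 apart (overlapping by 1.0) settle at a
# separation just under 2.0: the proposal term balances the overlap term.
proposals = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
refined = refine_translations(proposals, radii=np.array([1.0, 1.0]))
print(np.linalg.norm(refined[0] - refined[1]))
```

The equilibrium separation is slightly below the sum of the radii because the penalty is soft; a hard non-intersection constraint would require a different solver.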
Related papers
- SM$^3$: Self-Supervised Multi-task Modeling with Multi-view 2D Images for Articulated Objects [24.737865259695006]
We propose a self-supervised interaction perception method, referred to as SM$^3$, to model articulated objects.
By constructing 3D geometries and textures from the captured 2D images, SM$^3$ achieves integrated optimization of movable part and joint parameters.
Evaluations demonstrate that SM$^3$ surpasses existing benchmarks across various categories and objects, while its adaptability in real-world scenarios has been thoroughly validated.
arXiv Detail & Related papers (2024-01-17T11:15:09Z)
- Fit-NGP: Fitting Object Models to Neural Graphics Primitives [19.513102875891775]
We show that the density field created by a state-of-the-art efficient radiance field reconstruction method is suitable for highly accurate pose estimation.
We present a fully automatic object pose estimation system based on a robot arm with a single wrist-mounted camera.
arXiv Detail & Related papers (2024-01-04T16:57:56Z)
- Transformer Network for Multi-Person Tracking and Re-Identification in Unconstrained Environment [0.6798775532273751]
Multi-object tracking (MOT) has profound applications in a variety of fields, including surveillance, sports analytics, self-driving, and cooperative robotics.
We put forward an integrated MOT method that marries object detection and identity linkage within a singular, end-to-end trainable framework.
Our system leverages a robust temporal memory module that retains extensive historical observations and effectively encodes them using an attention-based aggregator.
arXiv Detail & Related papers (2023-12-19T08:15:22Z)
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- Multi-Modal Dataset Acquisition for Photometrically Challenging Object [56.30027922063559]
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects.
We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z)
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z)
- 6-DoF Pose Estimation of Household Objects for Robotic Manipulation: An Accessible Dataset and Benchmark [17.493403705281008]
We present a new dataset for 6-DoF pose estimation of known objects, with a focus on robotic manipulation research.
We provide 3D scanned textured models of toy grocery objects, as well as RGBD images of the objects in challenging, cluttered scenes.
Using semi-automated RGBD-to-model texture correspondences, the images are annotated with ground truth poses that were verified empirically to be accurate to within a few millimeters.
We also propose a new pose evaluation metric called ADD-H based upon the Hungarian assignment algorithm that is robust to symmetries in object geometry without requiring their explicit enumeration.
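The Hungarian-assignment idea behind ADD-H can be sketched in a few lines. This is a hypothetical illustration, not the paper's reference implementation: match predicted model points to ground-truth points under an optimal one-to-one assignment, so a symmetric object rotated into an equivalent configuration incurs near-zero error without enumerating its symmetries.

```python
# Illustrative ADD-H-style metric (hypothetical implementation): mean point
# distance under an optimal one-to-one assignment, robust to object symmetry.
import numpy as np
from scipy.optimize import linear_sum_assignment

def add_h(pred_pts: np.ndarray, gt_pts: np.ndarray) -> float:
    """Mean distance between point sets under a Hungarian assignment."""
    # Pairwise Euclidean distance cost matrix (N x N).
    cost = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].mean())

# A square of model points has 90-degree symmetry: rotating it by 90 degrees
# permutes the points, so the assignment cost is essentially zero.
square = np.array([[1.0, 0, 0], [0, 1, 0], [-1, 0, 0], [0, -1, 0]])
rot90 = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])
print(add_h(square @ rot90.T, square))  # near-zero despite the rotation
```

A naive per-index ADD would instead report a large error for the rotated square, which is exactly the symmetry artifact the Hungarian matching removes.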
arXiv Detail & Related papers (2022-03-11T01:19:04Z)
- Estimating 3D Motion and Forces of Human-Object Interactions from Internet Videos [49.52070710518688]
We introduce a method to reconstruct the 3D motion of a person interacting with an object from a single RGB video.
Our method estimates the 3D poses of the person together with the object pose, the contact positions and the contact forces on the human body.
arXiv Detail & Related papers (2021-11-02T13:40:18Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- Nothing But Geometric Constraints: A Model-Free Method for Articulated Object Pose Estimation [89.82169646672872]
We propose an unsupervised vision-based system to estimate the joint configurations of the robot arm from a sequence of RGB or RGB-D images without knowing the model a priori.
We combine a classical geometric formulation with deep learning and extend the use of epipolar multi-rigid-body constraints to solve this task.
arXiv Detail & Related papers (2020-11-30T20:46:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.