Related papers: Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation

Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation

URL: http://arxiv.org/abs/2403.14279v1
Date: Thu, 21 Mar 2024 10:38:18 GMT
Title: Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation
Authors: Francesco Di Felice, Alberto Remus, Stefano Gasperini, Benjamin Busam, Lionel Ott, Federico Tombari, Roland Siegwart, Carlo Alberto Avizzano,
Abstract summary: This work shows the utility of Diffusion Model-based novel-view-synthesizers in enhancing RGB 6D pose estimation at category-level. Experiments are quantitatively analyzed on the CO3D dataset, showcasing increased performance over baselines.
Score: 66.3814684757376
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Estimating the pose of objects through vision is essential to make robotic platforms interact with the environment. Yet, it presents many challenges, often related to the lack of flexibility and generalizability of state-of-the-art solutions. Diffusion models are a cutting-edge neural architecture transforming 2D and 3D computer vision, outlining remarkable performances in zero-shot novel-view synthesis. Such a use case is particularly intriguing for reconstructing 3D objects. However, localizing objects in unstructured environments is rather unexplored. To this end, this work presents Zero123-6D to demonstrate the utility of Diffusion Model-based novel-view-synthesizers in enhancing RGB 6D pose estimation at category-level by integrating them with feature extraction techniques. The outlined method exploits such a novel view synthesizer to expand a sparse set of RGB-only reference views for the zero-shot 6D pose estimation task. Experiments are quantitatively analyzed on the CO3D dataset, showcasing increased performance over baselines, a substantial reduction in data requirements, and the removal of the necessity of depth information.

Related papers

Advances in Feed-Forward 3D Reconstruction and View Synthesis: A Survey [154.50661618628433]
3D reconstruction and view synthesis are foundational problems in computer vision, graphics, and immersive technologies such as augmented reality (AR), virtual reality (VR), and digital twins.<n>Recent advances in feed-forward approaches, driven by deep learning, have revolutionized this field by enabling fast and generalizable 3D reconstruction and view synthesis.
arXiv Detail & Related papers (2025-07-19T06:13:25Z)
Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture. We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation. Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z)
Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild [22.82439286651921]
We propose a unified regression model that integrates segmentation and reconstruction, specifically designed for 3D shape reconstruction. We also introduce a scalable data synthesis pipeline that simulates a wide range of variations in objects, occluders, and backgrounds. Our training on our synthetic data enables the proposed model to achieve state-of-the-art zero-shot results on real-world images.
arXiv Detail & Related papers (2024-03-21T16:40:10Z)
FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models [59.13757801286343]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data. We introduce the FILP-3D framework with two novel components: the Redundant Feature Eliminator (RFE) for feature space misalignment and the Spatial Noise Compensator (SNC) for significant noise.
arXiv Detail & Related papers (2023-12-28T14:52:07Z)
Robust Category-Level 3D Pose Estimation from Synthetic Data [17.247607850702558]
We introduce SyntheticP3D, a new synthetic dataset for object pose estimation generated from CAD models. We propose a novel approach (CC3D) for training neural mesh models that perform pose estimation via inverse rendering.
arXiv Detail & Related papers (2023-05-25T14:56:03Z)
Uncertainty Guided Policy for Active Robotic 3D Reconstruction using Neural Radiance Fields [82.21033337949757]
This paper introduces a ray-based volumetric uncertainty estimator, which computes the entropy of the weight distribution of the color samples along each ray of the object's implicit neural representation. We show that it is possible to infer the uncertainty of the underlying 3D geometry given a novel view with the proposed estimator. We present a next-best-view selection policy guided by the ray-based volumetric uncertainty in neural radiance fields-based representations.
arXiv Detail & Related papers (2022-09-17T21:28:57Z)
Robust Category-Level 6D Pose Estimation with Coarse-to-Fine Rendering of Neural Features [17.920305227880245]
We consider the problem of category-level 6D pose estimation from a single RGB image. Our approach represents an object category as a cuboid mesh and learns a generative model of the neural feature activations at each mesh. Our experiments demonstrate an enhanced category-level 6D pose estimation performance compared to prior work.
arXiv Detail & Related papers (2022-09-12T21:31:36Z)
RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets. Recent work on 3D pre-training exhibits failure when transfer features learned on synthetic objects to other real-world applications. In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)
Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances. We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction. Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics. Recent neural implicit modeling methods show promising results on synthetic or dense datasets. But, they perform poorly on real-world data that is sparse and noisy. This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z)
se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains [12.71983073907091]
This work proposes a data-driven optimization approach for long-term, 6D pose tracking. It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object's model. The proposed approach achieves consistently robust estimates and outperforms alternatives, even though they have been trained with real images.
arXiv Detail & Related papers (2020-07-27T21:09:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.