Finding NeMO: A Geometry-Aware Representation of Template Views for Few-Shot Perception
- URL: http://arxiv.org/abs/2602.04343v1
- Date: Wed, 04 Feb 2026 09:12:05 GMT
- Title: Finding NeMO: A Geometry-Aware Representation of Template Views for Few-Shot Perception
- Authors: Sebastian Jung, Leonard Klüpfel, Rudolph Triebel, Maximilian Durner
- Abstract summary: We present a novel object-centric representation that can be used to detect, segment and estimate the 6DoF pose of objects unseen during training using RGB images. Our method consists of an encoder that requires only a few RGB template views depicting an object to generate a sparse object-like point cloud. Next, a decoder takes the object encoding together with a query image to generate a variety of dense predictions.
- Score: 9.145558382187524
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Neural Memory Object (NeMO), a novel object-centric representation that can be used to detect, segment and estimate the 6DoF pose of objects unseen during training using RGB images. Our method consists of an encoder that requires only a few RGB template views depicting an object to generate a sparse object-like point cloud using a learned UDF containing semantic and geometric information. Next, a decoder takes the object encoding together with a query image to generate a variety of dense predictions. Through extensive experiments, we show that our method can be used for few-shot object perception without requiring any camera-specific parameters or retraining on target data. Our proposed concept of outsourcing object information into a NeMO and using a single network for multiple perception tasks enhances interaction with novel objects, improving scalability and efficiency by enabling quick object onboarding without retraining or extensive pre-processing. We report competitive and state-of-the-art results on various datasets and perception tasks of the BOP benchmark, demonstrating the versatility of our approach. https://github.com/DLR-RM/nemo
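The abstract describes a two-stage pipeline: an encoder turns a few RGB template views into a sparse, feature-carrying point cloud (the "memory"), and a decoder combines that encoding with a query image to produce dense predictions. The sketch below illustrates only that interface; all function names are our own, and the colour-matching "decoder" is a toy stand-in, not the paper's learned-UDF encoder or its networks.

```python
import numpy as np

def encode_templates(template_views, n_points=64, feat_dim=8, seed=0):
    """Toy stand-in for a NeMO-style encoder: map a few RGB template
    views to a sparse 'object-like' point cloud with per-point features.
    (Hypothetical interface; the paper uses a learned UDF.)"""
    rng = np.random.default_rng(seed)
    # Pool the template pixels into a single global colour descriptor.
    pooled = np.mean([v.mean(axis=(0, 1)) for v in template_views], axis=0)
    points = rng.standard_normal((n_points, 3))  # sparse 3D points
    feats = np.tile(pooled, (n_points, feat_dim // pooled.size + 1))[:, :feat_dim]
    return {"points": points, "features": feats}

def decode(encoding, query_image):
    """Toy decoder: score each query pixel against the object encoding
    and return a dense 'segmentation' map with values in (0, 1]."""
    ref = encoding["features"].mean(axis=0)[:3]   # reference colour
    diff = np.linalg.norm(query_image - ref, axis=-1)
    return np.exp(-diff)                          # high where similar

templates = [np.full((16, 16, 3), 0.8) for _ in range(4)]   # 4 bright views
memory = encode_templates(templates)
query = np.zeros((16, 16, 3)); query[4:12, 4:12] = 0.8      # bright object patch
mask = decode(memory, query)
print(mask.shape)  # (16, 16); mask peaks on the object patch
```

The point of the split is the one the abstract emphasises: the object-specific work happens once in `encode_templates` (object onboarding), while `decode` can then be reused across query images and tasks.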
Related papers
- One2Any: One-Reference 6D Pose Estimation for Any Object [98.50085481362808]
6D object pose estimation remains challenging for many applications due to dependencies on complete 3D models, multi-view images, or training limited to specific object categories. We propose a novel method, One2Any, that estimates the relative 6-degrees-of-freedom (DoF) object pose using only a single reference and a single query RGB-D image. Experiments on multiple benchmark datasets demonstrate that our model generalizes well to novel objects, achieving state-of-the-art accuracy and even rivaling methods that require multi-view or CAD inputs, at a fraction of the compute.
arXiv Detail & Related papers (2025-05-07T03:54:59Z)
- AnyDoor: Zero-shot Object-level Image Customization [63.44307304097742]
This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations.
Our model is trained only once and effortlessly generalizes to diverse object-scene combinations at the inference stage.
arXiv Detail & Related papers (2023-07-18T17:59:02Z)
- Object Re-Identification from Point Clouds [3.6308236424346694]
We provide the first large-scale study of object ReID from point clouds and establish its performance relative to image ReID.
To our knowledge, we are the first to study object re-identification from real point cloud observations.
arXiv Detail & Related papers (2023-05-17T13:43:03Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
First, we present a 6D pose refiner based on a render&compare strategy that can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation, which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
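The two stages above (a coarse selection over candidate poses, then iterative render&compare refinement) can be caricatured in a few lines of control flow. Everything here is a toy stand-in of our own: the "renderer" is the identity and the refiner is a simple contraction toward the observation, so this sketches the loop structure only, not MegaPose's networks.

```python
import numpy as np

def render(pose):
    """Toy 'renderer': an image is summarised by its pose vector here,
    so rendering is the identity. A real pipeline rasterises a 3D model."""
    return pose

def refiner(pose, observed):
    """One render&compare refinement step: move the pose estimate part of
    the way toward whatever explains the observed image."""
    return pose + 0.5 * (observed - render(pose))

def coarse_estimate(observed, candidates):
    """Coarse stage: pick the candidate whose rendering best matches the
    observation (standing in for the paper's 'correctable?' classifier)."""
    errs = [np.linalg.norm(render(c) - observed) for c in candidates]
    return candidates[int(np.argmin(errs))]

observed = np.array([1.0, 2.0, 3.0])   # stand-in for the true pose
candidates = [np.zeros(3), np.array([1.5, 1.5, 2.5]), np.full(3, 5.0)]
pose = coarse_estimate(observed, candidates)
for _ in range(10):                    # iterative refinement
    pose = refiner(pose, observed)
print(np.allclose(pose, observed, atol=1e-2))  # True
```

The design point the paper makes is that the refiner only needs to correct *small* errors, which is why a coarse stage that lands within the refiner's basin of convergence is enough.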
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- FewSOL: A Dataset for Few-Shot Object Learning in Robotic Environments [21.393674766169543]
We introduce the Few-Shot Object Learning dataset for object recognition with a few images per object.
We captured 336 real-world objects with 9 RGB-D images per object from different views.
The evaluation results show that there is still a large margin to be improved for few-shot object classification in robotic environments.
arXiv Detail & Related papers (2022-07-06T05:57:24Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Supervised Training of Dense Object Nets using Optimal Descriptors for Industrial Robotic Applications [57.87136703404356]
Dense Object Nets (DONs) by Florence, Manuelli and Tedrake introduced dense object descriptors as a novel visual object representation for the robotics community.
In this paper we show that given a 3D model of an object, we can generate its descriptor space image, which allows for supervised training of DONs.
We compare the training methods on generating 6D grasps for industrial objects and show that our novel supervised training approach improves the pick-and-place performance in industry-relevant tasks.
arXiv Detail & Related papers (2021-02-16T11:40:12Z)
- Rapid Pose Label Generation through Sparse Representation of Unknown Objects [7.32172860877574]
This work presents an approach for rapidly generating real-world, pose-annotated RGB-D data for unknown objects.
We first source minimalistic labelings of an ordered set of arbitrarily chosen keypoints over a set of RGB-D videos.
By solving an optimization problem, we combine these labels under a world frame to recover a sparse, keypoint-based representation of the object.
arXiv Detail & Related papers (2020-11-07T15:14:03Z)
- Contour Primitive of Interest Extraction Network Based on One-Shot Learning for Object-Agnostic Vision Measurement [37.552192926136065]
We propose the contour primitive of interest extraction network (CPieNet) based on the one-shot learning framework.
For the novel CPI extraction task, we built the Object Contour Primitives dataset using online public images, and the Robotic Object Contour Measurement dataset using a camera mounted on a robot.
arXiv Detail & Related papers (2020-10-07T11:00:30Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
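SceneFID adapts the Fréchet Inception Distance to object-level crops; the quantity underneath any FID variant is the Fréchet distance between two Gaussians fitted to feature activations. A minimal NumPy sketch of that formula follows (helper names are ours; a real FID fits the Gaussians to Inception features over many images, which this sketch does not do):

```python
import numpy as np

def psd_sqrt(m):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0, None))) @ v.T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians (mu1, S1) and (mu2, S2):
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1^1/2 S2 S1^1/2)^1/2)."""
    s1h = psd_sqrt(sigma1)
    covmean = psd_sqrt(s1h @ sigma2 @ s1h)   # symmetric-PSD form of (S1 S2)^1/2
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Identical Gaussians give distance ~0; shifting the mean by d adds ||d||^2.
mu, sigma = np.zeros(2), np.eye(2)
print(frechet_distance(mu, sigma, mu, sigma))
print(frechet_distance(mu, sigma, mu + 3.0, sigma))
```

Computing the cross term as `(S1^1/2 S2 S1^1/2)^1/2` keeps the matrix symmetric PSD, which is why a plain `eigh`-based square root suffices; the more common `scipy.linalg.sqrtm` on `S1 @ S2` gives the same trace.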
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.