Task-Focused Few-Shot Object Detection for Robot Manipulation
- URL: http://arxiv.org/abs/2201.12437v1
- Date: Fri, 28 Jan 2022 21:52:05 GMT
- Title: Task-Focused Few-Shot Object Detection for Robot Manipulation
- Authors: Brent Griffin
- Abstract summary: We develop a manipulation method based solely on detection, then introduce task-focused few-shot object detection to learn new objects and settings.
In experiments for our interactive approach to few-shot learning, we train a robot to manipulate objects directly from detection (ClickBot).
- Score: 1.8275108630751844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of mobile robot manipulation of novel
objects via detection. Our approach uses vision and control as complementary
functions that learn from real-world tasks. We develop a manipulation method
based solely on detection, then introduce task-focused few-shot object detection
to learn new objects and settings. The current paradigm for few-shot object
detection uses existing annotated examples. In contrast, we extend this
paradigm by using active data collection and annotation selection that improves
performance for specific downstream tasks (e.g., depth estimation and
grasping). In experiments for our interactive approach to few-shot learning, we
train a robot to manipulate objects directly from detection (ClickBot).
ClickBot learns visual servo control from a single click of annotation, grasps
novel objects in clutter and other settings, and achieves state-of-the-art
results on an existing visual servo control and depth estimation benchmark.
Finally, we establish a task-focused few-shot object detection benchmark to
support future research: https://github.com/griffbr/TFOD.
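The abstract only sketches the active data collection and annotation selection idea. As a rough illustration of how such a loop might prioritize which frames to request a click annotation for, here is a minimal sketch that combines detector confidence with downstream-task feedback (e.g., a failed grasp). All function names, dictionary fields, and thresholds below are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch of task-focused annotation selection: actively pick
# which frames to request a new click annotation for, based on detector
# confidence and downstream-task feedback. Names and thresholds are
# illustrative, not from the paper.

def select_for_annotation(frames, conf_threshold=0.5, budget=2):
    """Return up to `budget` frame ids that most need a new click annotation.

    `frames` is a list of dicts with keys:
      - "id": frame identifier
      - "det_conf": detector confidence for the target object (0..1)
      - "task_ok": whether the downstream task (grasp/servo) succeeded
    """
    candidates = []
    for f in frames:
        # Priority: task failures first, then low-confidence detections.
        if not f["task_ok"]:
            priority = 2.0 - f["det_conf"]   # failed task: always high priority
        elif f["det_conf"] < conf_threshold:
            priority = 1.0 - f["det_conf"]   # uncertain detection
        else:
            continue                          # confident and successful: skip
        candidates.append((priority, f["id"]))
    candidates.sort(reverse=True)
    return [fid for _, fid in candidates[:budget]]

frames = [
    {"id": "a", "det_conf": 0.9, "task_ok": True},
    {"id": "b", "det_conf": 0.3, "task_ok": True},
    {"id": "c", "det_conf": 0.8, "task_ok": False},
]
print(select_for_annotation(frames))  # → ['c', 'b']
```

The design choice in this sketch is that task failure outranks detection uncertainty: a confident detection that led to a failed grasp is exactly the kind of sample a task-focused annotation budget should spend on.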
Related papers
- Click to Grasp: Zero-Shot Precise Manipulation via Visual Diffusion Descriptors [30.579707929061026]
Our work explores the grounding of fine-grained part descriptors for precise manipulation in a zero-shot setting.
We tackle the problem by framing it as a dense semantic part correspondence task.
Our model returns a gripper pose for manipulating a specific part, using as reference a user-defined click from a source image of a visually different instance of the same object.
arXiv Detail & Related papers (2024-03-21T16:26:19Z)
- Few-shot Oriented Object Detection with Memorable Contrastive Learning in Remote Sensing Images [11.217630579076237]
Few-shot object detection (FSOD) has garnered significant research attention in the field of remote sensing.
We propose a novel FSOD method for remote sensing images called Few-shot Oriented object detection with Memorable Contrastive learning (FOMC).
Specifically, we employ oriented bounding boxes instead of traditional horizontal bounding boxes to learn a better feature representation for arbitrary-oriented aerial objects.
arXiv Detail & Related papers (2024-03-20T08:15:18Z)
- SalienDet: A Saliency-based Feature Enhancement Algorithm for Object Detection for Autonomous Driving [160.57870373052577]
We propose a saliency-based OD algorithm (SalienDet) to detect unknown objects.
Our SalienDet utilizes a saliency-based algorithm to enhance image features for object proposal generation.
We design a dataset relabeling approach to differentiate unknown objects from all other objects in the training sample set to achieve open-world detection.
arXiv Detail & Related papers (2023-05-11T16:19:44Z)
- Tactile-Filter: Interactive Tactile Perception for Part Mating [54.46221808805662]
Humans rely on touch and tactile sensing for many dexterous manipulation tasks.
Vision-based tactile sensors are widely used for various robotic perception and control tasks.
We present a method for interactive perception using vision-based tactile sensors for a part mating task.
arXiv Detail & Related papers (2023-03-10T16:27:37Z)
- Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges.
We propose an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
arXiv Detail & Related papers (2022-03-15T17:59:01Z)
- Semantically Grounded Object Matching for Robust Robotic Scene Rearrangement [21.736603698556042]
We present a novel approach to object matching that uses a large pre-trained vision-language model to match objects in a cross-instance setting.
We demonstrate that this provides considerably improved matching performance in cross-instance settings.
arXiv Detail & Related papers (2021-11-15T18:39:43Z)
- A Survey of Self-Supervised and Few-Shot Object Detection [19.647681501581225]
Self-supervised methods aim at learning representations from unlabeled data which transfer well to downstream tasks such as object detection.
Few-shot object detection is about training a model on novel (unseen) object classes with little data.
In this survey, we review and characterize the most recent approaches on few-shot and self-supervised object detection.
arXiv Detail & Related papers (2021-10-27T18:55:47Z)
- One-Shot Object Affordance Detection in the Wild [76.46484684007706]
Affordance detection refers to identifying the potential action possibilities of objects in an image.
We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images.
With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
arXiv Detail & Related papers (2021-08-08T14:53:10Z) - Learning to See before Learning to Act: Visual Pre-training for
Manipulation [48.731528716324355]
We find that pre-training on vision tasks significantly improves generalization and sample efficiency for learning to manipulate objects.
We explore directly transferring model parameters from vision networks to affordance prediction networks, and show that this can result in successful zero-shot adaptation.
With just a small amount of robotic experience, we can further fine-tune the affordance model to achieve better results.
arXiv Detail & Related papers (2021-07-01T17:58:37Z) - Few-shot Weakly-Supervised Object Detection via Directional Statistics [55.97230224399744]
We propose a probabilistic multiple instance learning approach for few-shot Common Object Localization (COL) and few-shot Weakly Supervised Object Detection (WSOD).
Our model simultaneously learns the distribution of the novel objects and localizes them via expectation-maximization steps.
Our experiments show that the proposed method, despite being simple, outperforms strong baselines in few-shot COL and WSOD, as well as large-scale WSOD tasks.
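The expectation-maximization localization described in this entry can be illustrated with a toy sketch: each image is a "bag" of proposal features, and EM alternates between softly scoring proposals against a mean direction (a von Mises-Fisher-style model on unit-normalized features, in keeping with the paper's directional-statistics theme) and re-estimating that direction. This is a hedged illustration of the general technique, not the paper's implementation; all names and the concentration value are assumptions.

```python
# Toy EM for multiple-instance localization: assume exactly one proposal per
# bag belongs to the common novel object, and model its (unit-normalized)
# feature with a mean direction mu. Illustrative sketch only.
import math

KAPPA = 5.0  # softmax concentration; purely an illustrative choice

def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def cos(u, v):
    return sum(a * b for a, b in zip(u, v))

def em_localize(bags, iters=10):
    """bags: list of images; each image is a list of proposal feature vectors.
    Returns, for each bag, the index of the proposal best matching the
    learned common-object direction."""
    dim = len(bags[0][0])
    # Initialize the mean direction from the first proposal of each bag.
    mu = normalize([sum(bag[0][d] for bag in bags) for d in range(dim)])
    for _ in range(iters):
        acc = [0.0] * dim
        for bag in bags:
            feats = [normalize(f) for f in bag]
            # E-step: soft responsibility of each proposal being the object.
            e = [math.exp(KAPPA * cos(mu, f)) for f in feats]
            z = sum(e)
            for f, w in zip(feats, e):
                for d in range(dim):
                    acc[d] += (w / z) * f[d]
        # M-step: new direction is the normalized responsibility-weighted sum.
        mu = normalize(acc)
    # Localize: pick the proposal most similar to mu in each bag.
    return [max(range(len(bag)), key=lambda i: cos(mu, normalize(bag[i])))
            for bag in bags]

# Toy bags: the shared object direction (~[1, 0]) is proposal 1 in the first
# image and proposal 0 in the second; the other proposals are distractors.
bags = [[[0.0, 1.0], [1.0, 0.1]],
        [[0.9, 0.2], [-0.7, 0.7]]]
print(em_localize(bags))  # → [1, 0]
```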
arXiv Detail & Related papers (2021-03-25T22:34:16Z)
- Leveraging Bottom-Up and Top-Down Attention for Few-Shot Object Detection [31.1548809359908]
Few-shot object detection aims at detecting objects with few annotated examples.
We propose an attentive few-shot object detection network (AttFDNet) that takes the advantages of both top-down and bottom-up attention.
arXiv Detail & Related papers (2020-07-23T16:12:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.