One-Shot Object Localization Using Learnt Visual Cues via Siamese
Networks
- URL: http://arxiv.org/abs/2012.13690v1
- Date: Sat, 26 Dec 2020 07:40:00 GMT
- Authors: Sagar Gubbi Venkatesh and Bharadwaj Amrutur
- Abstract summary: In this work, a visual cue is used to specify a novel object of interest which must be localized in new environments.
An end-to-end neural network equipped with a Siamese network is used to learn the cue, infer the object of interest, and then to localize it in new environments.
- Score: 0.7832189413179361
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A robot that can operate in novel and unstructured environments must be
capable of recognizing new, previously unseen, objects. In this work, a visual
cue is used to specify a novel object of interest which must be localized in
new environments. An end-to-end neural network equipped with a Siamese network
is used to learn the cue, infer the object of interest, and then to localize it
in new environments. We show that a simulated robot can pick-and-place novel
objects pointed to by a laser pointer. We also evaluate the performance of the
proposed approach on a dataset derived from the Omniglot handwritten character
dataset and on a small dataset of toys.
Related papers
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z)
- Object Registration in Neural Fields [6.361537379901403]
We provide an expanded analysis of the recent Reg-NF neural field registration method and its use-cases within a robotics context.
We showcase the scenario of determining the 6-DoF pose of known objects within a scene using scene and object neural field models.
We show how this may be used to better represent objects within imperfectly modelled scenes and generate new scenes by substituting object neural field models into the scene.
arXiv Detail & Related papers (2024-04-29T02:33:40Z)
- NeuPAN: Direct Point Robot Navigation with End-to-End Model-based Learning [67.53972459080437]
This paper presents NeuPAN: a real-time, highly accurate, robot-agnostic, and environment-invariant robot navigation solution.
Leveraging a tightly coupled perception-locomotion framework, NeuPAN has two key innovations compared to existing approaches.
We evaluate NeuPAN on a car-like robot, a wheel-legged robot, and a passenger autonomous vehicle, in both simulated and real-world environments.
arXiv Detail & Related papers (2024-03-11T15:44:38Z)
- Local Neural Descriptor Fields: Locally Conditioned Object Representations for Manipulation [10.684104348212742]
We present a method to generalize object manipulation skills acquired from a limited number of demonstrations.
Our approach, Local Neural Descriptor Fields (L-NDF), utilizes neural descriptors defined on the local geometry of the object.
We illustrate the efficacy of our approach in manipulating novel objects in novel poses -- both in simulation and in the real world.
arXiv Detail & Related papers (2023-02-07T16:37:19Z)
- INVIGORATE: Interactive Visual Grounding and Grasping in Clutter [56.00554240240515]
INVIGORATE is a robot system that interacts with humans through natural language and grasps a specified object in clutter.
We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping.
We build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules.
arXiv Detail & Related papers (2021-08-25T07:35:21Z)
- What Can I Do Here? Learning New Skills by Imagining Visual Affordances [128.65223577406587]
We show how generative models of possible outcomes can allow a robot to learn visual representations of affordances.
In effect, prior data is used to learn what kinds of outcomes may be possible, such that when the robot encounters an unfamiliar setting, it can sample potential outcomes from its model.
We show that visuomotor affordance learning (VAL) can be used to train goal-conditioned policies that operate on raw image inputs.
arXiv Detail & Related papers (2021-06-01T17:58:02Z)
- Location-Sensitive Visual Recognition with Cross-IOU Loss [177.86369890708457]
This paper proposes a unified solution named location-sensitive network (LSNet) for object detection, instance segmentation, and pose estimation.
Based on a deep neural network as the backbone, LSNet predicts an anchor point and a set of landmarks which together define the shape of the target object.
arXiv Detail & Related papers (2021-04-11T02:17:14Z)
- Where2Act: From Pixels to Actions for Articulated 3D Objects [54.19638599501286]
We extract highly localized actionable information related to elementary actions such as pushing or pulling for articulated objects with movable parts.
We propose a learning-from-interaction framework with an online data sampling strategy that allows us to train the network in simulation.
Our learned models even transfer to real-world data.
arXiv Detail & Related papers (2021-01-07T18:56:38Z)
- Teaching Robots Novel Objects by Pointing at Them [1.1797787239802762]
We propose teaching a robot novel objects it has not encountered before by pointing a hand at the new object of interest.
An end-to-end neural network is used to attend to the novel object of interest indicated by the pointing hand and then to localize the object in new scenes.
We show that a robot arm can manipulate novel objects that are highlighted by pointing a hand at them.
arXiv Detail & Related papers (2020-12-25T20:01:25Z)
- Learning Object-Based State Estimators for Household Robots [11.055133590909097]
We build object-based memory systems that operate on high-dimensional observations and hypotheses.
We demonstrate the system's effectiveness in maintaining memory of dynamically changing objects in both a simulated environment and real images.
arXiv Detail & Related papers (2020-11-06T04:18:52Z)
- Instance Segmentation of Visible and Occluded Regions for Finding and Picking Target from a Pile of Objects [25.836334764387498]
We present a robotic system for picking a target from a pile of objects that is capable of finding and grasping the target object.
We extend an existing instance segmentation model with a novel 'relook' architecture, in which the model explicitly learns the inter-instance relationship.
Also, by using image synthesis, we make the system capable of handling new objects without human annotations.
arXiv Detail & Related papers (2020-01-21T12:28:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.