Few-Shot Visual Grounding for Natural Human-Robot Interaction
- URL: http://arxiv.org/abs/2103.09720v1
- Date: Wed, 17 Mar 2021 15:24:02 GMT
- Title: Few-Shot Visual Grounding for Natural Human-Robot Interaction
- Authors: Giorgos Tziafas and Hamidreza Kasaei
- Abstract summary: We propose a software architecture that segments a target object from a crowded scene, indicated verbally by a human user.
At the core of our system, we employ a multi-modal deep neural network for visual grounding.
We evaluate the performance of the proposed model on real RGB-D data collected from public scene datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural Human-Robot Interaction (HRI) is one of the key components for
service robots to be able to work in human-centric environments. In such
dynamic environments, the robot needs to understand the intention of the user
to accomplish a task successfully. Towards addressing this point, we propose a
software architecture that segments a target object from a crowded scene,
indicated verbally by a human user. At the core of our system, we employ a
multi-modal deep neural network for visual grounding. Unlike most grounding
methods that tackle the challenge using pre-trained object detectors via a
two-step process, we develop a single-stage zero-shot model that is able to
provide predictions in unseen data. We evaluate the performance of the proposed
model on real RGB-D data collected from public scene datasets. Experimental
results show that the proposed model performs well in terms of accuracy and
speed, while showcasing robustness to variation in the natural language input.
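The abstract describes the architecture only at a high level; below is a minimal, illustrative PyTorch sketch of the single-stage grounding idea, in which a sentence embedding is fused with a convolutional feature map and a box for the referred object is regressed directly, without a separate pre-trained detector stage. All module names, dimensions, and the fusion scheme are hypothetical placeholders, not the authors' architecture.

```python
import torch
import torch.nn as nn

class SingleStageGrounder(nn.Module):
    """Illustrative single-stage visual grounding model (not the paper's code).

    Fuses a sentence embedding with a CNN feature map and regresses a
    bounding box for the verbally indicated object in one forward pass.
    """

    def __init__(self, vocab_size=10000, text_dim=256, vis_dim=256):
        super().__init__()
        # Visual branch: a small CNN standing in for any image backbone.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, vis_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Language branch: embedding + GRU over the referring expression.
        self.embed = nn.Embedding(vocab_size, text_dim)
        self.gru = nn.GRU(text_dim, text_dim, batch_first=True)
        # Fusion: broadcast the sentence vector over the feature map.
        self.fuse = nn.Conv2d(vis_dim + text_dim, vis_dim, 1)
        # Head: predict a normalized (cx, cy, w, h) for the referred object.
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(vis_dim, 4), nn.Sigmoid(),
        )

    def forward(self, image, tokens):
        feat = self.backbone(image)                    # (B, C, H, W)
        _, h = self.gru(self.embed(tokens))            # sentence vector
        txt = h[-1][:, :, None, None].expand(-1, -1, *feat.shape[2:])
        fused = torch.relu(self.fuse(torch.cat([feat, txt], dim=1)))
        return self.head(fused)

model = SingleStageGrounder()
box = model(torch.randn(1, 3, 128, 128), torch.randint(0, 10000, (1, 8)))
print(box.shape)  # torch.Size([1, 4])
```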
Related papers
- Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models [53.22792173053473]
We introduce an interactive robotic manipulation framework called Polaris.
Polaris integrates perception and interaction by utilizing GPT-4 alongside grounded vision models.
We propose a novel Synthetic-to-Real (Syn2Real) pose estimation pipeline.
arXiv Detail & Related papers (2024-08-15T06:40:38Z)
- Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation [16.809190349155525]
Recent works have turned to large-scale pre-training using human data.
However, morphological differences between humans and robots introduce a significant human-robot domain discrepancy.
We propose a novel adaptation paradigm that utilizes readily available paired human-robot video data to bridge the discrepancy.
arXiv Detail & Related papers (2024-06-20T11:57:46Z)
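The summary above does not specify how paired human-robot videos bridge the gap; a hedged sketch of one generic possibility is an InfoNCE-style alignment loss that pulls features of paired clips together. The encoder, dimensions, and temperature are stand-ins, not the paper's adaptation paradigm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative feature-alignment objective on paired human/robot clips.
# `encoder` stands in for whatever pre-trained visual encoder is adapted.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))

def alignment_loss(human_clip, robot_clip):
    """Pull features of a human frame and its paired robot frame together
    (InfoNCE over the batch); unpaired samples act as negatives."""
    zh = F.normalize(encoder(human_clip), dim=-1)   # (B, D)
    zr = F.normalize(encoder(robot_clip), dim=-1)   # (B, D)
    logits = zh @ zr.t() / 0.07                     # pairwise similarities
    target = torch.arange(zh.size(0))               # i-th human <-> i-th robot
    return F.cross_entropy(logits, target)

loss = alignment_loss(torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64))
loss.backward()
```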
- PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training [25.50131893785007]
This work introduces a paradigm for pre-training a general-purpose representation that can serve as a starting point for multiple tasks on a given robot.
We present the Perception-Action Causal Transformer (PACT), a generative transformer-based architecture that aims to build representations directly from robot data in a self-supervised fashion.
We show that finetuning small task-specific networks on top of the larger pretrained model results in significantly better performance compared to training a single model from scratch for all tasks simultaneously.
arXiv Detail & Related papers (2022-09-22T16:20:17Z)
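As a rough illustration of the PACT-style objective summarized above, the sketch below runs a causal transformer over interleaved state and action embeddings and predicts the next action; the interleaving scheme, dimensions, and layer counts are assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class CausalPerceptionActionModel(nn.Module):
    """Illustrative PACT-style sketch: a causal transformer over an
    interleaved sequence of state and action tokens (placeholder sizes)."""

    def __init__(self, state_dim=32, act_dim=8, d_model=128, seq_len=16):
        super().__init__()
        self.state_proj = nn.Linear(state_dim, d_model)
        self.act_proj = nn.Linear(act_dim, d_model)
        self.pos = nn.Parameter(torch.zeros(2 * seq_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.act_head = nn.Linear(d_model, act_dim)

    def forward(self, states, actions):
        B, T, _ = states.shape
        # Interleave tokens: s_0, a_0, s_1, a_1, ...
        toks = torch.stack(
            [self.state_proj(states), self.act_proj(actions)], dim=2
        ).reshape(B, 2 * T, -1) + self.pos[: 2 * T]
        # Causal mask: each token attends only to the past.
        mask = torch.triu(torch.full((2 * T, 2 * T), float("-inf")), diagonal=1)
        h = self.transformer(toks, mask=mask)
        # Predict a_t from the representation of s_t (even positions).
        return self.act_head(h[:, 0::2])

model = CausalPerceptionActionModel()
pred = model(torch.randn(2, 16, 32), torch.randn(2, 16, 8))
print(pred.shape)  # torch.Size([2, 16, 8])
```

A small task-specific head would then be fine-tuned on top of the pretrained trunk, as the summary describes.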
- Model Predictive Control for Fluid Human-to-Robot Handovers [50.72520769938633]
Existing human-to-robot handover pipelines typically do not plan motions that take human comfort into account.
We propose to generate smooth motions via an efficient model-predictive control framework.
We conduct human-to-robot handover experiments on a diverse set of objects with several users.
arXiv Detail & Related papers (2022-03-31T23:08:20Z)
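To make the receding-horizon idea above concrete: the toy numpy sketch below does one MPC step with random shooting on a point-mass end effector, trading off reaching the human's hand against jerky motion. The paper's solver, dynamics model, and cost are different; the weights here are invented.

```python
import numpy as np

def mpc_step(x, hand_pos, horizon=10, samples=256, dt=0.1):
    """Illustrative sampling-based MPC step (not the paper's solver):
    score candidate control sequences, execute only the first action."""
    # Candidate acceleration sequences: (samples, horizon, 3)
    acc = np.random.uniform(-1.0, 1.0, size=(samples, horizon, 3))
    pos = np.repeat(x[None, :3], samples, axis=0)
    vel = np.repeat(x[None, 3:], samples, axis=0)
    cost = np.zeros(samples)
    for t in range(horizon):
        vel = vel + dt * acc[:, t]
        pos = pos + dt * vel
        cost += np.linalg.norm(pos - hand_pos, axis=1)   # reach the hand
        cost += 0.1 * np.linalg.norm(acc[:, t], axis=1)  # prefer smooth motion
    return acc[np.argmin(cost), 0]

state = np.zeros(6)  # position (3) + velocity (3)
action = mpc_step(state, hand_pos=np.array([0.5, 0.2, 0.8]))
print(action)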
- Open-VICO: An Open-Source Gazebo Toolkit for Multi-Camera-based Skeleton Tracking in Human-Robot Collaboration [0.0]
This work presents Open-VICO, an open-source toolkit for integrating virtual human models into Gazebo.
In particular, Open-VICO makes it possible to combine realistic human kinematic models, multi-camera vision setups, and human-tracking techniques in the same simulation environment.
arXiv Detail & Related papers (2022-03-28T13:21:32Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work, including state-of-the-art methods designed specifically for the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
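A toy sketch of the two ingredients in the TRiPOD summary above: body joints attend to one another, and a per-joint visibility flag is fed into the input. This collapses TRiPOD's graph attentional networks to a single self-attention layer with invented dimensions.

```python
import torch
import torch.nn as nn

class JointAttentionForecaster(nn.Module):
    """Illustrative pose forecaster (a simplification, not TRiPOD itself):
    joints attend to each other; a visibility flag is part of each input."""

    def __init__(self, num_joints=17, d_model=64, horizon=5):
        super().__init__()
        self.inp = nn.Linear(3 + 1, d_model)  # (x, y, z) + visibility flag
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.out = nn.Linear(d_model, 3 * horizon)
        self.horizon = horizon

    def forward(self, joints, visible):
        # joints: (B, J, 3); visible: (B, J) in {0, 1}
        h = self.inp(torch.cat([joints, visible[..., None]], dim=-1))
        h, _ = self.attn(h, h, h)  # joints attend to one another
        B, J, _ = joints.shape
        return self.out(h).view(B, J, self.horizon, 3)  # future positions

model = JointAttentionForecaster()
future = model(torch.randn(2, 17, 3), torch.ones(2, 17))
print(future.shape)  # torch.Size([2, 17, 5, 3])
```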
- Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos [59.58105314783289]
Domain-agnostic Video Discriminator (DVD) learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task.
DVD can generalize by virtue of learning from a small amount of robot data with a broad dataset of human videos.
DVD can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo.
arXiv Detail & Related papers (2021-03-31T05:25:05Z)
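A minimal sketch of the DVD objective summarized above: a discriminator scores whether two video clips show the same task, so at test time a robot rollout can be compared against a human demo to yield a reward. The flat video encoder, shapes, and classifier here are placeholders, not the paper's networks.

```python
import torch
import torch.nn as nn

# Stand-in video encoder; clips are (batch, frames, channels, height, width).
video_encoder = nn.Sequential(nn.Flatten(), nn.Linear(8 * 3 * 32 * 32, 128))
classifier = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))

def same_task_score(clip_a, clip_b):
    """Logit for 'these two clips perform the same task'. Comparing a robot
    rollout against a human demo turns this score into a reward signal."""
    za = video_encoder(clip_a)
    zb = video_encoder(clip_b)
    return classifier(torch.cat([za, zb], dim=-1))

logit = same_task_score(torch.randn(2, 8, 3, 32, 32),
                        torch.randn(2, 8, 3, 32, 32))
print(logit.shape)  # torch.Size([2, 1])
```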
- Where is my hand? Deep hand segmentation for visual self-recognition in humanoid robots [129.46920552019247]
We propose the use of a Convolutional Neural Network (CNN) to segment the robot hand from an image in an egocentric view.
We fine-tune the Mask R-CNN network for the specific task of segmenting the hand of the humanoid robot Vizzy.
arXiv Detail & Related papers (2021-02-09T10:34:32Z)
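The hand-segmentation summary above follows the standard torchvision Mask R-CNN fine-tuning recipe; a minimal sketch of that recipe is below. This is not the authors' Vizzy training code: the dataset, class count, and target tensors are placeholders. (Older torchvision versions use `pretrained=True` instead of `weights="DEFAULT"`.)

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

# Load a Mask R-CNN pre-trained on COCO and swap its heads for a
# two-class problem: background vs. the robot's hand.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
num_classes = 2  # background + hand

in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

in_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_mask, 256, num_classes)

# Training consumes images plus per-image targets (placeholder data here).
images = [torch.rand(3, 480, 640)]
targets = [{
    "boxes": torch.tensor([[100.0, 100.0, 300.0, 300.0]]),
    "labels": torch.tensor([1]),
    "masks": torch.zeros(1, 480, 640, dtype=torch.uint8),
}]
model.train()
losses = model(images, targets)  # dict of detection/segmentation losses
print(sum(losses.values()))
```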
- Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)
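A toy sketch of the augmentation idea in the last summary: robot transitions carry actions, while observation-only human data does not, so a small inference network fills in a latent action when the true one is missing and both data sources train the same dynamics model. All networks and dimensions are invented for illustration.

```python
import torch
import torch.nn as nn

state_dim, act_dim = 16, 4
dynamics = nn.Sequential(nn.Linear(state_dim + act_dim, 64), nn.ReLU(),
                         nn.Linear(64, state_dim))
infer_action = nn.Sequential(nn.Linear(2 * state_dim, 64), nn.ReLU(),
                             nn.Linear(64, act_dim))

def prediction_loss(s, s_next, action=None):
    """One-step prediction loss; for observation-only (e.g., human) data,
    a latent action is inferred from consecutive states."""
    if action is None:
        action = infer_action(torch.cat([s, s_next], dim=-1))
    pred = dynamics(torch.cat([s, action], dim=-1))
    return ((pred - s_next) ** 2).mean()

robot_loss = prediction_loss(torch.randn(8, 16), torch.randn(8, 16),
                             action=torch.randn(8, 4))   # actions observed
human_loss = prediction_loss(torch.randn(8, 16), torch.randn(8, 16))
(robot_loss + human_loss).backward()
```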