Perspective-corrected Spatial Referring Expression Generation for
Human-Robot Interaction
- URL: http://arxiv.org/abs/2104.01558v1
- Date: Sun, 4 Apr 2021 08:00:02 GMT
- Title: Perspective-corrected Spatial Referring Expression Generation for
Human-Robot Interaction
- Authors: Mingjiang Liu, Chengli Xiao, Chunlin Chen
- Abstract summary: We propose a novel perspective-corrected spatial referring expression generation (PcSREG) approach for human-robot interaction.
The task of referring expression generation is simplified into the process of generating diverse spatial relation units.
We implement the proposed approach on a robot system and empirical experiments show that our approach can generate more effective spatial referring expressions.
- Score: 5.0726912337429795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intelligent robots designed to interact with humans in real scenarios need to
be able to actively refer to entities using natural language. In spatial referring
expression generation, ambiguity is unavoidable due to the diversity of
reference frames, which leads to an understanding gap between humans and
robots. To narrow this gap, in this paper, we propose a novel
perspective-corrected spatial referring expression generation (PcSREG) approach
for human-robot interaction by considering the selection of reference frames.
The task of referring expression generation is simplified into the process of
generating diverse spatial relation units. First, we select the landmarks in
these spatial relation units according to the entropy of preference and allow
them to be updated through a stack model. Then all possible referring expressions are
generated according to different reference frame strategies. Finally, we
evaluate every expression using a probabilistic referring expression resolution
model and select the best expression that satisfies both appropriateness
and effectiveness. We implement the proposed approach on a robot system, and
empirical experiments show that our approach can generate more effective
spatial referring expressions for practical applications.
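The following is a rough sketch of the pipeline described above: rank landmarks by preference entropy on a stack, enumerate expressions under different reference-frame strategies, and score them with a resolution model. All names (SpatialRelationUnit, preference_entropy, resolution_model, the frame strategies) are hypothetical placeholders, not the authors' implementation.

```python
# Hypothetical sketch of the PcSREG pipeline; all names are placeholders.
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

@dataclass
class SpatialRelationUnit:
    target: str      # object the robot refers to, e.g. "the cup"
    landmark: str    # reference object, e.g. "the table"
    relation: str    # spatial relation, e.g. "to the left of"

def preference_entropy(landmark: str, preference: Dict[str, Sequence[float]]) -> float:
    """Entropy of an assumed human preference distribution over reference
    frames when this landmark is used (lower = less ambiguous)."""
    probs = preference.get(landmark, [1.0])
    return -sum(p * math.log(p + 1e-12) for p in probs)

def generate_candidates(units: List[SpatialRelationUnit],
                        strategies=("robot", "human", "landmark")) -> List[str]:
    """Enumerate expressions under every reference-frame strategy."""
    return [f"{u.target} {u.relation} {u.landmark}, from the {frame}'s perspective"
            for u in units for frame in strategies]

def pcsreg(units: List[SpatialRelationUnit],
           preference: Dict[str, Sequence[float]],
           resolution_model: Callable[[str], float]) -> str:
    # 1. Order landmarks by preference entropy; a stack keeps the least
    #    ambiguous ones on top and lets them be updated as the scene changes.
    landmark_stack = sorted(units,
                            key=lambda u: preference_entropy(u.landmark, preference))
    # 2. Generate all candidate expressions under the frame strategies.
    candidates = generate_candidates(landmark_stack)
    # 3. Score each candidate with the probabilistic resolution model and
    #    return the expression most likely to be resolved correctly.
    return max(candidates, key=resolution_model)

# Usage with toy inputs and a dummy resolution model
units = [SpatialRelationUnit("the cup", "the table", "to the left of"),
         SpatialRelationUnit("the cup", "the window", "near")]
preference = {"the table": [0.9, 0.1], "the window": [0.5, 0.5]}
print(pcsreg(units, preference, resolution_model=len))
```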
Related papers
- Multi-Agent Dynamic Relational Reasoning for Social Robot Navigation [55.65482030032804]
Social robot navigation can be helpful in various contexts of daily life but requires safe human-robot interactions and efficient trajectory planning.
We propose a systematic relational reasoning approach with explicit inference of the underlying dynamically evolving relational structures.
Our approach infers dynamically evolving relation graphs and hypergraphs to capture the evolution of relations, which the trajectory predictor employs to generate future states.
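A minimal sketch of the general idea, assuming a soft pairwise relation graph inferred from agent distances and a one-step predictor that aggregates neighbours' velocities; the paper's actual model infers dynamically evolving graphs and hypergraphs with learned components.

```python
# Illustrative only: soft relation graph plus a one-step social predictor.
import numpy as np

def infer_relation_graph(positions: np.ndarray) -> np.ndarray:
    """Row-normalised soft adjacency: closer agents get stronger relations."""
    diff = positions[:, None, :] - positions[None, :, :]   # (N, N, 2)
    dist = np.linalg.norm(diff, axis=-1)                    # (N, N)
    adj = np.exp(-dist)
    np.fill_diagonal(adj, 0.0)
    return adj / (adj.sum(axis=1, keepdims=True) + 1e-9)

def predict_next_positions(positions, velocities, adj, dt=0.1):
    """Advance each agent by its own velocity plus a relation-weighted
    'social' influence aggregated from its neighbours."""
    social = adj @ velocities
    return positions + dt * (velocities + 0.5 * social)

# Usage: three agents, two of them close enough to interact
pos = np.array([[0.0, 0.0], [1.0, 0.0], [8.0, 8.0]])
vel = np.array([[0.2, 0.0], [-0.2, 0.0], [0.0, 0.1]])
adj = infer_relation_graph(pos)
print(predict_next_positions(pos, vel, adj))
```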
arXiv Detail & Related papers (2024-01-22T18:58:22Z) - Probabilistic Transformer: A Probabilistic Dependency Model for
Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively to transformers on small to medium sized datasets.
arXiv Detail & Related papers (2023-11-26T06:56:02Z) - Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z) - Visual Affordance Prediction for Guiding Robot Exploration [56.17795036091848]
We develop an approach for learning visual affordances for guiding robot exploration.
We use a Transformer-based model to learn a conditional distribution in the latent embedding space of a VQ-VAE.
We show how the trained affordance model can be used for guiding exploration by acting as a goal-sampling distribution, during visual goal-conditioned policy learning in robotic manipulation.
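The sketch below illustrates how a learned autoregressive prior over discrete VQ-VAE codes could serve as a goal sampler during goal-conditioned policy learning; prior_logits_fn and vqvae_decode are hypothetical stand-ins for the paper's trained Transformer prior and decoder.

```python
# Illustrative goal sampler over discrete VQ-VAE codes; stand-in functions only.
import numpy as np

def sample_goal(prior_logits_fn, vqvae_decode, current_obs,
                num_tokens=16, vocab_size=512, rng=np.random.default_rng()):
    """Autoregressively sample latent codes conditioned on the current
    observation, then decode them into a goal for the policy."""
    codes = []
    for _ in range(num_tokens):
        logits = np.asarray(prior_logits_fn(current_obs, codes))  # (vocab_size,)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        codes.append(int(rng.choice(vocab_size, p=probs)))
    return vqvae_decode(codes)  # e.g. a goal image the policy conditions on

# Usage with dummy stand-ins: a uniform prior and an identity "decoder"
goal = sample_goal(lambda obs, codes: np.zeros(512), lambda c: c, current_obs=None)
print(goal)
```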
arXiv Detail & Related papers (2023-05-28T17:53:09Z) - Data-driven emotional body language generation for social robotics [58.88028813371423]
In social robotics, endowing humanoid robots with the ability to generate bodily expressions of affect can improve human-robot interaction and collaboration.
We implement a deep learning data-driven framework that learns from a few hand-designed robotic bodily expressions.
The evaluation study found that the anthropomorphism and animacy of the generated expressions are not perceived differently from the hand-designed ones.
arXiv Detail & Related papers (2022-05-02T09:21:39Z) - HARPS: An Online POMDP Framework for Human-Assisted Robotic Planning and
Sensing [1.3678064890824186]
The Human Assisted Robotic Planning and Sensing (HARPS) framework is presented for active semantic sensing and planning in human-robot teams.
This approach lets humans opportunistically impose model structure and extend the range of semantic soft data in uncertain environments.
Simulations of a UAV-enabled target search application in a large-scale partially structured environment show significant improvements in time and belief state estimates.
arXiv Detail & Related papers (2021-10-20T00:41:57Z) - Unsupervised Lexical Acquisition of Relative Spatial Concepts Using
Spoken User Utterances [0.0]
A robot with a flexible spoken dialog system must be able to acquire linguistic representation.
Relative spatial concepts are widely used in our daily lives.
It is not obvious which object is a reference object when a robot learns relative spatial concepts.
arXiv Detail & Related papers (2021-06-16T06:44:27Z) - Language Understanding for Field and Service Robots in a Priori Unknown
Environments [29.16936249846063]
This paper provides a novel learning framework that allows field and service robots to interpret and execute natural language instructions.
We use language as a "sensor" -- inferring spatial, topological, and semantic information implicit in natural language utterances.
We incorporate this distribution in a probabilistic language grounding model and infer a distribution over a symbolic representation of the robot's action space.
arXiv Detail & Related papers (2021-05-21T15:13:05Z) - Spatial Language Understanding for Object Search in Partially Observed
Cityscale Environments [21.528770932332474]
We introduce the spatial language observation space and formulate a model under the Partially Observable Markov Decision Process (POMDP) framework.
We propose a convolutional neural network model that learns to predict the language provider's relative frame of reference (FoR) given environment context.
We demonstrate the generalizability of our FoR prediction model and object search system through cross-validation over areas of five cities, each with a 40,000 m$^2$ footprint.
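A minimal sketch of such a FoR predictor, assuming PyTorch is available, an egocentric context grid as input, and a single angle as output; the architecture and parameterization are illustrative, not the paper's.

```python
# Hypothetical CNN mapping an environment-context grid to a frame-of-reference angle.
import torch
import torch.nn as nn

class FoRPredictor(nn.Module):
    def __init__(self, in_channels: int = 2, grid_size: int = 28):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        flat = 32 * (grid_size // 4) ** 2
        self.head = nn.Linear(flat, 2)   # predict (cos angle, sin angle)

    def forward(self, context_grid: torch.Tensor) -> torch.Tensor:
        h = self.features(context_grid).flatten(1)
        vec = self.head(h)
        return torch.atan2(vec[:, 1], vec[:, 0])   # FoR angle in radians

# Usage: a batch of 4 two-channel 28x28 context grids
model = FoRPredictor()
angles = model(torch.randn(4, 2, 28, 28))
print(angles.shape)
```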
arXiv Detail & Related papers (2020-12-04T16:27:59Z) - Risk-Sensitive Sequential Action Control with Multi-Modal Human
Trajectory Forecasting for Safe Crowd-Robot Interaction [55.569050872780224]
We present an online framework for safe crowd-robot interaction based on risk-sensitive optimal control, wherein the risk is modeled by the entropic risk measure.
Our modular approach decouples the crowd-robot interaction into learning-based prediction and model-based control.
A simulation study and a real-world experiment show that the proposed framework can accomplish safe and efficient navigation while avoiding collisions with more than 50 humans in the scene.
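For reference, the entropic risk of a cost $J$ with risk-sensitivity parameter $\theta > 0$ is $R_\theta(J) = \frac{1}{\theta}\log \mathbb{E}[\exp(\theta J)]$. The sketch below estimates it from sampled trajectory costs; the cost model and samples are placeholders, not the paper's.

```python
# Monte Carlo estimate of the entropic risk measure over sampled costs.
# Larger theta penalises rare-but-costly outcomes more heavily.
import numpy as np

def entropic_risk(costs: np.ndarray, theta: float = 1.0) -> float:
    """R_theta(J) = (1/theta) * log E[exp(theta * J)], estimated from samples."""
    costs = np.asarray(costs, dtype=float)
    m = (theta * costs).max()                      # max trick for stability
    return (m + np.log(np.mean(np.exp(theta * costs - m)))) / theta

# Usage: placeholder collision-proximity costs for 100 sampled human futures
sampled_costs = np.random.gamma(shape=2.0, scale=0.5, size=100)
print(entropic_risk(sampled_costs, theta=0.1))   # close to the mean cost
print(entropic_risk(sampled_costs, theta=5.0))   # weights worst-case samples more
```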
arXiv Detail & Related papers (2020-09-12T02:02:52Z) - Toward Forgetting-Sensitive Referring Expression Generation for
Integrated Robot Architectures [1.8456386856206592]
We show how different models of working memory forgetting may be differentially effective at producing natural human-like referring expressions.
In this work, we computationalize two candidate models of working memory forgetting within a robot cognitive architecture, and demonstrate how they lead to cognitive availability-based differences in generated referring expressions.
arXiv Detail & Related papers (2020-07-16T22:20:15Z)