Unsupervised Lexical Acquisition of Relative Spatial Concepts Using
Spoken User Utterances
- URL: http://arxiv.org/abs/2106.08574v1
- Date: Wed, 16 Jun 2021 06:44:27 GMT
- Title: Unsupervised Lexical Acquisition of Relative Spatial Concepts Using
Spoken User Utterances
- Authors: Rikunari Sagara (1), Ryo Taguchi (1), Akira Taniguchi (2), Tadahiro
Taniguchi (2), Koosuke Hattori (3), Masahiro Hoguro (3), Taizo Umezaki (3)
((1) Nagoya Institute of Technology, (2) Ritsumeikan University, (3) Chubu
University)
- Abstract summary: A robot with a flexible spoken dialog system must be able to acquire linguistic representations.
Relative spatial concepts are widely used in our daily lives.
It is not obvious which object is a reference object when a robot learns relative spatial concepts.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes methods for unsupervised lexical acquisition for relative
spatial concepts using spoken user utterances. A robot with a flexible spoken
dialog system must be able to acquire linguistic representations and their
environment-specific meanings through interactions with humans, as children do.
Specifically, relative spatial concepts (e.g., front and right) are widely used
in our daily lives; however, it is not obvious which object serves as the
reference object when a robot learns relative spatial concepts. Therefore, we propose
methods by which a robot without prior knowledge of words can learn relative
spatial concepts. The methods are formulated using a probabilistic model to
estimate the proper reference objects and distributions representing concepts
simultaneously. The experimental results show that relative spatial concepts
and a phoneme sequence representing each concept can be learned under the
condition that the robot does not know which located object is the reference
object. Additionally, we show that two processes in the proposed method improve
the estimation accuracy of the concepts: generating candidate word sequences with a
class n-gram model and selecting word sequences using location information.
Furthermore, we show that clues to reference objects improve accuracy even
when the number of candidate reference objects increases.
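As a concrete (and heavily simplified) illustration of the simultaneous estimation, the following Python sketch pairs each utterance with a target position and several candidate reference objects, then runs an EM-style loop that jointly infers the latent reference object and a von Mises-style angular distribution per word. All data, names, and distributional choices here are illustrative assumptions; the paper's model additionally learns phoneme sequences for the words.

```python
import numpy as np

rng = np.random.default_rng(0)

def rel_angle(target, ref_pos, ref_heading):
    """Direction of the target in the reference object's local frame."""
    dx, dy = target - ref_pos
    return (np.arctan2(dy, dx) - ref_heading + np.pi) % (2 * np.pi) - np.pi

# Synthetic utterances: word 0 ~ "front", word 1 ~ "right"; the reference
# object is hidden among three candidates per utterance.
TRUE_MU = {0: 0.0, 1: -np.pi / 2}
data = []
for _ in range(300):
    w = int(rng.integers(2))
    cands = [(rng.uniform(-5, 5, 2), rng.uniform(-np.pi, np.pi))
             for _ in range(3)]
    pos, head = cands[rng.integers(3)]          # the true (hidden) reference
    ang = TRUE_MU[w] + 0.2 * rng.normal()
    target = pos + 2.0 * np.array([np.cos(head + ang), np.sin(head + ang)])
    data.append((w, target, cands))

# EM: soft-assign the reference object (E-step), then refit each word's
# angular distribution (M-step).
mu = rng.uniform(-np.pi, np.pi, 2)              # per-word mean direction
kappa = np.ones(2)                              # per-word concentration
for _ in range(30):
    s, c, n = np.zeros(2), np.zeros(2), np.zeros(2)
    for w, target, cands in data:
        angs = np.array([rel_angle(target, p, h) for p, h in cands])
        resp = np.exp(kappa[w] * np.cos(angs - mu[w]))
        resp /= resp.sum()                      # P(reference = j | word, scene)
        s[w] += resp @ np.sin(angs)
        c[w] += resp @ np.cos(angs)
        n[w] += 1.0
    mu = np.arctan2(s, c)
    R = np.minimum(np.hypot(s, c) / n, 0.99)
    kappa = R * (2 - R**2) / (1 - R**2)         # standard von Mises estimate

print(np.round(mu, 2))  # learned mean directions; compare with TRUE_MU
```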
Related papers
- Visual Affordance Prediction for Guiding Robot Exploration [56.17795036091848]
We develop an approach for learning visual affordances for guiding robot exploration.
We use a Transformer-based model to learn a conditional distribution in the latent embedding space of a VQ-VAE.
We show how the trained affordance model can be used for guiding exploration by acting as a goal-sampling distribution, during visual goal-conditioned policy learning in robotic manipulation.
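As a loose sketch of the goal-sampling idea (not the paper's implementation), the snippet below samples a goal as a grid of discrete codes from per-cell categorical distributions; the codebook size, grid shape, and random logits are stand-ins for the trained Transformer prior and VQ-VAE:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the trained components: a codebook of 512 discrete VQ codes
# on an 8x8 latent grid, with per-cell logits that a conditional Transformer
# prior would normally produce from the current observation.
codebook_size, grid_h, grid_w = 512, 8, 8
logits = rng.normal(size=(grid_h, grid_w, codebook_size))

# Turn logits into per-cell categorical distributions and sample a goal as a
# grid of code indices; a VQ-VAE decoder would map these codes to a goal
# image that drives exploration.
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)
goal_codes = np.array([[rng.choice(codebook_size, p=probs[i, j])
                        for j in range(grid_w)] for i in range(grid_h)])
print(goal_codes.shape)  # (8, 8)
```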
arXiv Detail & Related papers (2023-05-28T17:53:09Z) - ConceptBeam: Concept Driven Target Speech Extraction [69.85003619274295]
We propose a novel framework for target speech extraction based on semantic information, called ConceptBeam.
In our scheme, a concept is encoded as a semantic embedding by mapping the concept specifier to a shared embedding space.
We use it to bridge modality-dependent information, i.e., the speech segments in the mixture, and the specified, modality-independent concept.
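A minimal sketch of this bridging step, with random vectors standing in for the learned speech and concept encoders (everything below is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Random stand-ins for learned encoders: one embedding per short speech
# segment in the mixture, and one for the concept specifier.
segment_embs = l2norm(rng.normal(size=(50, 128)))
concept_emb = l2norm(rng.normal(size=128))

# Bridge the modalities by cosine similarity in the shared space, then form
# soft weights that emphasize segments matching the concept.
sims = segment_embs @ concept_emb
weights = 1.0 / (1.0 + np.exp(-10.0 * (sims - sims.mean())))
print(weights.round(2))
```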
arXiv Detail & Related papers (2022-07-25T08:06:07Z) - Discovering Concepts in Learned Representations using Statistical
Inference and Interactive Visualization [0.76146285961466]
Concept discovery is important for bridging the gap between non-deep learning experts and model end-users.
Current approaches include hand-crafting concept datasets and then converting them to latent space directions.
In this study, we offer two additional approaches to guide user discovery of meaningful concepts: one based on multiple hypothesis testing and the other on interactive visualization.
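One plausible reading of the hypothesis-testing approach is a per-dimension test with false discovery rate control; the sketch below uses synthetic activations and the Benjamini-Hochberg procedure as a stand-in for the paper's exact method:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Synthetic latent activations: dimension 3 carries the candidate concept.
with_c = rng.normal(size=(100, 64)) + 0.8 * np.eye(64)[3]
without_c = rng.normal(size=(100, 64))

# Test each latent dimension, then control the false discovery rate with
# the Benjamini-Hochberg procedure.
pvals = np.array([ttest_ind(with_c[:, d], without_c[:, d]).pvalue
                  for d in range(64)])
order = np.argsort(pvals)
m = pvals.size
passed = pvals[order] <= 0.05 * np.arange(1, m + 1) / m
k = passed.nonzero()[0].max() + 1 if passed.any() else 0
print("concept-aligned dimensions:", order[:k])  # expect dimension 3
```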
arXiv Detail & Related papers (2022-02-09T22:29:48Z) - LanguageRefer: Spatial-Language Model for 3D Visual Grounding [72.7618059299306]
We develop a spatial-language model for a 3D visual grounding problem.
We show that our model performs competitively on visio-linguistic datasets proposed by ReferIt3D.
arXiv Detail & Related papers (2021-07-07T18:55:03Z) - Understanding Synonymous Referring Expressions via Contrastive Features [105.36814858748285]
We develop an end-to-end trainable framework to learn contrastive features on the image and object instance levels.
We conduct extensive experiments to evaluate the proposed algorithm on several benchmark datasets.
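Instance-level contrastive objectives of this kind are commonly realized with an InfoNCE-style loss; the following generic sketch assumes that formulation rather than the paper's exact objective:

```python
import numpy as np

def info_nce(queries, keys, temperature=0.1):
    """Contrastive loss where keys[i] is the positive for queries[i]."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Hypothetical batch: expression embeddings paired with object embeddings.
rng = np.random.default_rng(0)
print(info_nce(rng.normal(size=(8, 256)), rng.normal(size=(8, 256))))
```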
arXiv Detail & Related papers (2021-04-20T17:56:24Z) - Perspective-corrected Spatial Referring Expression Generation for
Human-Robot Interaction [5.0726912337429795]
We propose a novel perspective-corrected spatial referring expression generation (PcSREG) approach for human-robot interaction.
The task of referring expression generation is simplified into the process of generating diverse spatial relation units.
We implement the proposed approach on a robot system and empirical experiments show that our approach can generate more effective spatial referring expressions.
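A toy version of the perspective-correction step: transform the object's position into the listener's frame before choosing a relation unit. The frame convention, thresholds, and phrases below are illustrative assumptions:

```python
import numpy as np

def to_listener_frame(obj_xy, listener_xy, heading):
    """Rotate/translate an object's position into the listener's frame."""
    d = np.asarray(obj_xy, float) - np.asarray(listener_xy, float)
    c, s = np.cos(-heading), np.sin(-heading)
    return np.array([c * d[0] - s * d[1], s * d[0] + c * d[1]])

def relation_unit(obj_xy, listener_xy, heading):
    """Pick a coarse relation word from the perspective-corrected angle."""
    x, y = to_listener_frame(obj_xy, listener_xy, heading)
    ang = np.arctan2(y, x)
    if abs(ang) <= np.pi / 4:
        return "in front of you"
    if abs(ang) >= 3 * np.pi / 4:
        return "behind you"
    return "to your left" if ang > 0 else "to your right"

print(relation_unit((1, 2), (0, 0), 0.0))  # -> "to your left"
```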
arXiv Detail & Related papers (2021-04-04T08:00:02Z) - Spatial Language Understanding for Object Search in Partially Observed
Cityscale Environments [21.528770932332474]
We introduce the spatial language observation space and formulate a model under the framework of a Partially Observable Markov Decision Process (POMDP).
We propose a convolutional neural network model that learns to predict the language provider's relative frame of reference (FoR) given environment context.
We demonstrate the generalizability of our FoR prediction model and object search system through cross-validation over areas of five cities, each with a 40,000 m$^2$ footprint.
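The observation model can be sketched as a Bayes-filter update over a location grid, where a spatial-language utterance reweights the belief according to the predicted FoR; the cosine-shaped likelihood and fixed FoR below are stand-ins for the learned CNN:

```python
import numpy as np

# Uniform prior belief over a 40x40 grid of possible target locations.
belief = np.full((40, 40), 1.0 / 1600)

# Landmark cell and a frame-of-reference angle; in the paper the FoR is
# predicted by a CNN from environment context, here it is a fixed stand-in.
landmark = np.array([20.0, 20.0])
for_angle = np.pi / 2  # direction that counts as the landmark's "front"

# Likelihood of hearing "in front of the landmark" at each cell, peaking
# along the FoR direction, followed by a Bayes update of the belief.
ys, xs = np.mgrid[0:40, 0:40]
angles = np.arctan2(ys - landmark[1], xs - landmark[0])
likelihood = np.exp(np.cos(angles - for_angle))
belief *= likelihood
belief /= belief.sum()
print(np.unravel_index(belief.argmax(), belief.shape))
```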
arXiv Detail & Related papers (2020-12-04T16:27:59Z) - SIRI: Spatial Relation Induced Network For Spatial Description
Resolution [64.38872296406211]
We propose a novel spatial relation induced (SIRI) network for language-guided localization.
We show that our method is around 24% better than the state-of-the-art method in terms of accuracy, measured by an 80-pixel radius.
Our method also generalizes well on our proposed extended dataset collected using the same settings as Touchdown.
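The quoted metric counts a prediction as correct when it falls within 80 pixels of the ground-truth location; a minimal sketch of that evaluation:

```python
import numpy as np

def accuracy_at_radius(pred_xy, gt_xy, radius=80.0):
    """Fraction of predictions within `radius` pixels of the ground truth."""
    d = np.linalg.norm(np.asarray(pred_xy) - np.asarray(gt_xy), axis=1)
    return float(np.mean(d <= radius))

# Hypothetical predicted vs. ground-truth pixel coordinates.
pred = np.array([[10, 10], [300, 300]])
gt = np.array([[50, 60], [400, 420]])
print(accuracy_at_radius(pred, gt))  # 0.5: only the first is within 80 px
```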
arXiv Detail & Related papers (2020-10-27T14:04:05Z) - Understanding Spatial Relations through Multiple Modalities [78.07328342973611]
Spatial relations between objects can be either explicit (expressed as spatial prepositions) or implicit (expressed by spatial verbs such as moving, walking, or shifting).
We introduce the task of inferring implicit and explicit spatial relations between two entities in an image.
We design a model that uses both textual and visual information to predict the spatial relations, making use of both positional and size information of objects and image embeddings.
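A plausible reading of the feature design: concatenate relative position and size cues with an image embedding and score relation classes with a linear layer. All shapes and the random scorer below are assumptions for illustration:

```python
import numpy as np

def fuse(box_a, box_b, img_emb):
    """Concatenate relative position/size cues with an image embedding."""
    (xa, ya, wa, ha), (xb, yb, wb, hb) = box_a, box_b
    pos = np.array([xb - xa, yb - ya, np.log(wb / wa), np.log(hb / ha)])
    return np.concatenate([pos, img_emb])

rng = np.random.default_rng(0)
feat = fuse((10, 20, 30, 40), (60, 25, 30, 35), rng.normal(size=512))
W = rng.normal(size=(10, feat.size))   # linear scorer over 10 relation classes
print(int((W @ feat).argmax()))        # predicted relation class
```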
arXiv Detail & Related papers (2020-07-19T01:35:08Z) - Spatial Concept-Based Navigation with Human Speech Instructions via
Probabilistic Inference on Bayesian Generative Model [8.851071399120542]
The aim of this study is to enable a mobile robot to perform navigational tasks with human speech instructions.
Path planning was formalized as a probabilistic distribution over path trajectories under a speech instruction.
We demonstrated path planning based on human instruction using acquired spatial concepts to verify the usefulness of the proposed approach in the simulator and in real environments.
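In spirit, the planner scores candidate goals under a generative model linking instruction words to spatial concepts; the Gaussian concepts and endpoint-only scoring below are simplifying assumptions:

```python
import numpy as np

# Toy spatial concepts: Gaussian place distributions tied to learned words.
concepts = {"kitchen": (np.array([1.0, 4.0]), 0.5),
            "desk":    (np.array([5.0, 2.0]), 0.5)}

def log_p_goal(goal_xy, word):
    """Log-likelihood of a goal location under the instructed concept."""
    mu, sigma = concepts[word]
    return -np.sum((goal_xy - mu) ** 2) / (2.0 * sigma ** 2)

# Rank candidate endpoints under the instruction word; a full system would
# also score the trajectory leading there.
candidates = [np.array([1.2, 3.8]), np.array([4.9, 2.1]), np.array([3.0, 3.0])]
best = max(candidates, key=lambda g: log_p_goal(g, "kitchen"))
print(best)  # the candidate nearest the "kitchen" concept
```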
arXiv Detail & Related papers (2020-02-18T05:35:29Z) - Learning Object Placements For Relational Instructions by Hallucinating
Scene Representations [26.897316325189205]
We present a convolutional neural network for estimating pixelwise object placement probabilities for a set of spatial relations from a single input image.
Our method does not require ground truth data for the pixelwise relational probabilities or 3D models of the objects.
Results obtained using real-world data and human-robot experiments demonstrate the effectiveness of our method.
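Given such a pixelwise output, one can normalize it into a placement distribution and sample a location; the random logits below stand in for the trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the network output: one logit per pixel for a relation such
# as "right of the plate".
logits = rng.normal(size=(60, 80))

# Normalize into pixelwise placement probabilities with a softmax over all
# pixels, then sample a placement location.
flat = logits.ravel()
probs = np.exp(flat - flat.max())
probs /= probs.sum()
idx = rng.choice(flat.size, p=probs)
y, x = np.unravel_index(idx, logits.shape)
print(y, x)
```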
arXiv Detail & Related papers (2020-01-23T12:58:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.