Embodied Agents for Efficient Exploration and Smart Scene Description
- URL: http://arxiv.org/abs/2301.07150v1
- Date: Tue, 17 Jan 2023 19:28:01 GMT
- Title: Embodied Agents for Efficient Exploration and Smart Scene Description
- Authors: Roberto Bigazzi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi,
Rita Cucchiara
- Abstract summary: We tackle a setting for visual navigation in which an autonomous agent needs to explore and map an unseen indoor environment.
We propose and evaluate an approach that combines recent advances in visual robotic exploration and image captioning on images generated through agent-environment interaction.
Our approach can generate smart scene descriptions that maximize semantic knowledge of the environment and avoid repetitions.
- Score: 47.82947878753809
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The development of embodied agents that can communicate with humans in
natural language has gained increasing interest in recent years, as it
facilitates the diffusion of robotic platforms in human-populated environments.
As a step towards this objective, in this work, we tackle a setting for visual
navigation in which an autonomous agent needs to explore and map an unseen
indoor environment while portraying interesting scenes with natural language
descriptions. To this end, we propose and evaluate an approach that combines
recent advances in visual robotic exploration and image captioning on images
generated through agent-environment interaction. Our approach can generate
smart scene descriptions that maximize semantic knowledge of the environment
and avoid repetitions. Further, such descriptions offer user-understandable
insights into the robot's representation of the environment by highlighting the
prominent objects and the correlations among them as encountered during
exploration. To quantitatively assess the performance of the proposed approach,
we also devise a specific score that takes into account both exploration and
description skills. The experiments carried out on both photorealistic
simulated environments and real-world ones demonstrate that our approach can
effectively describe the robot's point of view during exploration, improving
the human-friendly interpretability of its observations.
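The abstract describes the key mechanism only at a high level: descriptions should maximize semantic novelty and avoid repeating what has already been said. The Python sketch below makes that gating idea concrete under stated assumptions; the observation format, the Jaccard-overlap gate, the 0.6 threshold, and the linear combined_score are illustrative choices of ours, not the paper's implementation.

```python
# Minimal sketch of a novelty-gated description policy, in the spirit of the
# "smart scene descriptions" in the abstract. Everything here (the frame
# format, the Jaccard gate, the 0.6 threshold, the combined score) is an
# assumption for illustration, not the paper's actual method.

from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; 1.0 means identical object sets."""
    return len(a & b) / max(len(a | b), 1)


def smart_descriptions(
    frames: List[Tuple[Set[str], str]],  # (detected object classes, caption)
    novelty_threshold: float = 0.6,      # assumed value, tune per environment
) -> List[str]:
    """Keep a caption only if its frame's semantics differ enough from every
    frame already described (maximize novelty, avoid repetitions).

    In a real system the object sets would come from a detector and the
    captions from a pretrained image-captioning model run on the agent's
    egocentric views; here both are given as plain data.
    """
    described: List[Set[str]] = []
    kept: List[str] = []
    for objects, caption in frames:
        if not objects:
            continue  # nothing semantic to describe in this frame
        if any(jaccard(objects, prev) > novelty_threshold for prev in described):
            continue  # too similar to a scene we already described
        described.append(objects)
        kept.append(caption)
    return kept


def combined_score(coverage: float, caption_quality: float, alpha: float = 0.5) -> float:
    """One plausible way to fold exploration skill (e.g., fraction of area
    covered) and description skill (e.g., a captioning metric) into a single
    number; the paper's actual score may be defined differently."""
    return alpha * coverage + (1 - alpha) * caption_quality


if __name__ == "__main__":
    frames = [
        ({"sofa", "tv", "table"}, "A living room with a sofa facing a TV."),
        ({"sofa", "table"}, "A sofa next to a small table."),  # gated: repeat
        ({"oven", "sink", "fridge"}, "A kitchen with an oven and a fridge."),
    ]
    print(smart_descriptions(frames))  # keeps the 1st and 3rd captions
    print(combined_score(coverage=0.8, caption_quality=0.6))
```

A real deployment would likely gate on caption or feature embeddings rather than raw object sets, but the control flow is the same: describe a view only when it adds something new.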
Related papers
- Embodied Instruction Following in Unknown Environments [66.60163202450954]
We propose an embodied instruction following (EIF) method for complex tasks in unknown environments.
We build a hierarchical embodied instruction following framework comprising a high-level task planner and a low-level exploration controller.
The task planner generates feasible step-by-step plans for accomplishing the human's goal according to the task completion process and the known visual clues.
arXiv Detail & Related papers (2024-06-17T17:55:40Z)
- Self-Explainable Affordance Learning with Embodied Caption [63.88435741872204]
We introduce Self-Explainable Affordance learning (SEA) with embodied caption.
SEA enables robots to articulate their intentions and bridge the gap between explainable vision-language caption and visual affordance learning.
We propose a novel model to effectively combine affordance grounding with self-explanation in a simple but efficient manner.
arXiv Detail & Related papers (2024-04-08T15:22:38Z)
- Agent AI: Surveying the Horizons of Multimodal Interaction [83.18367129924997]
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z)
- Proactive Human-Robot Interaction using Visuo-Lingual Transformers [0.0]
Humans possess the innate ability to extract latent visuo-lingual cues to infer context through human interaction.
We propose a learning-based method that uses visual cues from the scene, lingual commands from a user, and knowledge of prior object-object interactions to identify and proactively predict the underlying goal the user intends to achieve.
arXiv Detail & Related papers (2023-10-04T00:50:21Z)
- Robot Active Neural Sensing and Planning in Unknown Cluttered Environments [0.0]
Active sensing and planning in unknown, cluttered environments is an open challenge for robots intending to provide home service, search and rescue, narrow-passage inspection, and medical assistance.
We present an active neural sensing approach that generates kinematically feasible viewpoint sequences for a robot manipulator with an in-hand camera to gather the minimum number of observations needed to reconstruct the underlying environment.
Our framework actively collects visual RGB-D observations, aggregates them into a scene representation, and performs object shape inference to avoid unnecessary robot interactions with the environment.
arXiv Detail & Related papers (2022-08-23T16:56:54Z)
- Information is Power: Intrinsic Control via Information Capture [110.3143711650806]
We argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model.
This objective induces an agent both to gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states; we sketch this objective in a short formula after this list.
arXiv Detail & Related papers (2021-12-07T18:50:42Z)
- Spatial Imagination With Semantic Cognition for Mobile Robots [1.933681537640272]
This paper provides a training-based algorithm for mobile robots to perform spatial imagination based on semantic cognition.
We utilize a photo-realistic simulation environment, Habitat, for training and evaluation.
Our approach is found to improve the efficiency and accuracy of semantic mapping.
arXiv Detail & Related papers (2021-04-08T09:44:49Z)
- Embodied Visual Active Learning for Semantic Segmentation [33.02424587900808]
We study the task of embodied visual active learning, where an agent is set to explore a 3D environment with the goal of acquiring visual scene understanding.
We develop a battery of agents, both learnt and pre-specified, with different levels of knowledge of the environment.
We extensively evaluate the proposed models using the Matterport3D simulator and show that a fully learnt method outperforms comparable pre-specified counterparts.
arXiv Detail & Related papers (2020-12-17T11:02:34Z)
- Towards Embodied Scene Description [36.17224570332247]
Embodiment is an important characteristic for all intelligent agents (creatures and robots).
We propose Embodied Scene Description, which exploits the embodiment ability of the agent to find an optimal viewpoint in its environment for scene description tasks.
A learning framework combining the paradigms of imitation learning and reinforcement learning is established to teach the agent to generate the corresponding sensorimotor activities.
arXiv Detail & Related papers (2020-04-30T08:50:25Z)
- SAPIEN: A SimulAted Part-based Interactive ENvironment [77.4739790629284]
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects.
We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition as well as demonstrate robotic interaction tasks.
arXiv Detail & Related papers (2020-03-19T00:11:34Z)
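For the "Information is Power" entry above, the summary's verbal objective can be written compactly. The LaTeX snippet below is our paraphrase under assumed notation (d^{\pi} for the policy's state-visitation distribution, q_{\phi} for the latent state-space model's density estimate), not the paper's exact formulation.

```latex
% Paraphrase of the intrinsic-control objective: choose the policy that
% minimizes the entropy of its own state-visitation distribution, with the
% visitation density estimated by a latent state-space model q_phi.
\min_{\pi}\; H\!\left(d^{\pi}\right)
  \approx \min_{\pi}\; -\,\mathbb{E}_{s \sim d^{\pi}}\!\left[\log q_{\phi}(s)\right]
```

Lower visitation entropy requires both knowing where you are (information gathering reduces belief uncertainty) and keeping future states predictable (control), matching the two effects named in the summary.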