Vision-Language Navigation with Embodied Intelligence: A Survey
- URL: http://arxiv.org/abs/2402.14304v1
- Date: Thu, 22 Feb 2024 05:45:17 GMT
- Title: Vision-Language Navigation with Embodied Intelligence: A Survey
- Authors: Peng Gao, Peng Wang, Feng Gao, Fei Wang, Ruyue Yuan
- Abstract summary: Vision-language navigation (VLN) is a critical research path to achieve embodied intelligence.
VLN integrates artificial intelligence, natural language processing, computer vision, and robotics.
This survey systematically reviews the research progress of VLN and details the research direction of VLN with embodied intelligence.
- Score: 19.049590467248255
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: As a long-term vision in the field of artificial intelligence, the core goal
of embodied intelligence is to improve agents' capabilities to perceive,
understand, and interact with their environment. Vision-language
navigation (VLN), a critical research path toward embodied intelligence,
explores how agents use natural language to communicate effectively
with humans, receive and understand instructions, and ultimately rely on visual
information to navigate accurately. VLN integrates artificial
intelligence, natural language processing, computer vision, and robotics. The
field faces technical challenges but shows potential for applications such as
human-computer interaction. However, because of the complex process leading from
language understanding to action execution, VLN must align
visual information with language instructions, improve generalization ability,
and overcome many other challenges. This survey systematically reviews the research
progress of VLN and details the research directions of VLN with embodied
intelligence. After a detailed summary of its system architecture and of research
organized by methods and commonly used benchmark datasets, we comprehensively
analyze the problems and challenges faced by current research and explore the
future development of this field, aiming to provide a practical
reference for researchers.
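The perception-understanding-action loop the abstract describes can be illustrated with a minimal toy sketch. Everything here is an illustrative assumption, not the survey's method: the grid world, the landmark dictionary, and all function names (`understand_instruction`, `policy`, `navigate`) are hypothetical stand-ins for the language-grounding, visual-perception, and action-execution components a real VLN system would learn.

```python
# Toy sketch of the VLN loop: receive an instruction, ground it to a goal,
# observe the environment, and act until the goal is reached.
from dataclasses import dataclass


@dataclass
class Observation:
    """What the agent 'perceives': its position and the goal's position."""
    position: tuple
    goal: tuple


def understand_instruction(instruction: str) -> str:
    """Toy language grounding: extract the target landmark word."""
    # e.g. "walk to the kitchen" -> "kitchen"
    return instruction.rstrip(".").split()[-1]


def policy(obs: Observation) -> str:
    """Toy navigation policy: greedily step toward the goal on a grid."""
    (x, y), (gx, gy) = obs.position, obs.goal
    if x < gx:
        return "move_east"
    if x > gx:
        return "move_west"
    if y < gy:
        return "move_north"
    if y > gy:
        return "move_south"
    return "stop"


def navigate(instruction: str, start, landmarks) -> list:
    """Run the perception-understanding-action loop to completion."""
    goal = landmarks[understand_instruction(instruction)]
    pos, trajectory = start, []
    while True:
        action = policy(Observation(pos, goal))
        trajectory.append(action)
        if action == "stop":
            return trajectory
        dx, dy = {"move_east": (1, 0), "move_west": (-1, 0),
                  "move_north": (0, 1), "move_south": (0, -1)}[action]
        pos = (pos[0] + dx, pos[1] + dy)


if __name__ == "__main__":
    path = navigate("walk to the kitchen", start=(0, 0),
                    landmarks={"kitchen": (2, 1)})
    print(path)  # ['move_east', 'move_east', 'move_north', 'stop']
```

In a real system, each hand-written piece above is replaced by a learned model: instruction understanding by a language encoder, the observation by raw first-person images, and the policy by a trained vision-language agent; the survey's challenges (aligning vision with language, generalization) live precisely in those substitutions.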
Related papers
- Analyzing the Roles of Language and Vision in Learning from Limited Data [31.895396236504993]
We study the contributions that language and vision make to learning about the world.
We find that a language model leveraging all components recovers a majority of a Vision-Language Model's performance.
arXiv Detail & Related papers (2024-02-15T22:19:41Z)
- Large Language Models for Information Retrieval: A Survey [58.30439850203101]
Information retrieval has evolved from term-based methods to its integration with advanced neural models.
Recent research has sought to leverage large language models (LLMs) to improve IR systems.
We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
arXiv Detail & Related papers (2023-08-14T12:47:22Z)
- Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models [98.72986679502871]
Chat systems powered by large language models (LLMs) have emerged and rapidly become a promising direction toward artificial general intelligence (AGI).
But the path towards AGI in computer vision (CV) remains unclear.
We imagine a pipeline that places a CV algorithm in world-scale, interactable environments, pre-trains it to predict future frames conditioned on its actions, and then fine-tunes it with instructions to accomplish various tasks.
arXiv Detail & Related papers (2023-06-14T17:15:01Z)
- Interactive Natural Language Processing [67.87925315773924]
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP.
This paper offers a comprehensive survey of iNLP, starting by proposing a unified definition and framework of the concept.
arXiv Detail & Related papers (2023-05-22T17:18:29Z)
- Vision-Language Models in Remote Sensing: Current Progress and Future Trends [25.017685538386548]
Vision-language models enable reasoning about images and their associated textual descriptions, allowing for a deeper understanding of the underlying semantics.
Vision-language models can go beyond visual recognition of RS images, model semantic relationships, and generate natural language descriptions of the image.
This paper provides a comprehensive review of the research on vision-language models in remote sensing.
arXiv Detail & Related papers (2023-05-09T19:17:07Z)
- Core Challenges in Embodied Vision-Language Planning [11.896110519868545]
Embodied Vision-Language Planning tasks leverage computer vision and natural language for interaction in physical environments.
We propose a taxonomy to unify these tasks and provide an analysis and comparison of the current and new algorithmic approaches.
We advocate for task construction that enables model generalisability and furthers real-world deployment.
arXiv Detail & Related papers (2023-04-05T20:37:13Z)
- Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions [23.389491536958772]
Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic towards this goal.
VLN receives increasing attention from the natural language processing, computer vision, robotics, and machine learning communities.
This paper serves as a thorough reference for the VLN research community.
arXiv Detail & Related papers (2022-03-22T16:58:10Z)
- Deep Learning for Embodied Vision Navigation: A Survey [108.13766213265069]
The "embodied visual navigation" problem requires an agent to navigate a 3D environment relying mainly on its first-person observations.
This paper attempts to outline current work in the field of embodied visual navigation through a comprehensive literature survey.
arXiv Detail & Related papers (2021-07-07T12:09:04Z)
- Core Challenges in Embodied Vision-Language Planning [9.190245973578698]
We discuss Embodied Vision-Language Planning tasks, a family of prominent embodied navigation and manipulation problems.
We propose a taxonomy to unify these tasks and provide an analysis and comparison of the new and current algorithmic approaches.
We advocate for task construction that enables model generalizability and furthers real-world deployment.
arXiv Detail & Related papers (2021-06-26T05:18:58Z)
- Empowering Things with Intelligence: A Survey of the Progress, Challenges, and Opportunities in Artificial Intelligence of Things [98.10037444792444]
We show how AI can empower the IoT to make it faster, smarter, greener, and safer.
First, we present progress in AI research for IoT from four perspectives: perceiving, learning, reasoning, and behaving.
Finally, we summarize some promising applications of AIoT that are likely to profoundly reshape our world.
arXiv Detail & Related papers (2020-11-17T13:14:28Z)
- A Review on Intelligent Object Perception Methods Combining Knowledge-based Reasoning and Machine Learning [60.335974351919816]
Object perception is a fundamental sub-field of Computer Vision.
Recent works seek to integrate knowledge engineering in order to make the visual interpretation of objects more intelligent.
arXiv Detail & Related papers (2019-12-26T13:26:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.