Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions
- URL: http://arxiv.org/abs/2203.12667v1
- Date: Tue, 22 Mar 2022 16:58:10 GMT
- Title: Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions
- Authors: Jing Gu, Eliana Stefani, Qi Wu, Jesse Thomason, Xin Eric Wang
- Abstract summary: Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic toward the long-term goal of building intelligent agents that communicate in natural language, perceive the environment, and perform real-world tasks.
VLN receives increasing attention from natural language processing, computer vision, robotics, and machine learning communities.
This paper serves as a thorough reference for the VLN research community.
- Score: 23.389491536958772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A long-term goal of AI research is to build intelligent agents that can
communicate with humans in natural language, perceive the environment, and
perform real-world tasks. Vision-and-Language Navigation (VLN) is a fundamental
and interdisciplinary research topic towards this goal, and receives increasing
attention from natural language processing, computer vision, robotics, and
machine learning communities. In this paper, we review contemporary studies in
the emerging field of VLN, covering tasks, evaluation metrics, methods, etc.
Through structured analysis of current progress and challenges, we highlight
the limitations of current VLN and opportunities for future work. This paper
serves as a thorough reference for the VLN research community.
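Among the evaluation metrics such surveys cover, Success weighted by Path Length (SPL) is one of the most widely reported in VLN. As a minimal illustrative sketch (not code from the paper; the function name and example numbers are our own), the snippet below shows how SPL is commonly computed over a batch of episodes:

```python
def spl(successes, shortest_lengths, agent_lengths):
    """Success weighted by Path Length (SPL) over a batch of episodes.

    successes        -- 1 if the agent stopped within the success radius, else 0
    shortest_lengths -- ground-truth shortest-path length for each episode
    agent_lengths    -- length of the path the agent actually traversed
    """
    assert len(successes) == len(shortest_lengths) == len(agent_lengths)
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, agent_lengths):
        # A success is discounted by how much longer the agent's path was
        # than the shortest path; failed episodes contribute zero.
        total += s * (l / max(p, l))
    return total / len(successes)

# Hypothetical example: three episodes, two successful.
print(spl([1, 1, 0], [10.0, 8.0, 12.0], [12.5, 8.0, 20.0]))  # ~0.60
```

Other commonly reported metrics, such as success rate and navigation error, follow the same per-episode averaging pattern.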
Related papers
- Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives [10.16399860867284]
The emergence of Generative Artificial Intelligence (AI) and Large Language Models (LLMs) has marked a new era of Natural Language Processing (NLP).
This paper explores the current state of these cutting-edge technologies, demonstrating their remarkable advancements and wide-ranging applications.
arXiv Detail & Related papers (2024-07-20T18:48:35Z)
- Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models [79.04590934264235]
Vision-and-Language Navigation (VLN) has gained increasing attention over recent years.
Foundation models have shaped the challenges and proposed methods for VLN research.
arXiv Detail & Related papers (2024-07-09T16:53:36Z)
- Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions [69.9980759344628]
Vision-and-Language Navigation (VLN) aims to develop embodied agents that navigate based on human instructions.
We introduce Human-Aware Vision-and-Language Navigation (HA-VLN), extending traditional VLN by incorporating dynamic human activities.
We present the Expert-Supervised Cross-Modal (VLN-CM) and Non-Expert-Supervised Decision Transformer (VLN-DT) agents, utilizing cross-modal fusion and diverse training strategies.
arXiv Detail & Related papers (2024-06-27T15:01:42Z)
- Large Language Models for Education: A Survey and Outlook [69.02214694865229]
We systematically review the technological advancements in each perspective, organize related datasets and benchmarks, and identify the risks and challenges associated with deploying LLMs in education.
Our survey aims to provide a comprehensive technological picture for educators, researchers, and policymakers to harness the power of LLMs to revolutionize educational practices and foster a more effective personalized learning environment.
arXiv Detail & Related papers (2024-03-26T21:04:29Z)
- Vision-Language Navigation with Embodied Intelligence: A Survey [19.049590467248255]
Vision-language navigation (VLN) is a critical research path to achieve embodied intelligence.
VLN integrates artificial intelligence, natural language processing, computer vision, and robotics.
This survey systematically reviews the research progress of VLN and details the research direction of VLN with embodied intelligence.
arXiv Detail & Related papers (2024-02-22T05:45:17Z)
- Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models [52.24001776263608]
This comprehensive survey delves into recent strides in hate speech (HS) moderation.
We highlight the burgeoning role of large language models (LLMs) and large multimodal models (LMMs).
We identify existing gaps in research, particularly in the context of underrepresented languages and cultures.
arXiv Detail & Related papers (2024-01-30T03:51:44Z)
- Interactive Natural Language Processing [67.87925315773924]
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP.
This paper offers a comprehensive survey of iNLP, starting by proposing a unified definition and framework of the concept.
arXiv Detail & Related papers (2023-05-22T17:18:29Z)
- Core Challenges in Embodied Vision-Language Planning [11.896110519868545]
Embodied Vision-Language Planning tasks leverage computer vision and natural language for interaction in physical environments.
We propose a taxonomy to unify these tasks and provide an analysis and comparison of the current and new algorithmic approaches.
We advocate for task construction that enables model generalisability and furthers real-world deployment.
arXiv Detail & Related papers (2023-04-05T20:37:13Z)
- VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges [1.565870461096057]
The integration of vision and language has consequently attracted substantial attention.
These tasks are designed to exemplify core concepts of deep learning.
arXiv Detail & Related papers (2022-12-26T20:56:01Z)
- Core Challenges in Embodied Vision-Language Planning [9.190245973578698]
We discuss Embodied Vision-Language Planning tasks, a family of prominent embodied navigation and manipulation problems.
We propose a taxonomy to unify these tasks and provide an analysis and comparison of the new and current algorithmic approaches.
We advocate for task construction that enables model generalizability and furthers real-world deployment.
arXiv Detail & Related papers (2021-06-26T05:18:58Z)
- Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches and their task-level impact across generation tasks such as storytelling, summarization, and translation.
We present an abstraction of the key techniques with respect to learning paradigms, pretraining, modeling approaches, and decoding, along with the outstanding challenges in each.
arXiv Detail & Related papers (2020-10-14T17:54:42Z)