Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
- URL: http://arxiv.org/abs/2407.07035v1
- Date: Tue, 9 Jul 2024 16:53:36 GMT
- Title: Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
- Authors: Yue Zhang, Ziqiao Ma, Jialu Li, Yanyuan Qiao, Zun Wang, Joyce Chai, Qi Wu, Mohit Bansal, Parisa Kordjamshidi
- Abstract summary: Vision-and-Language Navigation (VLN) has gained increasing attention over recent years.
Foundation models have shaped the challenges and proposed methods for VLN research.
- Score: 79.04590934264235
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-and-Language Navigation (VLN) has gained increasing attention over recent years and many approaches have emerged to advance its development. The remarkable achievements of foundation models have shaped the challenges and proposed methods for VLN research. In this survey, we provide a top-down review that adopts a principled framework for embodied planning and reasoning, and emphasizes the current methods and future opportunities leveraging foundation models to address VLN challenges. We hope our in-depth discussions could provide valuable resources and insights: on one hand, to milestone the progress and explore opportunities and potential roles for foundation models in this field, and on the other, to organize different challenges and solutions in VLN for foundation model researchers.
Related papers
- A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications [52.42860559005861]
Direct Preference Optimization (DPO) has emerged as a promising approach for alignment.
Despite DPO's various advancements and inherent limitations, an in-depth review of these aspects is currently lacking in the literature.
arXiv Detail & Related papers (2024-10-21T02:27:24Z)
- Fine-Grained Zero-Shot Learning: Advances, Challenges, and Prospects [84.36935309169567]
We present a broad review of recent advances for fine-grained analysis in zero-shot learning (ZSL).
We first provide a taxonomy of existing methods and techniques with a thorough analysis of each category.
Then, we summarize the benchmark, covering publicly available datasets, models, implementations, and some more details as a library.
arXiv Detail & Related papers (2024-01-31T11:51:24Z)
- A Survey on 3D Skeleton Based Person Re-Identification: Approaches, Designs, Challenges, and Future Directions [71.99165135905827]
Person re-identification via 3D skeletons is an important emerging research area that triggers great interest in the pattern recognition community.
With distinctive advantages for many application scenarios, a great diversity of 3D skeleton based person re-identification methods have been proposed in recent years.
This paper provides a systematic survey on current SRID approaches, model designs, challenges, and future directions.
arXiv Detail & Related papers (2024-01-27T04:52:24Z)
- A Survey of Reasoning with Foundation Models [235.7288855108172]
Reasoning plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation.
We introduce seminal foundation models proposed or adaptable for reasoning.
We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models.
arXiv Detail & Related papers (2023-12-17T15:16:13Z)
- Foundation Models Meet Visualizations: Challenges and Opportunities [23.01218856618978]
This paper distinguishes two directions: visualizations for foundation models (VIS4FM) and foundation models for visualizations (FM4VIS).
In VIS4FM, we explore the primary role of visualizations in understanding, refining, and evaluating these intricate models.
In FM4VIS, we highlight how foundation models can be utilized to advance the visualization field itself.
arXiv Detail & Related papers (2023-10-09T14:57:05Z)
- Survey of Social Bias in Vision-Language Models [65.44579542312489]
This survey aims to provide researchers with high-level insight into the similarities and differences of social bias studies in pre-trained models across NLP, CV, and VL.
The findings and recommendations presented here can benefit the ML community, fostering the development of fairer and non-biased AI models.
arXiv Detail & Related papers (2023-09-24T15:34:56Z)
- Towards Reasoning in Large Language Models: A Survey [11.35055307348939]
It is not yet clear to what extent large language models (LLMs) are capable of reasoning.
This paper provides a comprehensive overview of the current state of knowledge on reasoning in LLMs.
arXiv Detail & Related papers (2022-12-20T16:29:03Z)
- Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions [23.389491536958772]
Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic towards this goal.
VLN receives increasing attention from natural language processing, computer vision, robotics, and machine learning communities.
This paper serves as a thorough reference for the VLN research community.
arXiv Detail & Related papers (2022-03-22T16:58:10Z)
- Multimodal Research in Vision and Language: A Review of Current and Emerging Trends [41.07256031348454]
We present a detailed overview of the latest trends in research pertaining to visual and language modalities.
We look at its applications in their task formulations and how to solve various problems related to semantic perception and content generation.
We shed some light on multi-disciplinary patterns and insights that have emerged in the recent past, directing this field towards more modular and transparent intelligent systems.
arXiv Detail & Related papers (2020-10-19T13:55:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.