Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
- URL: http://arxiv.org/abs/2407.07035v2
- Date: Sun, 29 Dec 2024 23:16:37 GMT
- Title: Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
- Authors: Yue Zhang, Ziqiao Ma, Jialu Li, Yanyuan Qiao, Zun Wang, Joyce Chai, Qi Wu, Mohit Bansal, Parisa Kordjamshidi
- Abstract summary: Vision-and-Language Navigation (VLN) has gained increasing attention over recent years.
Foundation models have shaped the challenges and proposed methods for VLN research.
- Score: 79.04590934264235
- Abstract: Vision-and-Language Navigation (VLN) has gained increasing attention over recent years, and many approaches have emerged to advance its development. The remarkable achievements of foundation models have shaped the challenges and proposed methods in VLN research. In this survey, we provide a top-down review that adopts a principled framework for embodied planning and reasoning, and emphasizes current methods and future opportunities for leveraging foundation models to address VLN challenges. We hope our in-depth discussion provides valuable resources and insights: on the one hand, to milestone the progress and explore opportunities and potential roles for foundation models in this field, and on the other, to organize the different challenges and solutions in VLN for foundation model researchers.
Related papers
- Spatio-Temporal Foundation Models: Vision, Challenges, and Opportunities [48.45951497996322]
Foundation models have revolutionized artificial intelligence, setting new benchmarks in performance and enabling transformative capabilities across a wide range of vision and language tasks.
In this paper, we articulate a vision for the future of spatio-temporal foundation models (STFMs), outlining the essential characteristics and generalization capabilities necessary for broad applicability.
We explore potential opportunities and research directions toward effective and broadly applicable STFMs.
arXiv Detail & Related papers (2025-01-15T08:52:28Z) - How to Enable Effective Cooperation Between Humans and NLP Models: A Survey of Principles, Formalizations, and Beyond [73.5546464126465]
We present a thorough review of human-model cooperation, exploring its principles, formalizations, and open challenges.
We introduce a new taxonomy that provides a unified perspective to summarize existing approaches.
Also, we discuss potential frontier areas and their corresponding challenges.
arXiv Detail & Related papers (2025-01-10T05:15:14Z) - How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey [59.23394353614928]
In recent years, the rise of pre-trained models has driven research on vision-language tasks.
Building on the powerful capabilities of pre-trained models, new paradigms have emerged to address the classic challenges in this area.
arXiv Detail & Related papers (2024-12-11T07:29:04Z) - A Survey of Reasoning with Foundation Models [235.7288855108172]
Reasoning plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation.
We introduce seminal foundation models proposed or adaptable for reasoning.
We then discuss potential future directions concerning the emergence of reasoning abilities within foundation models.
arXiv Detail & Related papers (2023-12-17T15:16:13Z) - Foundation Models Meet Visualizations: Challenges and Opportunities [23.01218856618978]
This paper divides the intersection into two areas: visualizations for foundation models (VIS4FM) and foundation models for visualizations (FM4VIS).
In VIS4FM, we explore the primary role of visualizations in understanding, refining, and evaluating these intricate models.
In FM4VIS, we highlight how foundation models can be utilized to advance the visualization field itself.
arXiv Detail & Related papers (2023-10-09T14:57:05Z) - Survey of Social Bias in Vision-Language Models [65.44579542312489]
This survey aims to provide researchers with high-level insight into the similarities and differences of social bias studies in pre-trained models across NLP, CV, and VL.
The findings and recommendations presented here can benefit the ML community, fostering the development of fairer and less biased AI models.
arXiv Detail & Related papers (2023-09-24T15:34:56Z) - Towards Reasoning in Large Language Models: A Survey [11.35055307348939]
It is not yet clear to what extent large language models (LLMs) are capable of reasoning.
This paper provides a comprehensive overview of the current state of knowledge on reasoning in LLMs.
arXiv Detail & Related papers (2022-12-20T16:29:03Z) - Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions [23.389491536958772]
Vision-and-Language Navigation (VLN) is a fundamental and interdisciplinary research topic toward building intelligent agents that follow natural-language instructions in real-world environments.
VLN receives increasing attention from natural language processing, computer vision, robotics, and machine learning communities.
This paper serves as a thorough reference for the VLN research community.
arXiv Detail & Related papers (2022-03-22T16:58:10Z)