MANGO: A Benchmark for Evaluating Mapping and Navigation Abilities of Large Language Models
- URL: http://arxiv.org/abs/2403.19913v2
- Date: Thu, 8 Aug 2024 06:38:31 GMT
- Title: MANGO: A Benchmark for Evaluating Mapping and Navigation Abilities of Large Language Models
- Authors: Peng Ding, Jiading Fang, Peng Li, Kangrui Wang, Xiaochen Zhou, Mo Yu, Jing Li, Matthew R. Walter, Hongyuan Mei
- Abstract summary: Large language models such as ChatGPT and GPT-4 have recently achieved astonishing performance on a variety of natural language processing tasks.
We propose MANGO, a benchmark to evaluate their capabilities to perform text-based mapping and navigation.
- Score: 35.49165347434718
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models such as ChatGPT and GPT-4 have recently achieved astonishing performance on a variety of natural language processing tasks. In this paper, we propose MANGO, a benchmark to evaluate their capabilities to perform text-based mapping and navigation. Our benchmark includes 53 mazes taken from a suite of textgames: each maze is paired with a walkthrough that visits every location but does not cover all possible paths. The task is question-answering: for each maze, a large language model reads the walkthrough and answers hundreds of mapping and navigation questions such as "How should you go to Attic from West of House?" and "Where are we if we go north and east from Cellar?". Although these questions are easy for humans, it turns out that even GPT-4, the strongest language model to date, performs poorly at answering them. Further, our experiments suggest that a strong mapping and navigation ability would benefit large language models in performing relevant downstream tasks, such as playing textgames. Our MANGO benchmark will facilitate future research on methods that improve the mapping and navigation capabilities of language models. We host our leaderboard, data, code, and evaluation program at https://mango.ttic.edu and https://github.com/oaklight/mango/.
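The mapping-and-navigation task described in the abstract can be viewed as graph construction and traversal: a walkthrough induces a directed graph of locations and moves, and each question reduces to a path query. The following is a minimal sketch of that framing; the maze, location names, and edges here are invented for illustration and are not taken from the benchmark data:

```python
from collections import deque

# Toy maze graph induced by a walkthrough: (location, direction) -> location.
# These locations and edges are invented for illustration only.
edges = {
    ("West of House", "north"): "North of House",
    ("North of House", "east"): "Behind House",
    ("Behind House", "down"): "Cellar",
    ("Cellar", "up"): "Behind House",
}

def follow(start, moves):
    """Destination-finding: where do we end up after a sequence of moves?"""
    here = start
    for m in moves:
        here = edges.get((here, m))
        if here is None:
            return None  # this move is not covered by the walkthrough
    return here

def route(start, goal):
    """Route-finding: breadth-first search for a shortest move sequence."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        here, path = queue.popleft()
        if here == goal:
            return path
        for (loc, move), dest in edges.items():
            if loc == here and dest not in seen:
                seen.add(dest)
                queue.append((dest, path + [move]))
    return None  # goal unreachable via walkthrough edges

print(follow("West of House", ["north", "east"]))  # Behind House
print(route("West of House", "Cellar"))            # ['north', 'east', 'down']
```

In this framing, a "Where are we if we go ... from ..." question is a `follow` query and a "How should you go to ... from ..." question is a `route` query; the benchmark's difficulty lies in inferring the graph from free-form walkthrough text rather than in the traversal itself.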
Related papers
- LangMap: A Hierarchical Benchmark for Open-Vocabulary Goal Navigation [34.074871694181965]
We introduce HieraNav, a goal navigation task where agents interpret natural language instructions to reach targets at four semantic levels.
We present Language as a Map (LangMap), a benchmark built on real-world 3D indoor scans with comprehensive human-verified annotations.
LangMap achieves superior annotation quality, outperforming GOAT-Bench by 23.8% in discriminative accuracy using four times fewer words.
arXiv Detail & Related papers (2026-02-02T15:26:19Z) - LangNavBench: Evaluation of Natural Language Understanding in Semantic Navigation [18.951580080771432]
LangNav is an open-set dataset specifically created to test an agent's ability to locate objects described at different levels of detail.
LangNavBench allows us to systematically compare models on their handling of attributes, spatial and relational cues, and category hierarchies.
MLFM is a method that builds a queryable multi-layered semantic map.
arXiv Detail & Related papers (2025-07-09T21:46:43Z) - Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models [8.609733312518463]
This study presents the first-ever work in Arabic language integration within the Vision-and-Language Navigation (VLN) domain in robotics.
We perform a comprehensive evaluation of state-of-the-art multilingual Small Language Models (SLMs).
We demonstrate that our framework is capable of high-level planning for navigation tasks when provided with instructions in both English and Arabic.
arXiv Detail & Related papers (2025-01-07T16:01:25Z) - NAVCON: A Cognitively Inspired and Linguistically Grounded Corpus for Vision and Language Navigation [66.89717229608358]
NAVCON is a large-scale annotated Vision-Language Navigation (VLN) corpus built on top of two popular datasets (R2R and RxR).
arXiv Detail & Related papers (2024-12-17T15:48:25Z) - GameTraversalBenchmark: Evaluating Planning Abilities Of Large Language Models Through Traversing 2D Game Maps [5.874552372073687]
Large language models (LLMs) have recently demonstrated great success in generating and understanding natural language.
We propose GameTraversalBenchmark (GTB), a benchmark consisting of diverse 2D grid-based game maps.
GPT-4-Turbo achieved the highest score of 44.97% on GTB_Score (GTBS), a composite score that combines the three above criteria.
arXiv Detail & Related papers (2024-10-10T09:54:28Z) - E-ANT: A Large-Scale Dataset for Efficient Automatic GUI NavigaTion [34.85528852487379]
E-ANT is the first Chinese GUI navigation dataset, with 40,000 real human traces over 5,000+ different tinyAPPs.
We evaluate various powerful MLLMs on E-ANT and report their results with sufficient ablations.
arXiv Detail & Related papers (2024-06-20T12:22:05Z) - LaMOT: Language-Guided Multi-Object Tracking [13.866428951384124]
Vision-Language MOT aims to track objects based on human language commands.
Despite various efforts, a key challenge lies in the lack of a clear understanding of why language is used for tracking.
We introduce Language-Guided MOT, a unified task framework, along with a corresponding large-scale benchmark, termed LaMOT.
arXiv Detail & Related papers (2024-06-12T15:24:09Z) - Open-Set 3D Semantic Instance Maps for Vision Language Navigation -- O3D-SIM [6.475074453206891]
Humans excel at forming mental maps of their surroundings, equipping them to understand object relationships and navigate based on language queries.
We show that having instance-level information and a semantic understanding of an environment significantly improves performance on language-guided tasks.
We propose a representation that results in a 3D point cloud map with instance-level embeddings, bringing in the semantic understanding that natural language commands can query.
arXiv Detail & Related papers (2024-04-27T14:20:46Z) - IVLMap: Instance-Aware Visual Language Grounding for Consumer Robot Navigation [10.006058028927907]
Vision-and-Language Navigation (VLN) is a challenging task that requires a robot to navigate in photo-realistic environments with human natural language promptings.
Recent studies aim to handle this task by constructing the semantic spatial map representation of the environment.
We propose a new method, namely, Instance-aware Visual Language Map (IVLMap), to empower the robot with instance-level and attribute-level semantic mapping.
arXiv Detail & Related papers (2024-03-28T11:52:42Z) - Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation [87.52136927091712]
We address a practical yet challenging problem of training robot agents to navigate in an environment following a path described by some language instructions.
To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both spatial location and the semantic information of the environment objects.
We propose a multi-granularity map, which contains both object fine-grained details (e.g., color, texture) and semantic classes, to represent objects more comprehensively.
arXiv Detail & Related papers (2022-10-14T04:23:27Z) - LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action [76.71101507291473]
We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories.
We show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data.
arXiv Detail & Related papers (2022-07-10T10:41:50Z) - Find a Way Forward: a Language-Guided Semantic Map Navigator [53.69229615952205]
This paper attacks the problem of language-guided navigation from a new perspective.
We use novel semantic navigation maps, which enable robots to carry out natural language instructions and move to a target position based on map observations.
The proposed approach has noticeable performance gains, especially in long-distance navigation cases.
arXiv Detail & Related papers (2022-03-07T07:40:33Z) - Code to Comment "Translation": Data, Metrics, Baselining & Evaluation [49.35567240750619]
We analyze several recent code-comment datasets for this task.
We compare them with WMT19, a standard dataset frequently used to train state-of-the-art natural language translators.
We find some interesting differences between the code-comment data and the WMT19 natural language data.
arXiv Detail & Related papers (2020-10-03T18:57:26Z) - TuringAdvice: A Generative and Dynamic Evaluation of Language Use [90.3029315711237]
We propose TuringAdvice, a new challenge task and dataset for language understanding models.
Given a written situation that a real person is currently facing, a model must generate helpful advice in natural language.
Empirical results show that today's models struggle at TuringAdvice.
arXiv Detail & Related papers (2020-04-07T18:00:03Z) - Enhancing lexical-based approach with external knowledge for Vietnamese multiple-choice machine reading comprehension [2.5199066832791535]
We construct a dataset which consists of 2,783 pairs of multiple-choice questions and answers based on 417 Vietnamese texts.
We propose a lexical-based MRC method that utilizes semantic similarity measures and external knowledge sources to analyze questions and extract answers from the given text.
Our proposed method achieves an accuracy of 61.81%, which is 5.51% higher than the best baseline model.
arXiv Detail & Related papers (2020-01-16T08:09:51Z)
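The lexical-similarity idea in the multiple-choice MRC entry above can be illustrated with a minimal sketch: score each answer choice by its token overlap with the passage and pick the best. The tokenization, scoring function, and example passage here are simplified placeholders invented for illustration, not the paper's actual method or data:

```python
def lexical_score(question, choice, context):
    """Score a choice by token overlap between (question + choice) and the context."""
    q = set((question + " " + choice).lower().split())
    c = set(context.lower().split())
    return len(q & c) / max(len(q), 1)

def answer(question, choices, context):
    # Pick the choice whose combined tokens best overlap the passage.
    return max(choices, key=lambda ch: lexical_score(question, ch, context))

context = "The cat slept on the warm windowsill all afternoon."
print(answer("Where did the cat sleep?",
             ["on the windowsill", "in the garden", "under the bed"],
             context))  # on the windowsill
```

A real lexical-based system would add normalization, semantic similarity measures, and external knowledge sources on top of this baseline, as the entry describes.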
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.