Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models
- URL: http://arxiv.org/abs/2501.05478v1
- Date: Tue, 07 Jan 2025 16:01:25 GMT
- Title: Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models
- Authors: Malak Mansour, Ahmed Aly, Bahey Tharwat, Sarim Hashmi, Dong An, Ian Reid,
- Abstract summary: This study presents the first-ever work in Arabic language integration within the Vision-and-Language Navigation (VLN) domain in robotics.<n>We perform a comprehensive evaluation of state-of-the-art multi-lingual Small Language Models (SLMs)<n>We demonstrate that our framework is capable of high-level planning for navigation tasks when provided with instructions in both English and Arabic.
- Score: 8.609733312518463
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) such as GPT-4, trained on huge amount of datasets spanning multiple domains, exhibit significant reasoning, understanding, and planning capabilities across various tasks. This study presents the first-ever work in Arabic language integration within the Vision-and-Language Navigation (VLN) domain in robotics, an area that has been notably underexplored in existing research. We perform a comprehensive evaluation of state-of-the-art multi-lingual Small Language Models (SLMs), including GPT-4o mini, Llama 3 8B, and Phi-3 medium 14B, alongside the Arabic-centric LLM, Jais. Our approach utilizes the NavGPT framework, a pure LLM-based instruction-following navigation agent, to assess the impact of language on navigation reasoning through zero-shot sequential action prediction using the R2R dataset. Through comprehensive experiments, we demonstrate that our framework is capable of high-level planning for navigation tasks when provided with instructions in both English and Arabic. However, certain models struggled with reasoning and planning in the Arabic language due to inherent limitations in their capabilities, sub-optimal performance, and parsing issues. These findings highlight the importance of enhancing planning and reasoning capabilities in language models for effective navigation, emphasizing this as a key area for further development while also unlocking the potential of Arabic-language models for impactful real-world applications.
Related papers
- Benchmarking Open-Source Large Language Models for Persian in Zero-Shot and Few-Shot Learning [0.0]
This paper presents a benchmark of several open-source Large Language Models (LLMs) for Persian Natural Language Processing tasks.<n>We evaluate models across a range of tasks including sentiment analysis, named entity recognition, reading comprehension, and question answering.<n>The results reveal that Gemma 2 consistently outperforms other models across nearly all tasks in both learning paradigms.
arXiv Detail & Related papers (2025-10-05T10:10:04Z) - The AI Language Proficiency Monitor -- Tracking the Progress of LLMs on Multilingual Benchmarks [0.0]
We introduce the AI Language Monitor, a comprehensive benchmark that assesses large language models (LLMs) performance across up to 200 languages.<n>Our benchmark aggregates diverse tasks including translation, question answering, math, and reasoning, using datasets such as FLORES+, MMLU, GSM8K, TruthfulQA, and ARC.<n>We provide an open-source, auto-updating leaderboard and dashboard that supports researchers, developers, and policymakers in identifying strengths and gaps in model performance.
arXiv Detail & Related papers (2025-07-11T12:38:02Z) - Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhance multilingual capabilities of large language models (LLMs)
It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs.
It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z) - Do Large Language Models Speak All Languages Equally? A Comparative Study in Low-Resource Settings [12.507989493130175]
Large language models (LLMs) have garnered significant interest in natural language processing (NLP)
Recent studies have highlighted the limitations of LLMs in low-resource languages.
We present datasets for sentiment and hate speech tasks by translating from English to Bangla, Hindi, and Urdu.
arXiv Detail & Related papers (2024-08-05T05:09:23Z) - A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers [51.8203871494146]
The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing.<n>Despite the breakthroughs of LLMs, the investigation into the multilingual scenario remains insufficient.<n>This survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
arXiv Detail & Related papers (2024-05-17T17:47:39Z) - SUTRA: Scalable Multilingual Language Model Architecture [5.771289785515227]
We introduce SUTRA, a multilingual Large Language Model architecture capable of understanding, reasoning, and generating text in over 50 languages.
Through extensive evaluations, SUTRA is demonstrated to surpass existing models like GPT-3.5, Llama2 by 20-30% on leading Massive Multitask Language Understanding (MMLU) benchmarks.
Our findings suggest that SUTRA not only fills pivotal gaps in multilingual model capabilities but also establishes a new benchmark for operational efficiency and scalability in AI applications.
arXiv Detail & Related papers (2024-05-07T20:11:44Z) - Scalable Language Model with Generalized Continual Learning [58.700439919096155]
The Joint Adaptive Re-ization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z) - NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning
Disentangled Reasoning [101.56342075720588]
Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions.
Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability.
This paper introduces a novel strategy called Navigational Chain-of-Thought (NavCoT), where we fulfill parameter-efficient in-domain training to enable self-guided navigational decision.
arXiv Detail & Related papers (2024-03-12T07:27:02Z) - PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset [0.0]
We present PARADISE, an abductive reasoning task using Q&A format on practical procedural text sourced from wikiHow.
It involves warning and tip inference tasks directly associated with goals, excluding intermediary steps, with the aim of testing the ability of the models to infer implicit knowledge of the plan solely from the given goal.
Our experiments, utilizing fine-tuned language models and zero-shot prompting, reveal the effectiveness of task-specific small models over large language models in most scenarios.
arXiv Detail & Related papers (2024-03-05T18:01:59Z) - DIALIGHT: Lightweight Multilingual Development and Evaluation of
Task-Oriented Dialogue Systems with Large Language Models [76.79929883963275]
DIALIGHT is a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems.
It features a secure, user-friendly web interface for fine-grained human evaluation at both local utterance level and global dialogue level.
Our evaluations reveal that while PLM fine-tuning leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses.
arXiv Detail & Related papers (2024-01-04T11:27:48Z) - On the Planning, Search, and Memorization Capabilities of Large Language
Models [0.0]
We investigate the potential of the state-of-the-art large language model (GPT-4) for planning tasks.
We identify areas where large language models excel in solving planning problems and reveal the constraints that limit their applicability.
arXiv Detail & Related papers (2023-09-05T00:19:31Z) - Cross-Lingual NER for Financial Transaction Data in Low-Resource
Languages [70.25418443146435]
We propose an efficient modeling framework for cross-lingual named entity recognition in semi-structured text data.
We employ two independent datasets of SMSs in English and Arabic, each carrying semi-structured banking transaction information.
With access to only 30 labeled samples, our model can generalize the recognition of merchants, amounts, and other fields from English to Arabic.
arXiv Detail & Related papers (2023-07-16T00:45:42Z) - Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z) - MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for
Natural Language Understanding in Task-Oriented Dialogue [115.32009638844059]
We extend the English only NLU++ dataset to include manual translations into a range of high, medium, and low resource languages.
Because of its multi-intent property, MULTI3NLU++ represents complex and natural user goals.
We use MULTI3NLU++ to benchmark state-of-the-art multilingual models for the Natural Language Understanding tasks of intent detection and slot labelling.
arXiv Detail & Related papers (2022-12-20T17:34:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.