LLMR: Real-time Prompting of Interactive Worlds using Large Language Models
- URL: http://arxiv.org/abs/2309.12276v3
- Date: Fri, 22 Mar 2024 17:28:17 GMT
- Title: LLMR: Real-time Prompting of Interactive Worlds using Large Language Models
- Authors: Fernanda De La Torre, Cathy Mengying Fang, Han Huang, Andrzej Banburski-Fahey, Judith Amores Fernandez, Jaron Lanier
- Abstract summary: Large Language Model for Mixed Reality (LLMR) is a framework for the real-time creation and modification of interactive Mixed Reality experiences.
Our framework relies on text interaction and the Unity game engine.
LLMR achieves an average error rate 4x lower than standard GPT-4.
- Score: 45.87888748442536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Large Language Model for Mixed Reality (LLMR), a framework for the real-time creation and modification of interactive Mixed Reality experiences using LLMs. LLMR leverages novel strategies to tackle difficult cases where ideal training data is scarce, or where the design goal requires the synthesis of internal dynamics, intuitive analysis, or advanced interactivity. Our framework relies on text interaction and the Unity game engine. By incorporating techniques for scene understanding, task planning, self-debugging, and memory management, LLMR achieves an average error rate 4x lower than standard GPT-4. We demonstrate LLMR's cross-platform interoperability with several example worlds, and evaluate it on a variety of creation and modification tasks to show that it can produce and edit diverse objects, tools, and scenes. Finally, we conducted a usability study (N=11) with a diverse set of participants, which revealed that they had positive experiences with the system and would use it again.
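The abstract names four techniques (scene understanding, task planning, self-debugging, and memory management) without detailing how they compose. Below is a minimal Python sketch of how such an orchestration loop could be wired together, assuming a chat-completion backend; the prompts, the `llm` helper, and the `run_in_unity` hook are hypothetical placeholders, not the authors' implementation.

```python
# Hypothetical sketch of an LLMR-style orchestration loop.
# Prompts, helper names, and the retry policy are assumptions.

def llm(system: str, user: str) -> str:
    """Placeholder for a chat-completion call (e.g., to GPT-4)."""
    raise NotImplementedError

def run_in_unity(csharp_code: str) -> str | None:
    """Placeholder: compile and run generated C# in Unity; return an
    error message on failure, or None on success."""
    raise NotImplementedError

def create_or_modify(request: str, scene_summary: str,
                     memory: list[str], max_retries: int = 3) -> str:
    # Task planning: decompose the user's request in light of the
    # current scene and the history of past edits.
    plan = llm("You are a planner. Break the request into steps.",
               f"Scene: {scene_summary}\nHistory: {memory}\nRequest: {request}")

    # Code generation: emit Unity C# implementing the plan.
    code = llm("You are a Unity C# coder. Return only code.", plan)

    # Self-debugging: feed errors back to the model until the code runs.
    for _ in range(max_retries):
        error = run_in_unity(code)
        if error is None:
            break
        code = llm("Fix this Unity C# code. Return only code.",
                   f"Code:\n{code}\nError:\n{error}")

    # Memory management: record the request for future context.
    memory.append(request)
    return code
```

A self-debugging loop of this kind, which retries generation with compiler or runtime errors appended to the prompt, is one plausible source of the reported reduction in error rate relative to a single GPT-4 call.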
Related papers
- LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models [22.53412407516448]
The integration of Large Language Models (LLMs) with Extended Reality (XR) technologies offers the potential to build truly immersive XR environments.
The complexity of XR environments makes it difficult to accurately extract relevant contextual data and scene/object parameters from an overwhelming volume of XR artifacts.
To overcome these challenges, we introduce LLMER, a novel framework that crafts interactive XR worlds using JSON data generated by LLMs.
arXiv Detail & Related papers (2025-02-04T16:08:48Z)
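Where LLMR generates Unity C# directly, the LLMER summary above describes having the LLM emit structured JSON instead. A minimal sketch of that pattern follows, assuming a simple scene-object schema; the schema fields and helper names are illustrative assumptions, not LLMER's actual format.

```python
# Hypothetical sketch of the LLM-to-JSON pattern described in the
# LLMER summary; the schema is an illustrative assumption.
import json

SCHEMA_HINT = """Return only JSON of the form:
{"objects": [{"name": str, "position": [x, y, z], "scale": [x, y, z]}]}"""

def parse_scene_spec(llm_output: str) -> list[dict]:
    """Validate the model's JSON before it touches the XR scene."""
    spec = json.loads(llm_output)  # raises ValueError on malformed JSON
    objects = spec["objects"]
    for obj in objects:
        if not isinstance(obj["name"], str):
            raise ValueError("object name must be a string")
        if len(obj["position"]) != 3 or len(obj["scale"]) != 3:
            raise ValueError("position and scale must be 3-vectors")
    return objects

# Example: a well-formed model reply parses cleanly.
reply = '{"objects": [{"name": "table", "position": [0, 0, 1], "scale": [1, 1, 1]}]}'
for obj in parse_scene_spec(reply):
    print(obj["name"], obj["position"])  # table [0, 0, 1]
```

Constraining the model to a fixed schema trades expressiveness for safety: malformed output is rejected at the parsing step rather than failing inside the running XR scene.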
- Dynamic benchmarking framework for LLM-based conversational data capture [0.0]
This paper introduces a benchmarking framework to assess large language models (LLMs) in conversational data-capture tasks.
It integrates generative agent simulation to evaluate performance on key dimensions: information extraction, context awareness, and adaptive engagement.
Results show that adaptive strategies improve data extraction accuracy, especially when handling ambiguous responses.
arXiv Detail & Related papers (2025-02-04T15:47:47Z)
- MIRAGE: Exploring How Large Language Models Perform in Complex Social Interactive Environments [0.0]
This paper introduces the Multiverse Interactive Role-play Ability General Evaluation (MIRAGE), a framework designed to assess Large Language Models' proficiency in portraying advanced human behaviors through murder mystery games.
Our experiments indicate that even popular models like GPT-4 face significant challenges in navigating the complexities presented by MIRAGE.
arXiv Detail & Related papers (2025-01-03T06:07:48Z)
- Analyzing Multimodal Interaction Strategies for LLM-Assisted Manipulation of 3D Scenes [20.669785157017486]
We combine quantitative usage data with post-experience questionnaire feedback to reveal common interaction patterns and key barriers in LLM-assisted 3D scene editing systems.
We propose design recommendations for future LLM-integrated 3D content creation systems.
arXiv Detail & Related papers (2024-10-29T16:15:59Z)
- LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments [70.91258869156353]
We introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds.
Compared with previous LLM-based testbeds, LangSuitE offers adaptability to diverse environments without multiple simulation engines.
We devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states with respect to historical information.
arXiv Detail & Related papers (2024-06-24T03:36:29Z)
- Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose Global-Local Semantic Consistent Learning (GLSCL), a simple yet effective method that capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves performance comparable to the state of the art while being nearly 220 times faster in computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z)
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments [82.67236400004826]
We introduce the Multimodal Embodied Interactive Agent (MEIA), capable of translating high-level tasks expressed in natural language into a sequence of executable actions.
A multimodal environment memory (MEM) module enables MEIA to generate executable action plans based on diverse requirements and the robot's capabilities.
arXiv Detail & Related papers (2024-02-01T02:43:20Z)
- MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z)
- MISAR: A Multimodal Instructional System with Augmented Reality [38.79160527414268]
Augmented reality (AR) requires seamless integration of visual, auditory, and linguistic channels for optimized human-computer interaction.
Our study introduces an innovative method harnessing large language models (LLMs) to assimilate information from visual, auditory, and contextual modalities.
arXiv Detail & Related papers (2023-10-18T04:15:12Z)
- Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z)
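The ContextWM summary above describes explicitly separating context modeling (the appearance and setting of a video) from dynamics modeling (how the latent state evolves). A minimal sketch of one way to express that factorization is below; the PyTorch architecture, layer sizes, and conditioning scheme are illustrative assumptions, not the paper's model.

```python
# Hypothetical sketch of a context/dynamics split in the spirit of
# ContextWM; sizes and architecture are illustrative assumptions.
import torch
import torch.nn as nn

class ContextualizedWorldModel(nn.Module):
    def __init__(self, obs_dim=64, ctx_dim=16, state_dim=32, act_dim=4):
        super().__init__()
        # Context encoder: per-trajectory appearance/setting features.
        self.context = nn.Sequential(nn.Linear(obs_dim, ctx_dim), nn.Tanh())
        # Dynamics model: predicts the next latent state from the current
        # state and action, conditioned on the fixed context vector.
        self.dynamics = nn.Linear(state_dim + act_dim + ctx_dim, state_dim)

    def forward(self, first_obs, state, action):
        ctx = self.context(first_obs)  # held fixed over a rollout
        x = torch.cat([state, action, ctx], dim=-1)
        return self.dynamics(x)        # next latent state

wm = ContextualizedWorldModel()
next_state = wm(torch.randn(1, 64), torch.randn(1, 32), torch.randn(1, 4))
print(next_state.shape)  # torch.Size([1, 32])
```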