LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models
        - URL: http://arxiv.org/abs/2502.02441v1
 - Date: Tue, 04 Feb 2025 16:08:48 GMT
 - Title: LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models
 - Authors: Jiangong Chen, Xiaoyi Wu, Tian Lan, Bin Li
 - Abstract summary: The integration of Large Language Models (LLMs) with Extended Reality (XR) technologies offers the potential to build truly immersive XR environments. The complexity of XR environments makes it difficult to accurately extract relevant contextual data and scene/object parameters from an overwhelming volume of XR artifacts. To overcome these challenges, we introduce LLMER, a novel framework that creates interactive XR worlds using JSON data generated by LLMs.
 - Score: 22.53412407516448
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract: The integration of Large Language Models (LLMs) like GPT-4 with Extended Reality (XR) technologies offers the potential to build truly immersive XR environments that interact with human users through natural language, e.g., generating and animating 3D scenes from audio inputs. However, the complexity of XR environments makes it difficult to accurately extract relevant contextual data and scene/object parameters from an overwhelming volume of XR artifacts. This leads not only to increased costs under pay-per-use models but also to higher rates of generation errors. Moreover, existing approaches that focus on generating code scripts are error-prone, resulting in flawed or invalid scripts, application crashes, and ultimately a degraded user experience. To overcome these challenges, we introduce LLMER, a novel framework that creates interactive XR worlds using JSON data generated by LLMs. Unlike prior approaches that generate code scripts, LLMER translates natural language inputs into JSON data, significantly reducing the likelihood of application crashes and the processing latency. It employs a multi-stage strategy to supply only the essential contextual information adapted to the user's request and features multiple modules designed for various XR tasks. Our preliminary user study demonstrates the effectiveness of the proposed system, with over 80% reduction in consumed tokens and around 60% reduction in task completion time compared to state-of-the-art approaches. The analysis of users' feedback also points to a series of directions for further optimization.
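The abstract's central idea, having the LLM emit JSON data rather than executable scripts, can be illustrated with a minimal Python sketch. The field names (`action`, `object`, `position`), the allowed actions, and both helper functions are hypothetical and are not taken from LLMER; the point is only that malformed model output can be rejected by a parser and validator before it touches the running application, instead of crashing it.

```python
import json

# Hypothetical command schema: these field names and actions are
# illustrative assumptions, not LLMER's actual JSON format.
REQUIRED_FIELDS = {"action": str, "object": str, "position": list}
ALLOWED_ACTIONS = {"create", "move", "delete"}


def parse_scene_command(llm_output: str):
    """Return a validated command dict, or None if the output is unusable."""
    try:
        data = json.loads(llm_output)
    except json.JSONDecodeError:
        return None  # truncated or malformed JSON is rejected, not executed
    if not isinstance(data, dict):
        return None
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            return None
    if data["action"] not in ALLOWED_ACTIONS:
        return None
    position = data["position"]
    if len(position) != 3:
        return None
    for value in position:
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return None
    return data


def apply_command(scene: dict, command: dict) -> None:
    """Apply a validated command to a toy scene (object name -> position)."""
    name = command["object"]
    if command["action"] == "delete":
        scene.pop(name, None)
    else:  # "create" and "move" both set the object's position
        scene[name] = tuple(float(v) for v in command["position"])
```

In this sketch an invalid response simply yields `None`, so the caller can re-prompt or ignore it, whereas a generated script with the same defect would have to be compiled or executed before the error surfaced.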
 
       
      
        Related papers
        - RouteNator: A Router-Based Multi-Modal Architecture for Generating Synthetic Training Data for Function Calling LLMs [3.41612427812159]
In digital content creation tools, users express their needs through natural language queries that must be mapped to API calls.
Existing approaches to synthetic data generation fail to replicate real-world data distributions.
We present a novel router-based architecture that generates high-quality synthetic training data.
arXiv (2025-05-15T16:53:45Z)
        - Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv (2025-02-17T18:49:25Z)
        - LUCID: LLM-Generated Utterances for Complex and Interesting Dialogues [38.6183579217801]
Virtual assistants are poised to take a leap forward in terms of their dialogue capabilities.
Yet a major bottleneck to achieving genuinely transformative task-oriented dialogue capabilities remains the scarcity of high quality data.
We use LUCID to generate a seed dataset of 4,277 conversations across 100 intents to demonstrate its capabilities.
arXiv (2024-03-01T11:33:53Z)
        - AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs [20.772266479533776]
AXOLOTL is a novel post-processing framework that operates agnostically across tasks and models.
It identifies biases, proposes resolutions, and guides the model to self-debias its outputs.
This approach minimizes computational costs and preserves model performance.
arXiv (2024-03-01T00:02:37Z)
        - Text2Data: Low-Resource Data Generation with Textual Control [100.5970757736845]
Text2Data is a novel approach that utilizes unlabeled data to understand the underlying data distribution.
It undergoes finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv (2024-02-08T03:41:39Z)
        - LLMR: Real-time Prompting of Interactive Worlds using Large Language Models [45.87888748442536]
Large Language Model for Mixed Reality (LLMR) is a framework for the real-time creation and modification of interactive Mixed Reality experiences.
Our framework relies on text interaction and the Unity game engine.
LLMR outperforms the standard GPT-4 by 4x in average error rate.
arXiv (2023-09-21T17:37:01Z)
        - Extrapolating Multilingual Understanding Models as Multilingual Generators [82.1355802012414]
This paper explores methods to give multilingual understanding models generation abilities, yielding a unified model.
We propose a Semantic-Guided Alignment-then-Denoising (SGA) approach to adapt an encoder to a multilingual generator with a small number of new parameters.
arXiv (2023-05-22T15:33:21Z)
        - XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages [105.54207724678767]
Data scarcity is a crucial issue for the development of highly multilingual NLP systems.
We propose XTREME-UP, a benchmark defined by its focus on the scarce-data scenario rather than zero-shot.
XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies.
arXiv (2023-05-19T18:00:03Z)
        - An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians.
Recent studies have achieved promising results in automatic impression generation using large-scale medical text data.
These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv (2023-04-17T17:13:42Z)
        - PRESTO: A Multilingual Dataset for Parsing Realistic Task-Oriented Dialogs [39.58414649004708]
PRESTO is a dataset of over 550K contextual multilingual conversations between humans and virtual assistants.
It contains challenges that occur in real-world NLU tasks such as disfluencies, code-switching, and revisions.
Our mT5-based baselines demonstrate that the conversational phenomena present in PRESTO are challenging to model.
arXiv (2023-03-15T21:51:13Z)
        - MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text [58.655375327681774]
We propose the first Multimodal Retrieval-Augmented Transformer (MuRAG).
MuRAG accesses an external non-parametric multimodal memory to augment language generation.
Our results show that MuRAG achieves state-of-the-art accuracy, outperforming existing models by 10-20% absolute on both datasets.
arXiv (2022-10-06T13:58:03Z)
        - UnrealROX+: An Improved Tool for Acquiring Synthetic Data from Virtual 3D Environments [14.453602631430508]
We present an improved version of UnrealROX, a tool to generate synthetic data from robotic images.
UnrealROX+ includes new features such as albedo generation and a Python API for interacting with the virtual environment from Deep Learning frameworks.
arXiv (2021-04-23T18:45:42Z)
        This list is automatically generated from the titles and abstracts of the papers on this site.
       
     