Related papers: AtomXR: Streamlined XR Prototyping with Natural Language and Immersive Physical Interaction

URL: http://arxiv.org/abs/2311.11238v1
Date: Sun, 19 Nov 2023 05:52:25 GMT
Title: AtomXR: Streamlined XR Prototyping with Natural Language and Immersive Physical Interaction
Authors: Alice Cai, Caine Ardayfio, AnhPhu Nguyen, Tica Lin, Elena Glassman
Abstract summary: AtomXR is a streamlined, immersive, no-code XR prototyping tool designed to empower developers in creating applications using natural language, eye-gaze, and touch interactions. AtomXR consists of: 1) AtomScript, a high-level human-interpretable scripting language for rapid prototyping, 2) a natural language interface that integrates LLMs and multimodal inputs for AtomScript generation, and 3) an immersive in-headset authoring environment. Empirical evaluation through two user studies offers insights into natural language-based and immersive prototyping, and shows AtomXR provides significant improvements in speed and user experience compared to traditional systems.
Score: 2.02671066150924
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As technological advancements in extended reality (XR) amplify the demand for more XR content, traditional development processes face several challenges: 1) a steep learning curve for inexperienced developers, 2) a disconnect between 2D development environments and 3D user experiences inside headsets, and 3) slow iteration cycles due to context switching between development and testing environments. To address these challenges, we introduce AtomXR, a streamlined, immersive, no-code XR prototyping tool designed to empower both experienced and inexperienced developers in creating applications using natural language, eye-gaze, and touch interactions. AtomXR consists of: 1) AtomScript, a high-level human-interpretable scripting language for rapid prototyping, 2) a natural language interface that integrates LLMs and multimodal inputs for AtomScript generation, and 3) an immersive in-headset authoring environment. Empirical evaluation through two user studies offers insights into natural language-based and immersive prototyping, and shows AtomXR provides significant improvements in speed and user experience compared to traditional systems.

Related papers

When Generative AI Meets Extended Reality: Enabling Scalable and Natural Interactions [8.808170696228865]
Generative AI (GenAI) enables intuitive, language-driven interaction and automated content generation. This paper explores the integration of XR and GenAI through three concrete use cases, showing how they address key obstacles in scalability and natural interaction.
arXiv Detail & Related papers (2026-01-13T15:21:08Z)
FronTalk: Benchmarking Front-End Development as Conversational Code Generation with Multi-Modal Feedback [92.67587639164908]
We present FronTalk, a benchmark for front-end code generation with multi-modal feedback. We focus on the front-end development task and curate FronTalk, a collection of 100 multi-turn dialogues. Evaluation of 20 models reveals two key challenges that are under-explored systematically in the literature.
arXiv Detail & Related papers (2025-12-05T23:28:09Z)
QiMeng-CRUX: Narrowing the Gap between Natural Language and Verilog via Core Refined Understanding eXpression [48.84841760215598]
Large language models (LLMs) have shown promising capabilities in hardware description language (HDL) generation. Existing approaches often rely on free-form natural language descriptions that are often ambiguous, redundant, and unstructured. We treat hardware code generation as a complex transformation from an open-ended natural language space to a domain-specific, highly constrained target space. We introduce Core Refined Understanding eXpression (CRUX), a structured intermediate space that captures the essential semantics of user intent while organizing the expression for precise Verilog code generation.
arXiv Detail & Related papers (2025-11-25T09:17:32Z)
Rapid Development of Omics Data Analysis Applications through Vibe Coding [0.0]
I demonstrate that modern large language models (LLMs) and autonomous coding agents can dramatically lower this barrier. I used Vibe coding to create a fully functional data analysis website capable of performing standard tasks. The entire application, including user interface, backend logic, and data upload pipeline, was developed in less than ten minutes using only four natural-language prompts.
arXiv Detail & Related papers (2025-10-10T19:06:27Z)
XR Blocks: Accelerating Human-centered AI + XR Innovation [15.103185935604323]
XR Blocks is a cross-platform framework designed to accelerate human-centered AI + XR innovation. It provides a modular architecture with plug-and-play components for core abstraction in AI + XR: user, world, peers; interface, context, and agents.
arXiv Detail & Related papers (2025-09-29T21:00:53Z)
HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning [14.038083767470019]
Embodied agents operating in smart homes must understand human behavior through diverse sensory inputs and communicate via natural language. In this paper, we introduce HoloLLM, a Multimodal Large Language Model (MLLM) that integrates uncommon but powerful sensing modalities. We show that HoloLLM significantly outperforms existing MLLMs, improving language-grounded human sensing accuracy by up to 30%.
arXiv Detail & Related papers (2025-05-23T09:06:09Z)
Langformers: Unified NLP Pipelines for Language Models [3.690904966341072]
Langformers is an open-source Python library designed to streamline NLP pipelines. It integrates conversational AI, pretraining, text classification, sentence embedding/reranking, data labelling, semantic search, and knowledge distillation into a cohesive API.
arXiv Detail & Related papers (2025-04-12T10:17:49Z)
LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models [22.53412407516448]
The integration of Large Language Models (LLMs) with Extended Reality (XR) technologies offers the potential to build truly immersive XR environments. The complexity of XR environments makes it difficult to accurately extract relevant contextual data and scene/object parameters from an overwhelming volume of XR artifacts. To overcome these challenges, we introduce a novel framework that creates interactive worlds using LLMERs.
arXiv Detail & Related papers (2025-02-04T16:08:48Z)
Darkit: A User-Friendly Software Toolkit for Spiking Large Language Model [50.37090759139591]
Large language models (LLMs) have been widely applied in various practical applications, typically comprising billions of parameters. The human brain, employing bio-plausible spiking mechanisms, can accomplish the same tasks while significantly reducing energy consumption. We are releasing a software toolkit named DarwinKit (Darkit) to accelerate the adoption of brain-inspired large language models.
arXiv Detail & Related papers (2024-12-20T07:50:08Z)
CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR [31.49021749468963]
Large language model (LLM)-powered non-player characters (NPCs) with speech-to-text (STT) and text-to-speech (TTS) models bring significant advantages over conventional or pre-scripted NPCs for facilitating more natural conversational user interfaces (CUIs) in XR. We provide the community with an open-source, customizable, and privacy-aware Unity package, CUIfy, that facilitates speech-based NPC-user interaction with various LLMs, STT, and TTS models.
arXiv Detail & Related papers (2024-11-07T12:55:17Z)
Scaling Instructable Agents Across Many Simulated Worlds [70.97268311053328]
Our goal is to develop an agent that can accomplish anything a human can do in any simulated 3D environment. Our approach focuses on language-driven generality while imposing minimal assumptions. Our agents interact with environments in real-time using a generic, human-like interface.
arXiv Detail & Related papers (2024-03-13T17:50:32Z)
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation. We also present a code generation benchmark for robot manipulation tasks in free-form natural language. We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
Dialogue-based generation of self-driving simulation scenarios using Large Language Models [14.86435467709869]
Simulation is an invaluable tool for developing and evaluating controllers for self-driving cars. Current simulation frameworks are driven by highly specialist domain-specific languages. There is often a gap between a concise English utterance and the executable code that captures the user's intent.
arXiv Detail & Related papers (2023-10-26T13:07:01Z)
CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation [73.78984332354636]
CorNav is a novel zero-shot framework for vision-and-language navigation. It incorporates environmental feedback for refining future plans and adjusting its actions. It consistently outperforms all baselines in a zero-shot multi-task setting.
arXiv Detail & Related papers (2023-06-17T11:44:04Z)
ReactGenie: A Development Framework for Complex Multimodal Interactions Using Large Language Models [12.0218963520643]
Multimodal interfaces can surpass the efficiency of either modality alone. This paper presents ReactGenie, a programming framework that better separates multimodal input from the computational model. Our evaluation showed that 12 developers can learn and build a nontrivial ReactGenie application in under 2.5 hours on average.
arXiv Detail & Related papers (2023-06-16T06:53:26Z)
PADL: Language-Directed Physics-Based Character Control [66.517142635815]
We present PADL, which allows users to issue natural language commands for specifying high-level tasks and low-level skills that a character should perform. We show that our framework can be applied to effectively direct a simulated humanoid character to perform a diverse array of complex motor skills.
arXiv Detail & Related papers (2023-01-31T18:59:22Z)
GenNI: Human-AI Collaboration for Data-Backed Text Generation [102.08127062293111]
Table2Text systems generate textual output based on structured data utilizing machine learning. GenNI (Generation Negotiation Interface) is an interactive visual system for high-level human-AI collaboration in producing descriptive text.
arXiv Detail & Related papers (2021-10-19T18:07:07Z)
VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs [103.99315770490163]
We present a framework for text generation from multimodal inputs consisting of video plus text, speech, or audio. Experiments demonstrate that our approach based on a single architecture outperforms the state-of-the-art on three video-based text-generation tasks.
arXiv Detail & Related papers (2021-01-28T15:22:36Z)
VisualHints: A Visual-Lingual Environment for Multimodal Reinforcement Learning [14.553086325168803]
We present VisualHints, a novel environment for multimodal reinforcement learning (RL) involving text-based interactions along with visual hints (obtained from the environment). We introduce an extension of the TextWorld cooking environment with the addition of visual clues interspersed throughout the environment. The goal is to force an RL agent to use both text and visual features to predict natural language action commands for solving the final task of cooking a meal.
arXiv Detail & Related papers (2020-10-26T18:51:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.