Augmenting Human Cognition through Everyday AR
- URL: http://arxiv.org/abs/2505.03492v1
- Date: Tue, 06 May 2025 12:48:38 GMT
- Title: Augmenting Human Cognition through Everyday AR
- Authors: Xiaoan Liu
- Abstract summary: This paper explores how always-on AR can seamlessly bridge digital cognition and physical affordances, enabling proactive, context-sensitive interactions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As spatial computing and multimodal LLMs mature, AR is poised to become an intuitive "thinking tool," embedding semantic and context-aware intelligence directly into everyday environments. This paper explores how always-on AR can seamlessly bridge digital cognition and physical affordances, enabling proactive, context-sensitive interactions that enhance human task performance and understanding.
Related papers
- Emergent Active Perception and Dexterity of Simulated Humanoids from Visual Reinforcement Learning [69.71072181304066]
We introduce Perceptive Dexterous Control (PDC), a framework for vision-driven whole-body control with simulated humanoids. PDC operates solely on egocentric vision for task specification, enabling object search, target placement, and skill selection through visual cues. We show that training from scratch with reinforcement learning can produce emergent behaviors such as active search.
arXiv Detail & Related papers (2025-05-18T07:33:31Z) - A Vision for AI-Driven Adaptation of Dynamic AR Content to Users and Environments [15.578223073114156]
This vision paper speculates on AI-driven approaches for adaptive AR content placement. We aim to envision new opportunities for innovation and improvement in various industries.
arXiv Detail & Related papers (2025-04-23T09:42:38Z) - Leveraging LLMs with Iterative Loop Structure for Enhanced Social Intelligence in Video Question Answering [13.775516653315103]
Social intelligence is essential for effective communication and adaptive responses. Current video-based methods for social intelligence rely on general video recognition or emotion recognition techniques. We propose the Looped Video Debating framework, which integrates Large Language Models with visual information.
arXiv Detail & Related papers (2025-03-27T06:14:21Z) - Everyday AR through AI-in-the-Loop [16.65802031939239]
This workshop brings together experts and practitioners from augmented reality (AR) and artificial intelligence (AI) to shape the future of AI-in-the-loop everyday AR experiences. We discuss a range of topics, including adaptive and context-aware AR, generative AR content creation, always-on AI assistants, AI-driven accessible design, and real-world-oriented AI agents. Our goal is to identify the opportunities and challenges in AI-enabled AR, focusing on creating novel AR experiences that seamlessly blend the digital and physical worlds.
arXiv Detail & Related papers (2024-12-17T08:51:55Z) - Multimodal 3D Fusion and In-Situ Learning for Spatially Aware AI [10.335943413484815]
Seamless integration of virtual and physical worlds in augmented reality benefits from the system semantically "understanding" the physical environment.
We introduce a multimodal 3D object representation that unifies both semantic and linguistic knowledge with the geometric representation.
We demonstrate the usefulness of the proposed system through two real-world AR applications on Magic Leap 2: a) spatial search in physical environments with natural language and b) an intelligent inventory system that tracks object changes over time.
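As a toy illustration of how natural-language spatial search over a tracked physical inventory might work, the sketch below matches a free-text query against stored object descriptions. The bag-of-words scorer, object names, and descriptions are all invented stand-ins; the actual system uses learned multimodal 3D object representations rather than word counts.

```python
# Hypothetical sketch: natural-language search over tagged physical objects.
from collections import Counter
from math import sqrt

def similarity(a: str, b: str) -> float:
    """Cosine similarity over word counts (a stand-in for a learned embedding)."""
    wa, wb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(wa[w] * wb[w] for w in wa)
    na = sqrt(sum(v * v for v in wa.values()))
    nb = sqrt(sum(v * v for v in wb.values()))
    return dot / (na * nb) if na and nb else 0.0

def spatial_search(query: str, inventory: dict[str, str]) -> str:
    """Return the inventory object whose description best matches the query."""
    return max(inventory, key=lambda name: similarity(query, inventory[name]))

inventory = {
    "mug": "ceramic coffee mug on the kitchen counter",
    "keys": "metal house keys on the desk near the laptop",
}
print(spatial_search("where are my keys", inventory))  # keys
```

In a real deployment the word-count scorer would be replaced by the paper's fused semantic/geometric embeddings, and results would carry 3D positions for in-headset highlighting.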
arXiv Detail & Related papers (2024-10-06T23:25:21Z) - Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation [70.52558242336988]
We focus on predicting engagement in dyadic interactions by scrutinizing verbal and non-verbal cues, aiming to detect signs of disinterest or confusion.
In this work, we collect a dataset featuring 34 participants engaged in casual dyadic conversations, each providing self-reported engagement ratings at the end of each conversation.
We introduce a novel fusion strategy using Large Language Models (LLMs) to integrate multiple behavior modalities into a "multimodal transcript".
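The "multimodal transcript" idea can be pictured as rendering per-utterance behavioral cues into plain text that an LLM can consume alongside the dialogue. The event schema and cue labels below are hypothetical, chosen only to illustrate the format, not taken from the paper.

```python
# Hypothetical sketch: serializing verbal + non-verbal cues into one text
# transcript so an LLM can reason over all modalities jointly.
def to_transcript(events: list[dict]) -> str:
    lines = []
    for e in events:
        cues = ", ".join(f"{k}={v}" for k, v in e.get("nonverbal", {}).items())
        line = f"[{e['t']:>5.1f}s] {e['speaker']}: {e['text']}"
        if cues:
            line += f"  ({cues})"
        lines.append(line)
    return "\n".join(lines)

events = [
    {"t": 0.0, "speaker": "A", "text": "So, how was the trip?",
     "nonverbal": {"gaze": "partner", "smile": "yes"}},
    {"t": 3.2, "speaker": "B", "text": "Fine, I guess.",
     "nonverbal": {"gaze": "away", "posture": "slumped"}},
]
print(to_transcript(events))
```

The resulting text could then be passed to an LLM with a prompt asking it to rate engagement, which is the spirit of the fusion strategy described above.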
arXiv Detail & Related papers (2024-09-13T18:28:12Z) - MISAR: A Multimodal Instructional System with Augmented Reality [38.79160527414268]
Augmented reality (AR) requires seamless integration of visual, auditory, and linguistic channels for optimized human-computer interaction.
Our study introduces an innovative method harnessing large language models (LLMs) to assimilate information from visual, auditory, and contextual modalities.
arXiv Detail & Related papers (2023-10-18T04:15:12Z) - Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with the environments in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z) - Towards Ubiquitous Semantic Metaverse: Challenges, Approaches, and Opportunities [68.03971716740823]
In recent years, ubiquitous semantic Metaverse has been studied to revolutionize immersive cyber-virtual experiences for augmented reality (AR) and virtual reality (VR) users.
This survey focuses on the representation and intelligence for the four fundamental system components in ubiquitous Metaverse.
arXiv Detail & Related papers (2023-07-13T11:14:46Z) - ArK: Augmented Reality with Knowledge Interactive Emergent Ability [115.72679420999535]
We develop an infinite agent that learns to transfer knowledge memory from general foundation models to novel domains.
The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK).
We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes.
arXiv Detail & Related papers (2023-05-01T17:57:01Z) - WordCraft: An Environment for Benchmarking Commonsense Agents [107.20421897619002]
We propose WordCraft, an RL environment based on Little Alchemy 2.
This lightweight environment is fast to run and built upon entities and relations inspired by real-world semantics.
arXiv Detail & Related papers (2020-07-17T18:40:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.