A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations
- URL: http://arxiv.org/abs/2504.03147v1
- Date: Fri, 04 Apr 2025 03:56:26 GMT
- Title: A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations
- Authors: Abdul Mannan Mohammed, Azhar Ali Mohammad, Jason A. Ortiz, Carsten Neumann, Grace Bochenek, Dirk Reiners, Carolina Cruz-Neira
- Abstract summary: Recent developments in Artificial Intelligence (AI) and Machine Learning (ML) are creating new opportunities for Human-Autonomy Teaming (HAT). We present a real-time Human Digital Twin (HDT) architecture that integrates Large Language Models (LLMs) for knowledge reporting, answering, and recommendation, embodied in a visual interface. The HDT acts as a visually and behaviorally realistic team member, integrated throughout the mission lifecycle, from training to deployment to after-action review.
- Score: 0.9580312063277943
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent developments in Artificial Intelligence (AI) and Machine Learning (ML) are creating new opportunities for Human-Autonomy Teaming (HAT) in tasks, missions, and continuous coordinated activities. A major challenge is enabling humans to maintain awareness and control over autonomous assets, while also building trust and supporting shared contextual understanding. To address this, we present a real-time Human Digital Twin (HDT) architecture that integrates Large Language Models (LLMs) for knowledge reporting, answering, and recommendation, embodied in a visual interface. The system applies a metacognitive approach to enable personalized, context-aware responses aligned with the human teammate's expectations. The HDT acts as a visually and behaviorally realistic team member, integrated throughout the mission lifecycle, from training to deployment to after-action review. Our architecture includes speech recognition, context processing, AI-driven dialogue, emotion modeling, lip-syncing, and multimodal feedback. We describe the system design, performance metrics, and future development directions for more adaptive and realistic HAT systems.
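The abstract outlines a turn-based pipeline (speech recognition, context processing, LLM-driven dialogue, then emotion modeling and lip-syncing on output) without giving implementation details. As a rough, hypothetical sketch of how such a turn might be wired together, with stub functions standing in for the ASR and LLM stages:

```python
from dataclasses import dataclass, field

@dataclass
class TurnContext:
    """Rolling conversational context for the digital twin (hypothetical structure)."""
    history: list = field(default_factory=list)
    mission_phase: str = "training"  # training | deployment | after-action review

def transcribe(audio: bytes) -> str:
    """Stand-in for the speech-recognition stage; a real system would run an ASR model."""
    return audio.decode("utf-8")

def generate_reply(utterance: str, ctx: TurnContext) -> dict:
    """Stand-in for the LLM dialogue stage. Returns text plus an emotion tag
    that downstream emotion modeling and lip-syncing would consume."""
    ctx.history.append(("human", utterance))
    reply = f"[{ctx.mission_phase}] Acknowledged: {utterance}"
    ctx.history.append(("hdt", reply))
    return {"text": reply, "emotion": "neutral"}

def hdt_turn(audio: bytes, ctx: TurnContext) -> dict:
    """One conversational turn: ASR -> context update -> dialogue -> multimodal output."""
    return generate_reply(transcribe(audio), ctx)

ctx = TurnContext(mission_phase="deployment")
out = hdt_turn(b"Report asset status", ctx)
```

The stage names and the `TurnContext` structure are illustrative assumptions, not the paper's API; the point is only that each stage consumes the previous stage's output while sharing a persistent context object across the mission lifecycle.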
Related papers
- A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions [51.96890647837277]
Large Language Models (LLMs) have propelled conversational AI from traditional dialogue systems into sophisticated agents capable of autonomous actions, contextual awareness, and multi-turn interactions with users.
This survey paper presents a desideratum for next-generation Conversational Agents - what has been achieved, what challenges persist, and what must be done for more scalable systems that approach human-level intelligence.
arXiv Detail & Related papers (2025-04-07T21:01:25Z) - Building Knowledge from Interactions: An LLM-Based Architecture for Adaptive Tutoring and Social Reasoning [42.09560737219404]
Large Language Models show promise in human-like communication, but their standalone use is hindered by memory constraints and contextual incoherence. This work presents a multimodal, cognitively inspired framework that enhances LLM-based autonomous decision-making in social and task-oriented Human-Robot Interaction. To further enhance autonomy and personalization, we introduce a memory system for selecting, storing, and retrieving experiences.
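The memory system mentioned above is described only at a high level. As a minimal sketch of the store-and-retrieve pattern, assuming bag-of-words cosine similarity in place of whatever learned embeddings the paper actually uses:

```python
from collections import Counter
import math

class ExperienceMemory:
    """Hypothetical experience memory: store text snippets, retrieve the most
    similar ones to a query. Uses bag-of-words cosine similarity as a
    stand-in for a learned embedding model."""

    def __init__(self):
        self.entries = []  # list of (text, token-count vector)

    def store(self, text: str) -> None:
        self.entries.append((text, Counter(text.lower().split())))

    def retrieve(self, query: str, k: int = 1) -> list:
        q = Counter(query.lower().split())

        def cosine(a: Counter, b: Counter) -> float:
            dot = sum(a[t] * b[t] for t in a)
            na = math.sqrt(sum(v * v for v in a.values()))
            nb = math.sqrt(sum(v * v for v in b.values()))
            return dot / (na * nb) if na and nb else 0.0

        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

mem = ExperienceMemory()
mem.store("user prefers concise status reports")
mem.store("robot handed over the wrench during assembly")
top = mem.retrieve("how should status reports be phrased")
```

Retrieved experiences would then be injected into the LLM prompt, which is the usual way such memories condition subsequent dialogue; the class and method names here are invented for illustration.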
arXiv Detail & Related papers (2025-04-02T10:45:41Z) - Leveraging LLMs with Iterative Loop Structure for Enhanced Social Intelligence in Video Question Answering [13.775516653315103]
Social intelligence is essential for effective communication and adaptive responses. Current video-based methods for social intelligence rely on general video recognition or emotion recognition techniques. We propose the Looped Video Debating framework, which integrates Large Language Models with visual information.
arXiv Detail & Related papers (2025-03-27T06:14:21Z) - Maia: A Real-time Non-Verbal Chat for Human-AI Interaction [10.580858171606167]
We propose an alternative to text-based Human-AI interaction. By leveraging nonverbal visual communication, through facial expressions, head and body movements, we aim to enhance engagement. Our approach is not art-specific and can be adapted to various paintings, animations, and avatars.
arXiv Detail & Related papers (2024-02-09T13:07:22Z) - Agent AI: Surveying the Horizons of Multimodal Interaction [83.18367129924997]
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z) - Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic Systems [67.01132165581667]
We propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components.
We illustrate a hybrid framework centered on ACT-R and we discuss the role of generative models in recent and future applications.
arXiv Detail & Related papers (2023-11-13T21:20:17Z) - Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with the environments in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z) - Building Human-like Communicative Intelligence: A Grounded Perspective [1.0152838128195465]
After making astounding progress in language learning, AI systems seem to be approaching a ceiling that does not reflect important aspects of human communicative capacities.
This paper suggests that the dominant cognitively-inspired AI directions, based on nativist and symbolic paradigms, lack necessary substantiation and concreteness to guide progress in modern AI.
I propose a list of concrete, implementable components for building "grounded" linguistic intelligence.
arXiv Detail & Related papers (2022-01-02T01:43:24Z) - Human in the Loop for Machine Creativity [0.0]
We conceptualize existing and future human-in-the-loop (HITL) approaches for creative applications.
We examine and speculate on long term implications for models, interfaces, and machine creativity.
We envision multimodal HITL processes, where texts, visuals, sounds, and other information are coupled together, with automated analysis of humans and environments.
arXiv Detail & Related papers (2021-10-07T15:42:18Z) - The MineRL BASALT Competition on Learning from Human Feedback [58.17897225617566]
The MineRL BASALT competition aims to spur forward research on this important class of techniques.
We design a suite of four tasks in Minecraft for which we expect it will be hard to write down hardcoded reward functions.
We provide a dataset of human demonstrations on each of the four tasks, as well as an imitation learning baseline.
arXiv Detail & Related papers (2021-07-05T12:18:17Z) - Cognitive architecture aided by working-memory for self-supervised multi-modal humans recognition [54.749127627191655]
The ability to recognize human partners is an important social skill to build personalized and long-term human-robot interactions.
Deep learning networks have achieved state-of-the-art results and have proven to be suitable tools for addressing this task.
One solution is to make robots learn from their first-hand sensory data with self-supervision.
arXiv Detail & Related papers (2021-03-16T13:50:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.