ArK: Augmented Reality with Knowledge Interactive Emergent Ability
- URL: http://arxiv.org/abs/2305.00970v1
- Date: Mon, 1 May 2023 17:57:01 GMT
- Title: ArK: Augmented Reality with Knowledge Interactive Emergent Ability
- Authors: Qiuyuan Huang, Jae Sung Park, Abhinav Gupta, Paul Bennett, Ran Gong,
Subhojit Som, Baolin Peng, Owais Khan Mohammed, Chris Pal, Yejin Choi,
Jianfeng Gao
- Abstract summary: We develop an infinite agent that learns to transfer knowledge memory from general foundation models to novel domains.
The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK).
We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes.
- Score: 115.72679420999535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the growing adoption of mixed reality and interactive AI agents, it
remains challenging for these systems to generate high quality 2D/3D scenes in
unseen environments. The common practice requires deploying an AI agent to
collect large amounts of data for model training for every new task. This
process is costly, or even impossible, for many domains. In this study, we
develop an infinite agent that learns to transfer knowledge memory from general
foundation models (e.g. GPT4, DALLE) to novel domains or scenarios for scene
understanding and generation in the physical or virtual world. The heart of our
approach is an emerging mechanism, dubbed Augmented Reality with Knowledge
Inference Interaction (ArK), which leverages knowledge-memory to generate
scenes in unseen physical world and virtual reality environments. The knowledge
interactive emergent ability (Figure 1) is demonstrated as the observation
learns i) micro-action of cross-modality: in multi-modality models to collect a
large amount of relevant knowledge memory data for each interaction task (e.g.,
unseen scene understanding) from the physical reality; and ii) macro-behavior
of reality-agnostic: in mixed-reality environments to improve interactions that
tailor to different characterized roles, target variables, collaborative
information, and so on. We validate the effectiveness of ArK on the scene
generation and editing tasks. We show that our ArK approach, combined with
large foundation models, significantly improves the quality of generated 2D/3D
scenes, compared to baselines, demonstrating the potential benefit of
incorporating ArK in generative AI for applications such as metaverse and
gaming simulation.
Related papers
- AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents [19.249596397679856]
We introduce AriGraph, a method wherein the agent constructs a memory graph that integrates semantic and episodic memories while exploring the environment.
This graph structure facilitates efficient associative retrieval of interconnected concepts, relevant to the agent's current state and goals.
We demonstrate that our Ariadne LLM agent, equipped with this proposed memory architecture augmented with planning and decision-making, effectively handles complex tasks on a zero-shot basis in the TextWorld environment.
arXiv Detail & Related papers (2024-07-05T09:06:47Z)
- An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z)
- Agent AI: Surveying the Horizons of Multimodal Interaction [83.18367129924997]
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z)
- REACT: Recognize Every Action Everywhere All At Once [8.10024991952397]
Group Activity Recognition (GAR) is a fundamental problem in computer vision, with diverse applications in sports analysis, surveillance, and social scene understanding.
We present REACT, an architecture inspired by the transformer encoder-decoder model.
Our method outperforms state-of-the-art GAR approaches in extensive experiments, demonstrating superior accuracy in recognizing and understanding group activities.
arXiv Detail & Related papers (2023-11-27T20:48:54Z)
- Robot Skill Generalization via Keypoint Integrated Soft Actor-Critic Gaussian Mixture Models [21.13906762261418]
A long-standing challenge for a robotic manipulation system is adapting and generalizing its acquired motor skills to unseen environments.
We tackle this challenge employing hybrid skill models that integrate imitation and reinforcement paradigms.
We show that our method enables a robot to gain a significant zero-shot generalization to novel environments and to refine skills in the target environments faster than learning from scratch.
arXiv Detail & Related papers (2023-10-23T16:03:23Z)
- Knowledge-enhanced Agents for Interactive Text Games [16.055119735473017]
We propose a knowledge-injection framework for improved functional grounding of agents in text-based games.
We consider two forms of domain knowledge that we inject into learning-based agents: memory of previous correct actions and affordances of relevant objects in the environment.
Our framework supports two representative model classes: reinforcement learning agents and language model agents.
arXiv Detail & Related papers (2023-05-08T23:31:39Z)
- WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained with huge multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z)
- Evaluating Continual Learning Algorithms by Generating 3D Virtual Environments [66.83839051693695]
Continual learning refers to the ability of humans and animals to incrementally learn over time in a given environment.
We propose to leverage recent advances in 3D virtual environments in order to approach the automatic generation of potentially life-long dynamic scenes with photo-realistic appearance.
A novel element of this paper is that scenes are described in a parametric way, thus allowing the user to fully control the visual complexity of the input stream the agent perceives.
arXiv Detail & Related papers (2021-09-16T10:37:21Z)
- Simultaneous Multi-View Object Recognition and Grasping in Open-Ended Domains [0.0]
We propose a deep learning architecture with augmented memory capacities to handle open-ended object recognition and grasping simultaneously.
We demonstrate the ability of our approach to grasp never-seen-before objects and to rapidly learn new object categories using very few examples on-site in both simulation and real-world settings.
arXiv Detail & Related papers (2021-06-03T14:12:11Z)
- ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation [75.0278287071591]
ThreeDWorld (TDW) is a platform for interactive multi-modal physical simulation.
TDW enables simulation of high-fidelity sensory data and physical interactions between mobile agents and objects in rich 3D environments.
We present initial experiments enabled by TDW in emerging research directions in computer vision, machine learning, and cognitive science.
arXiv Detail & Related papers (2020-07-09T17:33:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.