Augmented Object Intelligence with XR-Objects
- URL: http://arxiv.org/abs/2404.13274v3
- Date: Tue, 6 Aug 2024 07:55:44 GMT
- Title: Augmented Object Intelligence with XR-Objects
- Authors: Mustafa Doga Dogan, Eric J. Gonzalez, Karan Ahuja, Ruofei Du, Andrea Colaço, Johnny Lee, Mar Gonzalez-Franco, David Kim,
- Abstract summary: This paper explores Artificial Object Intelligence in the context of XR, an interaction paradigm that aims to blur the lines between digital and physical.
We implement the AOI concept in the form of XR-Objects, an open-source prototype system.
This system enables analog objects to not only convey information but also to initiate digital actions, such as querying for details or executing tasks.
- Score: 18.574032913387573
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Seamless integration of physical objects as interactive digital entities remains a challenge for spatial computing. This paper explores Artificial Object Intelligence (AOI) in the context of XR, an interaction paradigm that aims to blur the lines between digital and physical by equipping real-world objects with the ability to interact as if they were digital, where every object has the potential to serve as a portal to digital functionalities. Our approach utilizes real-time object segmentation and classification, combined with the power of Multimodal Large Language Models (MLLMs), to facilitate these interactions without the need for object pre-registration. We implement the AOI concept in the form of XR-Objects, an open-source prototype system that provides a platform for users to engage with their physical environment in contextually relevant ways using object-based context menus. This system enables analog objects to not only convey information but also to initiate digital actions, such as querying for details or executing tasks. Our contributions are threefold: (1) we define the AOI concept and detail its advantages over traditional AI assistants, (2) detail the XR-Objects system's open-source design and implementation, and (3) show its versatility through various use cases and a user study.
Related papers
- Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model [35.184607650708784]
Articulate-Anything automates the articulation of diverse, complex objects from many input modalities, including text, images, and videos.
Our system exploits existing 3D asset datasets via a mesh retrieval mechanism, along with an actor-critic system that iteratively proposes, evaluates, and refines solutions.
arXiv Detail & Related papers (2024-10-03T19:42:16Z) - Weak-to-Strong 3D Object Detection with X-Ray Distillation [75.47580744933724]
We propose a versatile technique that seamlessly integrates into any existing framework for 3D Object Detection.
X-Ray Distillation with Object-Complete Frames is suitable for both supervised and semi-supervised settings.
Our proposed methods surpass state-of-the-art in semi-supervised learning by 1-1.5 mAP.
arXiv Detail & Related papers (2024-03-31T13:09:06Z) - ROAM: Robust and Object-Aware Motion Generation Using Neural Pose
Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z) - Towards a conceptual model for the FAIR Digital Object Framework [0.0]
The FAIR Digital Objects movement aims at an infrastructure where digital objects can be exposed and explored according to the FAIR principles.
The conceptual model covers aspects of digital objects that are relevant to the FAIR principles.
arXiv Detail & Related papers (2023-02-23T10:00:46Z) - Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z) - ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and
Tactile Representations [52.226947570070784]
We present Object, a dataset of 100 objects that addresses both challenges with two key innovations.
First, Object encodes the visual, auditory, and tactile sensory data for all objects, enabling a number of multisensory object recognition tasks.
Second, Object employs a uniform, object-centric simulations, and implicit representation for each object's visual textures, tactile readings, and tactile readings, making the dataset flexible to use and easy to share.
arXiv Detail & Related papers (2021-09-16T14:00:59Z) - O2O-Afford: Annotation-Free Large-Scale Object-Object Affordance
Learning [24.9242853417825]
We propose a unified affordance learning framework to learn object-object interaction for various tasks.
We are able to conduct large-scale object-object affordance learning without the need for human annotations or demonstrations.
Experiments on large-scale synthetic data and real-world data prove the effectiveness of the proposed approach.
arXiv Detail & Related papers (2021-06-29T04:38:12Z) - Where2Act: From Pixels to Actions for Articulated 3D Objects [54.19638599501286]
We extract highly localized actionable information related to elementary actions such as pushing or pulling for articulated objects with movable parts.
We propose a learning-from-interaction framework with an online data sampling strategy that allows us to train the network in simulation.
Our learned models even transfer to real-world data.
arXiv Detail & Related papers (2021-01-07T18:56:38Z) - A Deep Learning Approach to Object Affordance Segmentation [31.221897360610114]
We design an autoencoder that infers pixel-wise affordance labels in both videos and static images.
Our model surpasses the need for object labels and bounding boxes by using a soft-attention mechanism.
We show that our model achieves competitive results compared to strongly supervised methods on SOR3D-AFF.
arXiv Detail & Related papers (2020-04-18T15:34:41Z) - Look-into-Object: Self-supervised Structure Modeling for Object
Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions.
We show the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gain on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft)
arXiv Detail & Related papers (2020-03-31T12:22:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.