OCTO+: A Suite for Automatic Open-Vocabulary Object Placement in Mixed
Reality
- URL: http://arxiv.org/abs/2401.08973v1
- Date: Wed, 17 Jan 2024 04:52:40 GMT
- Title: OCTO+: A Suite for Automatic Open-Vocabulary Object Placement in Mixed
Reality
- Authors: Aditya Sharma, Luke Yoffe, Tobias Höllerer
- Abstract summary: We introduce and evaluate several methods for automatic object placement using recent advances in open-vocabulary vision-language models.
We find that OCTO+ places objects in a valid region over 70% of the time, outperforming other methods on a range of metrics.
- Score: 3.469644923522024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One key challenge in Augmented Reality is the placement of virtual content in
natural locations. Most existing automated techniques can only work with a
closed-vocabulary, fixed set of objects. In this paper, we introduce and
evaluate several methods for automatic object placement using recent advances
in open-vocabulary vision-language models. Through a multifaceted evaluation,
we identify a new state-of-the-art method, OCTO+. We also introduce a benchmark
for automatically evaluating the placement of virtual objects in augmented
reality, alleviating the need for costly user studies. Through this, in
addition to human evaluations, we find that OCTO+ places objects in a valid
region over 70% of the time, outperforming other methods on a range of metrics.
Related papers
- Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization using Geometrical Information [68.10033984296247]
This paper explores the domain of active localization, emphasizing the importance of viewpoint selection to enhance localization accuracy.
Our contributions involve using a data-driven approach with a simple architecture designed for real-time operation, a self-supervised data training method, and the capability to consistently integrate our map into a planning framework tailored for real-world robotics applications.
arXiv Detail & Related papers (2024-07-22T12:32:09Z)
- AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
- OCTOPUS: Open-vocabulary Content Tracking and Object Placement Using
Semantic Understanding in Mixed Reality [3.469644923522024]
We introduce a new open-vocabulary method for object placement in augmented reality.
In a preliminary user study, we show that our method performs at least as well as human experts 57% of the time.
arXiv Detail & Related papers (2023-12-20T07:34:20Z)
- Open World Object Detection in the Era of Foundation Models [53.683963161370585]
We introduce a new benchmark that includes five real-world application-driven datasets.
We introduce a novel method, Foundation Object detection Model for the Open world, or FOMO, which identifies unknown objects based on their shared attributes with the base known objects.
arXiv Detail & Related papers (2023-12-10T03:56:06Z)
- Context-Aware Indoor Point Cloud Object Generation through User Instructions [6.398660996031915]
We present a novel end-to-end multi-modal deep neural network capable of generating point cloud objects seamlessly integrated with their surroundings.
Our model revolutionizes scene modification by enabling the creation of new environments with previously unseen object layouts.
arXiv Detail & Related papers (2023-11-26T06:40:16Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- AirLoc: Object-based Indoor Relocalization [8.88390498722337]
We propose a simple yet effective object-based indoor relocalization approach, dubbed AirLoc.
To overcome the challenges of object reidentification and remembering object relationships, we extract object-wise appearance embedding and inter-object geometric relationships.
This results in a robust, accurate, and portable indoor relocalization system, which outperforms the state-of-the-art methods in room-level relocalization by 9.5% in PR-AUC and 7% in accuracy.
arXiv Detail & Related papers (2023-04-03T13:16:47Z)
- Lifelong Ensemble Learning based on Multiple Representations for
Few-Shot Object Recognition [6.282068591820947]
We present a lifelong ensemble learning approach based on multiple representations to address the few-shot object recognition problem.
To facilitate lifelong learning, each approach is equipped with a memory unit for storing and retrieving object information instantly.
We have performed extensive sets of experiments to assess the performance of the proposed approach in offline and open-ended scenarios.
arXiv Detail & Related papers (2022-05-04T10:29:10Z)
- Contrastive Learning for Cross-Domain Open World Recognition [17.660958043781154]
The ability to evolve is fundamental for any valuable autonomous agent whose knowledge cannot remain limited to that injected by the manufacturer.
We show how it learns a feature space perfectly suitable to incrementally include new classes and is able to capture knowledge which generalizes across a variety of visual domains.
Our method is endowed with a tailored effective stopping criterion for each learning episode and exploits a novel self-paced thresholding strategy.
arXiv Detail & Related papers (2022-03-17T11:23:53Z)
- ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and
Tactile Representations [52.226947570070784]
We present ObjectFolder, a dataset of 100 objects that addresses both challenges with two key innovations.
First, ObjectFolder encodes the visual, auditory, and tactile sensory data for all objects, enabling a number of multisensory object recognition tasks.
Second, ObjectFolder employs a uniform, object-centric, and implicit representation for each object's visual textures, acoustic simulations, and tactile readings, making the dataset flexible to use and easy to share.
arXiv Detail & Related papers (2021-09-16T14:00:59Z)
- Learning Open-World Object Proposals without Learning to Classify [110.30191531975804]
We propose a classification-free Object Localization Network (OLN) which estimates the objectness of each region purely by how well the location and shape of a region overlap with any ground-truth object.
This simple strategy learns generalizable objectness and outperforms existing proposals on cross-category generalization.
arXiv Detail & Related papers (2021-08-15T14:36:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.