Related papers: OCTO+: A Suite for Automatic Open-Vocabulary Object Placement in Mixed Reality

OCTO+: A Suite for Automatic Open-Vocabulary Object Placement in Mixed Reality

URL: http://arxiv.org/abs/2401.08973v1
Date: Wed, 17 Jan 2024 04:52:40 GMT
Title: OCTO+: A Suite for Automatic Open-Vocabulary Object Placement in Mixed Reality
Authors: Aditya Sharma, Luke Yoffe, Tobias H\"ollerer
Abstract summary: We introduce and evaluate several methods for automatic object placement using recent advances in open-vocabulary vision-language models. We find that OCTO+ places objects in a valid region over 70% of the time, outperforming other methods on a range of metrics.
Score: 3.469644923522024
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: One key challenge in Augmented Reality is the placement of virtual content in natural locations. Most existing automated techniques can only work with a closed-vocabulary, fixed set of objects. In this paper, we introduce and evaluate several methods for automatic object placement using recent advances in open-vocabulary vision-language models. Through a multifaceted evaluation, we identify a new state-of-the-art method, OCTO+. We also introduce a benchmark for automatically evaluating the placement of virtual objects in augmented reality, alleviating the need for costly user studies. Through this, in addition to human evaluations, we find that OCTO+ places objects in a valid region over 70% of the time, outperforming other methods on a range of metrics.

Related papers

A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning [67.72413262980272]
Pre-trained vision models (PVMs) are fundamental to modern robotics, yet their optimal configuration remains unclear. We develop SlotMIM, a method that induces object-centric representations by introducing a semantic bottleneck. Our approach achieves significant improvements over prior work in image recognition, scene understanding, and robot learning evaluations.
arXiv Detail & Related papers (2025-03-10T06:18:31Z)
AirRoom: Objects Matter in Room Reidentification [4.386378218714507]
AirRoom is an object-aware pipeline that integrates multi-level object-oriented information. AirRoom outperforms state-of-the-art (SOTA) models across nearly all evaluation metrics.
arXiv Detail & Related papers (2025-03-03T03:20:08Z)
Oriented Tiny Object Detection: A Dataset, Benchmark, and Dynamic Unbiased Learning [51.170479006249195]
We introduce a new dataset, benchmark, and a dynamic coarse-to-fine learning scheme in this study. Our proposed dataset, AI-TOD-R, features the smallest object sizes among all oriented object detection datasets. We present a benchmark spanning a broad range of detection paradigms, including both fully-supervised and label-efficient approaches.
arXiv Detail & Related papers (2024-12-16T09:14:32Z)
Structured Spatial Reasoning with Open Vocabulary Object Detectors [2.089191490381739]
Reasoning about spatial relationships between objects is essential for many real-world robotic tasks. We introduce a structured probabilistic approach that integrates rich 3D geometric features with state-of-the-art open-vocabulary object detectors. The approach is evaluated and compared against zero-shot performance of the state-of-the-art Vision and Language Models (VLMs) on spatial reasoning tasks.
arXiv Detail & Related papers (2024-10-09T19:37:01Z)
Automatic Scene Generation: State-of-the-Art Techniques, Models, Datasets, Challenges, and Future Prospects [0.94371657253557]
This survey focuses on techniques that leverage machine learning, deep learning, embedded systems, and natural language processing (NLP) We categorize the models into four main types: Variational Autoencoders (VAEs), Generative Adrial Networks (GANs), Transformers, and Diffusion Models. We also review the most commonly used datasets, such as COCO-Stuff, Visual Genome, and MS-COCO, which are critical for training and evaluating these models.
arXiv Detail & Related papers (2024-09-14T19:09:10Z)
Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization. We introduce a benchmark comprising eight different synthetic and real-world datasets. We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization using Geometrical Information [68.10033984296247]
This paper explores the domain of active localization, emphasizing the importance of viewpoint selection to enhance localization accuracy. Our contributions involve using a data-driven approach with a simple architecture designed for real-time operation, a self-supervised data training method, and the capability to consistently integrate our map into a planning framework tailored for real-world robotics applications.
arXiv Detail & Related papers (2024-07-22T12:32:09Z)
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios. We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
OCTOPUS: Open-vocabulary Content Tracking and Object Placement Using Semantic Understanding in Mixed Reality [3.469644923522024]
We introduce a new open-vocabulary method for object placement in augmented reality. In a preliminary user study, we show that our method performs at least as well as human experts 57% of the time.
arXiv Detail & Related papers (2023-12-20T07:34:20Z)
Open World Object Detection in the Era of Foundation Models [53.683963161370585]
We introduce a new benchmark that includes five real-world application-driven datasets. We introduce a novel method, Foundation Object detection Model for the Open world, or FOMO, which identifies unknown objects based on their shared attributes with the base known objects.
arXiv Detail & Related papers (2023-12-10T03:56:06Z)
Context-Aware Indoor Point Cloud Object Generation through User Instructions [6.398660996031915]
We present a novel end-to-end multi-modal deep neural network capable of generating point cloud objects seamlessly integrated with their surroundings. Our model revolutionizes scene modification by enabling the creation of new environments with previously unseen object layouts.
arXiv Detail & Related papers (2023-11-26T06:40:16Z)
Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models. Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning. Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
Lifelong Ensemble Learning based on Multiple Representations for Few-Shot Object Recognition [6.282068591820947]
We present a lifelong ensemble learning approach based on multiple representations to address the few-shot object recognition problem. To facilitate lifelong learning, each approach is equipped with a memory unit for storing and retrieving object information instantly. We have performed extensive sets of experiments to assess the performance of the proposed approach in offline, and open-ended scenarios.
arXiv Detail & Related papers (2022-05-04T10:29:10Z)
Learning Open-World Object Proposals without Learning to Classify [110.30191531975804]
We propose a classification-free Object Localization Network (OLN) which estimates the objectness of each region purely by how well the location and shape of a region overlaps with any ground-truth object. This simple strategy learns generalizable objectness and outperforms existing proposals on cross-category generalization.
arXiv Detail & Related papers (2021-08-15T14:36:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.