RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open
Environments
- URL: http://arxiv.org/abs/2310.17290v1
- Date: Thu, 26 Oct 2023 10:15:21 GMT
- Title: RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open
Environments
- Authors: Mengxue Qu, Yu Wu, Wu Liu, Xiaodan Liang, Jingkuan Song, Yao Zhao,
Yunchao Wei
- Abstract summary: We construct a comprehensive dataset called Reasoning Intention-Oriented Objects (RIO).
RIO is specifically designed to incorporate diverse real-world scenarios and a wide range of object categories.
We evaluate the ability of some existing models to reason about intention-oriented objects in open environments.
- Score: 170.43912741137655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intention-oriented object detection aims to detect desired objects based on
specific intentions or requirements. For instance, when we desire to "lie down
and rest", we instinctively seek out a suitable option such as a "bed" or a
"sofa" that can fulfill our needs. Previous work in this area is limited either
by the number of intention descriptions or by the affordance vocabulary
available for intention objects. These limitations make it challenging to
handle intentions in open environments effectively. To facilitate this
research, we construct a comprehensive dataset called Reasoning
Intention-Oriented Objects (RIO). In particular, RIO is specifically designed
to incorporate diverse real-world scenarios and a wide range of object
categories. It offers the following key features: 1) intention descriptions in
RIO are represented as natural sentences rather than a mere word or verb
phrase, making them more practical and meaningful; 2) the intention
descriptions are contextually relevant to the scene, enabling a broader range
of potential functionalities associated with the objects; 3) the dataset
comprises a total of 40,214 images and 130,585 intention-object pairs. With the
proposed RIO, we evaluate the ability of some existing models to reason about
intention-oriented objects in open environments.
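The abstract fixes RIO's scale (40,214 images; 130,585 intention-object pairs) and the natural-sentence form of its intentions, but not a release format. The sketch below is only an illustration of how one such pair might be represented in code; every field name and the detect stub are hypothetical, not RIO's actual schema:

```python
# Minimal sketch of a single RIO-style intention-object pair.
# Every field name here is hypothetical: the abstract only states that
# RIO pairs natural-sentence intentions with annotated objects
# (40,214 images and 130,585 intention-object pairs in total).
from dataclasses import dataclass
from typing import List

@dataclass
class IntentionObjectPair:
    image_path: str           # scene image the intention refers to
    intention: str            # full natural sentence, not a bare verb phrase
    boxes: List[List[float]]  # target object boxes as [x, y, w, h]
    categories: List[str]     # open-vocabulary object names, e.g. "sofa"

# Example mirroring the abstract's "lie down and rest" scenario.
sample = IntentionObjectPair(
    image_path="images/000123.jpg",
    intention="I want to lie down and rest for a while.",
    boxes=[[120.0, 200.0, 340.0, 180.0]],
    categories=["sofa"],
)

def detect(image_path: str, intention: str) -> List[List[float]]:
    """Placeholder: any grounding or open-vocabulary detector fits here."""
    raise NotImplementedError

print(sample.intention, "->", sample.categories)
```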
Related papers
- Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments [44.6372390798904]
We propose a new task, termed Personalized Instance-based Navigation (PIN), in which an embodied agent must locate and reach a specific personal object.
In each episode, the target object is presented to the agent using two modalities: a set of visual reference images on a neutral background and manually annotated textual descriptions.
arXiv Detail & Related papers (2024-10-23T18:01:09Z)
- Go-SLAM: Grounded Object Segmentation and Localization with Gaussian Splatting SLAM [12.934788858420752]
Go-SLAM is a novel framework that utilizes 3D Gaussian Splatting SLAM to reconstruct dynamic environments.
Our system facilitates open-vocabulary querying, allowing users to locate objects using natural language descriptions (a generic sketch of this kind of lookup follows this entry).
arXiv Detail & Related papers (2024-09-25T13:56:08Z)
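The snippet above does not spell out how the natural-language lookup works. A common pattern for open-vocabulary querying, sketched here purely as an assumption rather than Go-SLAM's actual pipeline, is to embed the query with a text encoder and rank stored per-object embeddings by cosine similarity; the encoder below is a deterministic stand-in so the example runs end to end:

```python
# Generic open-vocabulary object lookup, sketched as an assumption:
# Go-SLAM's real pipeline is not described in the summary above. The
# pattern shown: embed the query text, then rank per-object embeddings
# stored during mapping by cosine similarity.
import numpy as np

rng = np.random.default_rng(0)

def encode_text(query: str) -> np.ndarray:
    """Deterministic stand-in for a CLIP-style text encoder (hypothetical)."""
    seed = abs(hash(query)) % (2**32)
    return np.random.default_rng(seed).standard_normal(512)

# Placeholder per-object embeddings a mapping system might have stored.
object_embeddings = {obj_id: rng.standard_normal(512) for obj_id in range(10)}

def locate(query: str, top_k: int = 3):
    q = encode_text(query)
    q /= np.linalg.norm(q)
    scores = {
        obj_id: float(emb @ q / np.linalg.norm(emb))
        for obj_id, emb in object_embeddings.items()
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

print(locate("a red coffee mug on the table"))
```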
- OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding [21.64446104872021]
We introduce OpenObj, an innovative approach to building open-vocabulary object-level Neural Radiance Fields with fine-grained understanding.
In essence, OpenObj establishes a robust framework for efficient and watertight scene modeling and comprehension at the object level.
The results on multiple datasets demonstrate that OpenObj achieves superior performance in zero-shot semantic and retrieval tasks.
arXiv Detail & Related papers (2024-06-12T08:59:33Z)
- Generative Region-Language Pretraining for Open-Ended Object Detection [55.42484781608621]
We propose a framework named GenerateU, which can detect dense objects and generate their names in a free-form way.
Our framework achieves comparable results to the open-vocabulary object detection method GLIP.
arXiv Detail & Related papers (2024-03-15T10:52:39Z)
- Find What You Want: Learning Demand-conditioned Object Attribute Space for Demand-driven Navigation [5.106884746419666]
The task of Visual Object Navigation (VON) involves an agent's ability to locate a particular object within a given scene.
In real-world scenarios, it is often challenging to ensure that the conditions this setup assumes (knowing the object's exact name and the object actually being present in the scene) are always met.
We propose Demand-driven Navigation (DDN), which leverages the user's demand as the task instruction.
arXiv Detail & Related papers (2023-09-15T04:07:57Z)
- Cycle Consistency Driven Object Discovery [75.60399804639403]
We introduce a method that explicitly optimizes the constraint that each object in a scene should be associated with a distinct slot.
By integrating these consistency objectives into various existing slot-based object-centric methods, we showcase substantial improvements in object-discovery performance.
Our results suggest that the proposed approach not only improves object discovery, but also provides richer features for downstream tasks (a toy version of the cycle objective is sketched after this entry).
arXiv Detail & Related papers (2023-06-03T21:49:06Z)
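The summary above names the cycle-consistency objective but not its exact form. One plausible instantiation, offered strictly as a guess at the general technique and not the paper's loss, is a round-trip consistency penalty: soft-assign features to slots, assign slots back to features, and require each feature to land on itself:

```python
# One plausible (assumed, not the paper's) form of a slot-feature
# cycle-consistency loss: soft-assign features to slots and back,
# then require the round trip to return each feature to itself.
import torch
import torch.nn.functional as F

def cycle_consistency_loss(features, slots, temperature=0.1):
    # features: (N, D) flattened spatial features; slots: (K, D).
    f = F.normalize(features, dim=-1)
    s = F.normalize(slots, dim=-1)
    feat_to_slot = F.softmax(f @ s.T / temperature, dim=-1)  # (N, K)
    slot_to_feat = F.softmax(s @ f.T / temperature, dim=-1)  # (K, N)
    round_trip = feat_to_slot @ slot_to_feat                 # (N, N), rows sum to 1
    target = torch.arange(features.size(0))                  # identity mapping
    return F.nll_loss(torch.log(round_trip + 1e-8), target)

loss = cycle_consistency_loss(torch.randn(64, 32), torch.randn(7, 32))
print(loss.item())
```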
- SORNet: Spatial Object-Centric Representations for Sequential Manipulation [39.88239245446054]
Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state.
We propose SORNet, which extracts object-centric representations from RGB images conditioned on canonical views of the objects of interest.
arXiv Detail & Related papers (2021-09-08T19:36:29Z)
- Learning Open-World Object Proposals without Learning to Classify [110.30191531975804]
We propose a classification-free Object Localization Network (OLN) which estimates the objectness of each region purely by how well the location and shape of a region overlaps with any ground-truth object.
This simple strategy learns generalizable objectness and outperforms existing proposals on cross-category generalization (a simplified version of this objectness target is sketched after this entry).
arXiv Detail & Related papers (2021-08-15T14:36:02Z)
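The OLN summary is concrete enough to sketch: objectness comes purely from localization quality, with no classification head. A minimal, simplified illustration (IoU against the best-matching ground-truth box; the actual OLN also uses centerness and learned box refinement) might look like this:

```python
# Simplified illustration of OLN-style classification-free objectness:
# each proposal is scored by pure localization quality (IoU with the
# best-matching ground-truth box). The real OLN also learns centerness
# and box refinement; those details are omitted here.
import numpy as np

def iou(a, b):
    # Boxes are [x1, y1, x2, y2].
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def objectness_targets(proposals, gt_boxes):
    """Objectness target = IoU with the closest ground-truth box, no classes."""
    return np.array([max(iou(p, g) for g in gt_boxes) for p in proposals])

proposals = [[10, 10, 50, 50], [200, 200, 240, 260]]
gt_boxes = [[12, 8, 48, 52]]
print(objectness_targets(proposals, gt_boxes))  # high overlap vs. near zero
```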
- SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots.
Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of instructions that depict the route step by step.
This approach deviates from real-world problems in which a human only describes what the object and its surroundings look like and asks the robot to start navigating from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z)
- ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects [119.46959413000594]
This document summarizes the consensus recommendations of a working group on ObjectNav.
We make recommendations on subtle but important details of evaluation criteria.
We provide a detailed description of the instantiation of these recommendations in challenges organized at the Embodied AI workshop at CVPR 2020.
arXiv Detail & Related papers (2020-06-23T17:18:54Z)