ObjectFinder: Open-Vocabulary Assistive System for Interactive Object Search by Blind People
- URL: http://arxiv.org/abs/2412.03118v1
- Date: Wed, 04 Dec 2024 08:38:45 GMT
- Title: ObjectFinder: Open-Vocabulary Assistive System for Interactive Object Search by Blind People
- Authors: Ruiping Liu, Jiaming Zhang, Angela Schön, Karin Müller, Junwei Zheng, Kailun Yang, Kathrin Gerling, Rainer Stiefelhagen
- Abstract summary: We created ObjectFinder, an open-vocabulary interactive object-search prototype.
It combines object detection with scene description and navigation.
We conducted need-finding interviews to better understand challenges in object search.
- Score: 39.57767207961938
- License:
- Abstract: Assistive technology can be leveraged by blind people when searching for objects in their daily lives. We created ObjectFinder, an open-vocabulary interactive object-search prototype, which combines object detection with scene description and navigation. It enables blind persons to detect and navigate to objects of their choice. We used a co-design approach to develop the prototype. We further conducted need-finding interviews to better understand challenges in object search, followed by a study with the ObjectFinder prototype in a laboratory setting simulating a living room and an office, with eight blind users. Additionally, we compared the prototype with BeMyEyes and Lookout for object search. We found that most participants felt more independent with ObjectFinder and preferred it over the baselines when deployed on more efficient hardware, as it enhances mental mapping and allows for active target definition. Moreover, we identified factors for future directions for the development of object-search systems.
Related papers
- Interacted Object Grounding in Spatio-Temporal Human-Object Interactions [70.8859442754261]
We introduce a new open-world benchmark: Grounding Interacted Objects (GIO)
An object grounding task is proposed expecting vision systems to discover interacted objects.
We propose a 4D question-answering framework (4D-QA) to discover interacted objects from diverse videos.
arXiv Detail & Related papers (2024-12-27T09:08:46Z)
- Semantic-Based Active Perception for Humanoid Visual Tasks with Foveal Sensors [49.99728312519117]
The aim of this work is to establish how accurately a recent semantic-based active perception model is able to complete visual tasks that are regularly performed by humans.
This model exploits the ability of current object detectors to localize and classify a large number of object classes and to update a semantic description of a scene across multiple fixations.
In the task of scene exploration, the semantic-based method demonstrates superior performance compared to the traditional saliency-based model.
arXiv Detail & Related papers (2024-04-16T18:15:57Z)
- CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection [42.2847114428716]
Task driven object detection aims to detect object instances suitable for affording a task in an image.
Its challenge is that the object categories suitable for a task are too diverse to be covered by the closed vocabulary of traditional object detection.
We propose to explore fundamental affordances rather than object categories, i.e., common attributes that enable different objects to accomplish the same task.
arXiv Detail & Related papers (2023-09-03T06:18:39Z)
- DetGPT: Detect What You Need via Reasoning [33.00345609506097]
We introduce a new paradigm for object detection that we call reasoning-based object detection.
Unlike conventional object detection methods that rely on specific object names, our approach enables users to interact with the system using natural language instructions.
Our proposed method, called DetGPT, leverages state-of-the-art multi-modal models and open-vocabulary object detectors.
arXiv Detail & Related papers (2023-05-23T15:37:28Z)
- Discovering a Variety of Objects in Spatio-Temporal Human-Object Interactions [45.92485321148352]
In daily HOIs, humans often interact with a variety of objects, e.g., holding and touching dozens of household items while cleaning.
Here, we introduce a new benchmark based on AVA: Discovered Interacted Objects (DIO), including 51 interactions and 1,000+ objects.
An ST-HOI learning task is proposed, expecting vision systems to track human actors, detect interactions, and simultaneously discover objects.
arXiv Detail & Related papers (2022-11-14T16:33:54Z)
- One-Shot Object Affordance Detection in the Wild [76.46484684007706]
Affordance detection refers to identifying the potential action possibilities of objects in an image.
We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images.
With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
arXiv Detail & Related papers (2021-08-08T14:53:10Z)
- Detecting Human-Object Interaction via Fabricated Compositional Learning [106.37536031160282]
Human-Object Interaction (HOI) detection is a fundamental task for high-level scene understanding.
Humans have an extremely powerful compositional perception ability to recognize rare or unseen HOI samples.
We propose Fabricated Compositional Learning (FCL) to address the problem of open long-tailed HOI detection.
arXiv Detail & Related papers (2021-03-15T08:52:56Z)
- GO-Finder: A Registration-Free Wearable System for Assisting Users in Finding Lost Objects via Hand-Held Object Discovery [23.33413589457104]
GO-Finder is a registration-free, wearable camera-based system for assisting people in finding objects.
GO-Finder automatically detects and groups hand-held objects to form a visual timeline of the objects.
arXiv Detail & Related papers (2021-01-18T20:04:56Z)
- Semantic Linking Maps for Active Visual Object Search [14.573513188682183]
We exploit background knowledge about common spatial relations between landmark and target objects.
We propose an active visual object search strategy by introducing the Semantic Linking Maps (SLiM) model.
Based on SLiM, we describe a hybrid search strategy that selects the next best view pose for searching for the target object.
arXiv Detail & Related papers (2020-06-18T18:59:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.