Active Visual Search in the Wild
- URL: http://arxiv.org/abs/2209.08803v2
- Date: Tue, 20 Sep 2022 09:55:31 GMT
- Title: Active Visual Search in the Wild
- Authors: Jeongeun Park, Taerim Yoon, Jejoon Hong, Youngjae Yu, Matthew Pan, and
Sungjoon Choi
- Abstract summary: We propose a system where a user can enter target commands using free-form language.
We call this system Active Visual Search in the Wild (AVSW)
AVSW detects and plans how to search for a target object specified by the user, using a semantic grid map represented by static landmarks.
- Score: 12.354788629408933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we focus on the problem of efficiently locating a target
object described with free-form language using a mobile robot equipped with
vision sensors (e.g., an RGBD camera). Conventional active visual search
predefines a set of objects to search for, rendering these techniques
restrictive in practice. To provide added flexibility in active visual
searching, we propose a system where a user can enter target commands using
free-form language; we call this system Active Visual Search in the Wild
(AVSW). AVSW detects and plans to search for a target object inputted by a user
through a semantic grid map represented by static landmarks (e.g., desk or
bed). For efficient planning of object search patterns, AVSW considers
commonsense knowledge-based co-occurrence and predictive uncertainty while
deciding which landmarks to visit first. We validate the proposed method with
respect to SR (success rate) and SPL (success weighted by path length) in both
simulated and real-world environments. The proposed method outperforms previous
methods in terms of SPL in simulated scenarios with an average gap of 0.283. We
further demonstrate AVSW with a Pioneer-3AT robot in real-world studies.
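For reference, the SPL metric mentioned above is usually computed as in standard embodied-navigation evaluations (stated here for the reader's convenience; the abstract indicates the paper follows the same convention):

```latex
\mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \, \frac{\ell_i}{\max(p_i, \ell_i)}
```

where, for episode i, S_i in {0, 1} indicates success, ℓ_i is the shortest-path length from the start to the target, and p_i is the length of the path the robot actually travelled.

To make the landmark-selection idea concrete, the following is a minimal sketch of how a commonsense co-occurrence prior and predictive uncertainty could be combined when deciding which landmark to visit first. The Landmark fields, the weights, and the scoring rule are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Landmark:
    name: str            # static landmark from the semantic grid map, e.g. "desk"
    cooccurrence: float  # commonsense prior that the target co-occurs with it
    uncertainty: float   # predictive uncertainty of that prior, in [0, 1]
    distance: float      # travel cost from the robot's current pose, in meters

def rank_landmarks(landmarks, beta=0.5, gamma=0.1):
    """Order landmarks so the most promising one is visited first.

    The score rewards a high co-occurrence prior and penalises uncertain
    estimates and long detours; beta and gamma are illustrative weights.
    """
    return sorted(
        landmarks,
        key=lambda lm: lm.cooccurrence - beta * lm.uncertainty - gamma * lm.distance,
        reverse=True,
    )

# Example: searching for "coffee mug" among three mapped landmarks.
candidates = [
    Landmark("desk", cooccurrence=0.7, uncertainty=0.2, distance=4.0),
    Landmark("bed",  cooccurrence=0.3, uncertainty=0.6, distance=2.0),
    Landmark("sink", cooccurrence=0.5, uncertainty=0.1, distance=6.0),
]
print([lm.name for lm in rank_landmarks(candidates)])
```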
Related papers
- An Application-Agnostic Automatic Target Recognition System Using Vision Language Models [32.858386851006316]
We present a novel Automatic Target Recognition (ATR) system using open-vocabulary object detection and classification models.
A primary advantage of this approach is that target classes can be defined just before runtime by a non-technical end user.
Nuances in the desired targets can be expressed in natural language, which is useful for unique targets with little or no training data.
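As a minimal, hypothetical sketch of the open-vocabulary matching idea: user-written class descriptions are embedded as text, detected regions as images, and each region takes the best-matching description. Here embed_text and embed_region are stand-ins for a CLIP-style encoder (the deterministic random vectors only keep the example self-contained and runnable); this is not the paper's implementation.

```python
import numpy as np

def embed_text(prompt: str) -> np.ndarray:
    # Stand-in for a CLIP-style text encoder (deterministic per prompt within a run).
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.normal(size=512)

def embed_region(region_id: int) -> np.ndarray:
    # Stand-in for a CLIP-style image encoder applied to a detected region.
    rng = np.random.default_rng(region_id)
    return rng.normal(size=512)

def classify_regions(region_ids, class_prompts):
    """Assign each detected region the user-written class it matches best."""
    text = np.stack([embed_text(p) for p in class_prompts])
    text /= np.linalg.norm(text, axis=1, keepdims=True)
    labels = []
    for rid in region_ids:
        v = embed_region(rid)
        v /= np.linalg.norm(v)
        labels.append(class_prompts[int(np.argmax(text @ v))])
    return labels

# Target classes written by a non-technical end user just before runtime.
prompts = ["a red pickup truck", "a person wearing a high-visibility vest"]
print(classify_regions([0, 1, 2], prompts))
```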
arXiv Detail & Related papers (2024-11-05T20:16:15Z)
- UAV-Based Human Body Detector Selection and Fusion for Geolocated Saliency Map Generation [0.2499907423888049]
Reliably detecting and geolocating objects of different classes in soft real-time is essential in many application areas, such as Search and Rescue performed using Unmanned Aerial Vehicles (UAVs).
This research addresses the complementary problems of system contextual vision-based detector selection, allocation, and execution.
The detection results are fused using a method for building maps of salient locations which takes advantage of a novel sensor model for vision-based detections for both positive and negative observations.
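As a rough illustration of how positive and negative detections can be fused into a saliency map, here is a minimal log-odds update sketch; this is a generic Bayesian scheme with assumed detection and false-alarm rates, not the paper's specific sensor model.

```python
import numpy as np

def update_cell(log_odds, detected, p_d=0.8, p_fa=0.1):
    """Return the updated log-odds of 'object present' for one grid cell."""
    if detected:                                   # positive observation
        ratio = p_d / p_fa
    else:                                          # negative observation
        ratio = (1.0 - p_d) / (1.0 - p_fa)
    return log_odds + np.log(ratio)

def saliency(log_odds):
    """Convert log-odds back to a probability for the saliency map."""
    return 1.0 / (1.0 + np.exp(-log_odds))

# Example: two sightings followed by one miss over the same cell.
l = 0.0
for z in (True, True, False):
    l = update_cell(l, z)
print(round(saliency(l), 3))
```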
arXiv Detail & Related papers (2024-08-29T13:00:37Z)
- Detect2Interact: Localizing Object Key Field in Visual Question Answering (VQA) with LLMs [5.891295920078768]
We introduce an advanced approach for fine-grained object visual key field detection.
First, we use the segment anything model (SAM) to generate detailed spatial maps of objects in images.
Next, we use Vision Studio to extract semantic object descriptions.
Third, we employ GPT-4's common sense knowledge, bridging the gap between an object's semantics and its spatial map.
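A hypothetical sketch of how these three stages could be chained is shown below; segment_with_sam, describe_regions, and pick_key_region are placeholder stubs standing in for SAM, Vision Studio, and GPT-4, returning dummy values only so the example runs end to end.

```python
from typing import List, Tuple

def segment_with_sam(image) -> List[Tuple[int, int, int, int]]:
    """Placeholder for SAM: one bounding box per segmented object."""
    return [(0, 0, 60, 60), (70, 10, 140, 90)]      # dummy boxes

def describe_regions(image, boxes) -> List[str]:
    """Placeholder for Vision Studio: a semantic description per region."""
    return ["electric kettle", "coffee mug"]        # dummy descriptions

def pick_key_region(question: str, descriptions: List[str]) -> int:
    """Placeholder for GPT-4: commonsense choice of the most relevant region."""
    return 0 if "boil" in question.lower() else 1   # dummy heuristic

def locate_key_field(image, question: str):
    boxes = segment_with_sam(image)                 # 1. spatial maps of objects
    descriptions = describe_regions(image, boxes)   # 2. semantic descriptions
    idx = pick_key_region(question, descriptions)   # 3. bridge semantics and space
    return boxes[idx], descriptions[idx]

print(locate_key_field(None, "Which object would I use to boil water?"))
```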
arXiv Detail & Related papers (2024-04-01T14:53:36Z)
- Cognitive Planning for Object Goal Navigation using Generative AI Models [0.979851640406258]
We present a novel framework for solving the object goal navigation problem that generates efficient exploration strategies.
Our approach enables a robot to navigate unfamiliar environments by leveraging Large Language Models (LLMs) and Large Vision-Language Models (LVLMs).
arXiv Detail & Related papers (2024-03-30T10:54:59Z)
- Deep Reinforcement Learning with Dynamic Graphs for Adaptive Informative Path Planning [22.48658555542736]
A key task in robotic data acquisition is planning paths through an initially unknown environment to collect observations.
We propose a novel deep reinforcement learning approach for adaptively replanning robot paths to map targets of interest in unknown 3D environments.
arXiv Detail & Related papers (2024-02-07T14:24:41Z)
- Incremental 3D Scene Completion for Safe and Efficient Exploration Mapping and Planning [60.599223456298915]
We propose a novel way to integrate deep learning into exploration by leveraging 3D scene completion for informed, safe, and interpretable mapping and planning.
We show that our method can speed up coverage of an environment by 73% compared to the baselines with only minimal reduction in map accuracy.
Even if scene completions are not included in the final map, we show that they can be used to guide the robot to choose more informative paths, speeding up the measurement of the scene with the robot's sensors by 35%.
arXiv Detail & Related papers (2022-08-17T14:19:33Z)
- One-Shot Object Affordance Detection in the Wild [76.46484684007706]
Affordance detection refers to identifying the potential action possibilities of objects in an image.
We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images.
With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
arXiv Detail & Related papers (2021-08-08T14:53:10Z)
- Rapid Exploration for Open-World Navigation with Latent Goal Models [78.45339342966196]
We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments.
At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory of images.
We use an information bottleneck to regularize the learned policy, giving us (i) a compact visual representation of goals, (ii) improved generalization capabilities, and (iii) a mechanism for sampling feasible goals for exploration.
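For context, the information-bottleneck regularization mentioned above trades off predictive power against compression; in its generic form (not necessarily the paper's exact loss) the objective is

```latex
\max_{\theta} \; I(Z; Y) \;-\; \beta \, I(Z; X)
```

where X is the goal image, Z its learned latent representation, Y the quantities the policy must predict (e.g., distances and actions), and beta controls how strongly goal images are compressed.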
arXiv Detail & Related papers (2021-04-12T23:14:41Z)
- SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots.
Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of instructions that describes the route step by step.
This setting deviates from real-world problems, in which a human only describes what the object and its surroundings look like and asks the robot to start navigating from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z)
- ViNG: Learning Open-World Navigation with Visual Goals [82.84193221280216]
We propose a learning-based navigation system for reaching visually indicated goals.
We show that our system, which we call ViNG, outperforms previously-proposed methods for goal-conditioned reinforcement learning.
We demonstrate ViNG on a number of real-world applications, such as last-mile delivery and warehouse inspection.
arXiv Detail & Related papers (2020-12-17T18:22:32Z)
- POMP: Pomcp-based Online Motion Planning for active visual search in indoor environments [89.43830036483901]
We focus on the problem of learning an optimal policy for Active Visual Search (AVS) of objects in known indoor environments in an online setup.
Our POMP method takes as input the current pose of an agent and an RGB-D frame.
We validate our method on the publicly available AVD benchmark, achieving an average success rate of 0.76 with an average path length of 17.1.
arXiv Detail & Related papers (2020-09-17T08:23:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.