Learning Active Camera for Multi-Object Navigation
- URL: http://arxiv.org/abs/2210.07505v1
- Date: Fri, 14 Oct 2022 04:17:30 GMT
- Title: Learning Active Camera for Multi-Object Navigation
- Authors: Peihao Chen, Dongyu Ji, Kunyang Lin, Weiwen Hu, Wenbing Huang, Thomas
H. Li, Mingkui Tan, Chuang Gan
- Abstract summary: Getting robots to navigate to multiple objects autonomously is essential yet difficult in robot applications.
Existing navigation methods mainly focus on fixed cameras and few attempts have been made to navigate with active cameras.
In this paper, we consider navigating to multiple objects more efficiently with active cameras.
- Score: 94.89618442412247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Getting robots to navigate to multiple objects autonomously is essential yet
difficult in robot applications. One of the key challenges is how to explore
environments efficiently with camera sensors only. Existing navigation methods
mainly focus on fixed cameras and few attempts have been made to navigate with
active cameras. As a result, the agent may take a very long time to perceive
the environment due to limited camera scope. In contrast, humans typically gain
a larger field of view by looking around for a better perception of the
environment. How to make robots perceive the environment as efficiently as
humans is a fundamental problem in robotics. In this paper, we consider
navigating to multiple objects more efficiently with active cameras.
Specifically, we cast camera movement as a Markov Decision Process and
reformulate the active camera problem as a reinforcement learning problem.
However, we have to address two new challenges: 1) how to learn a good camera
policy in complex environments and 2) how to coordinate it with the navigation
policy. To address these, we carefully design a reward function to encourage
the agent to explore more areas by moving the camera actively. Moreover, we exploit
human experience to infer a rule-based camera action to guide the learning
process. Last, to better coordinate the two policies, the camera policy
takes navigation actions into account when making camera-movement decisions.
Experimental results show our camera policy consistently improves the
performance of multi-object navigation over four baselines on two datasets.
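The abstract names three concrete ingredients: an exploration reward, a rule-based camera action used as guidance, and a camera policy that conditions on the navigation action. The sketch below illustrates how these pieces could fit together; the class names, map resolution, action sets, and heuristic are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the reward and policy structure
# described in the abstract.
import numpy as np
import torch
import torch.nn as nn

class ExplorationReward:
    """Assumed reward design: score newly observed map cells."""
    def __init__(self, map_size=256):
        self.seen = np.zeros((map_size, map_size), dtype=bool)

    def __call__(self, visible_cells):
        # visible_cells: (N, 2) integer map indices inside the current camera frustum
        rows, cols = visible_cells[:, 0], visible_cells[:, 1]
        newly_seen = ~self.seen[rows, cols]
        self.seen[rows, cols] = True
        return float(newly_seen.sum())  # more newly explored area -> larger reward

def rule_based_camera_action(seen, agent_cell, window=16):
    """Assumed heuristic guidance: look toward the side with more unseen cells."""
    r, c = agent_cell
    r0, r1 = max(r - window, 0), r + window
    left = (~seen[r0:r1, max(c - window, 0):c]).sum()
    right = (~seen[r0:r1, c:c + window]).sum()
    return 0 if left > right else 2  # action ids: 0 turn left, 1 keep, 2 turn right

class CameraPolicy(nn.Module):
    """Camera policy conditioned on the navigation action, so that camera and
    navigation decisions stay coordinated."""
    def __init__(self, feat_dim=512, num_nav_actions=4, num_cam_actions=3):
        super().__init__()
        self.nav_embed = nn.Embedding(num_nav_actions, 32)
        self.head = nn.Sequential(
            nn.Linear(feat_dim + 32, 256), nn.ReLU(),
            nn.Linear(256, num_cam_actions),
        )

    def forward(self, visual_feat, nav_action):
        x = torch.cat([visual_feat, self.nav_embed(nav_action)], dim=-1)
        return torch.distributions.Categorical(logits=self.head(x))
```

In training, the rule-based action could supervise the learned policy through an auxiliary imitation loss while the exploration reward drives the reinforcement learning objective, which is one plausible reading of the guidance scheme described above.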
Related papers
- Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models [81.55156507635286]
Legged robots are physically capable of navigating diverse environments and overcoming a wide range of obstructions.
Current learning methods often struggle with generalization to the long tail of unexpected situations without heavy human supervision.
We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection.
arXiv Detail & Related papers (2024-07-02T21:00:30Z) - ActFormer: Scalable Collaborative Perception via Active Queries [12.020585564801781]
Collaborative perception leverages rich visual observations from multiple robots to extend a single robot's perception ability beyond its field of view.
We present ActFormer, a Transformer that learns bird's eye view (BEV) representations by using predefined BEV queries to interact with multi-robot multi-camera inputs.
Experiments on the V2X-Sim dataset demonstrate that ActFormer improves the detection performance from 29.89% to 45.15% in terms of AP@0.7 with about 50% fewer queries.
arXiv Detail & Related papers (2024-03-08T00:45:18Z)
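The entry above describes predefined BEV queries attending to multi-robot, multi-camera inputs. A minimal sketch of that query-based fusion pattern follows; the dimensions, query count, and names are assumptions, not the ActFormer architecture.

```python
# Sketch of learned BEV queries cross-attending to multi-robot camera features
# (illustrative only; not the ActFormer implementation).
import torch
import torch.nn as nn

class BEVQueryFusion(nn.Module):
    def __init__(self, num_queries=900, dim=256, heads=8):
        super().__init__()
        # Predefined BEV queries, one per bird's-eye-view location of interest.
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, cam_feats):
        # cam_feats: (B, R*C*HW, dim) flattened features from R robots x C cameras
        q = self.queries.unsqueeze(0).expand(cam_feats.size(0), -1, -1)
        bev, _ = self.attn(q, cam_feats, cam_feats)  # each query gathers evidence
        return bev  # (B, num_queries, dim) BEV representation for detection heads
```

The reported ~50% query reduction suggests the full method also selects which queries to evaluate; this sketch omits that active-selection step.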
arXiv Detail & Related papers (2024-03-08T00:45:18Z) - Polybot: Training One Policy Across Robots While Embracing Variability [70.74462430582163]
We propose a set of key design decisions to train a single policy for deployment on multiple robotic platforms.
Our framework first aligns the observation and action spaces of our policy across embodiments by using wrist cameras.
We evaluate our method on a dataset collected over 60 hours spanning 6 tasks and 3 robots with varying joint configurations and sizes.
arXiv Detail & Related papers (2023-07-07T17:21:16Z) - HomeRobot: Open-Vocabulary Mobile Manipulation [107.05702777141178]
Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location.
HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, providing a software stack for the low-cost Hello Robot Stretch.
arXiv Detail & Related papers (2023-06-20T14:30:32Z) - Look Closer: Bridging Egocentric and Third-Person Views with
Transformers for Robotic Manipulation [15.632809977544907]
Learning to solve precision-based manipulation tasks from visual feedback could drastically reduce the engineering efforts required by traditional robot systems.
We propose a setting for robotic manipulation in which the agent receives visual feedback from both a third-person camera and an egocentric camera mounted on the robot's wrist.
To fuse visual information from both cameras effectively, we additionally propose to use Transformers with a cross-view attention mechanism.
arXiv Detail & Related papers (2022-01-19T18:39:03Z)
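The entry above describes fusing a wrist-mounted egocentric view with a third-person view via cross-view attention. A minimal sketch of one such bidirectional fusion; the dimensions and names are assumptions, not the paper's exact architecture.

```python
# Sketch of cross-view attention between an egocentric (wrist) camera and a
# third-person camera (illustrative; not the paper's network).
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.ego_to_third = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.third_to_ego = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, ego_tokens, third_tokens):
        # Each view queries the other, so both feature maps become aware of
        # what the opposite camera sees.
        ego_fused, _ = self.ego_to_third(ego_tokens, third_tokens, third_tokens)
        third_fused, _ = self.third_to_ego(third_tokens, ego_tokens, ego_tokens)
        return torch.cat([ego_fused.mean(1), third_fused.mean(1)], dim=-1)
```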
arXiv Detail & Related papers (2022-01-19T18:39:03Z) - CNN-based Omnidirectional Object Detection for HermesBot Autonomous
Delivery Robot with Preliminary Frame Classification [53.56290185900837]
We propose an algorithm for optimizing a neural network for object detection using preliminary binary frame classification.
An autonomous mobile robot with 6 rolling-shutter cameras on the perimeter providing a 360-degree field of view was used as the experimental setup.
arXiv Detail & Related papers (2021-10-22T15:05:37Z)
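The preliminary binary frame classification above can be pictured as a cheap gate in front of an expensive detector; the gate architecture, threshold, and function names below are assumptions, not the paper's implementation.

```python
# Sketch of a binary classifier gating which camera frames reach the heavy
# object detector (illustrative only).
import torch
import torch.nn as nn

class FrameGate(nn.Module):
    """Tiny CNN that predicts whether a frame likely contains objects."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def forward(self, frame):
        return torch.sigmoid(self.net(frame))

def process_frames(frames, gate, detector, threshold=0.5):
    """Run the expensive detector only on frames the gate flags as promising."""
    detections = []
    for frame in frames:  # one frame per camera, e.g. 6 for a 360-degree rig
        if gate(frame.unsqueeze(0)).item() >= threshold:
            detections.append(detector(frame.unsqueeze(0)))
        else:
            detections.append(None)  # skipped frame: saves most of the compute
    return detections
```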
arXiv Detail & Related papers (2021-10-22T15:05:37Z) - Vision-Based Mobile Robotics Obstacle Avoidance With Deep Reinforcement
Learning [49.04274612323564]
Obstacle avoidance is a fundamental and challenging problem for autonomous navigation of mobile robots.
In this paper, we consider the problem of obstacle avoidance in simple 3D environments where the robot has to solely rely on a single monocular camera.
We tackle obstacle avoidance with a data-driven, end-to-end deep learning approach.
arXiv Detail & Related papers (2021-03-08T13:05:46Z)
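An end-to-end monocular policy of the kind described above could be as simple as a small CNN mapping the current image to per-action values; the layer sizes and action set below are assumptions, not the paper's network.

```python
# Sketch of an end-to-end network mapping a single monocular image to
# Q-values / logits over discrete avoidance actions (illustrative only).
import torch
import torch.nn as nn

class MonocularAvoidancePolicy(nn.Module):
    def __init__(self, num_actions=3):  # e.g. turn left, go forward, turn right
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, num_actions)

    def forward(self, image):
        return self.head(self.encoder(image))  # one value per avoidance action
```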
arXiv Detail & Related papers (2021-03-08T13:05:46Z) - Mobile Robot Planner with Low-cost Cameras Using Deep Reinforcement
Learning [0.0]
This study develops a robot mobility policy based on deep reinforcement learning.
In order to bring robots to market, low-cost mass production is also an issue that needs to be addressed.
arXiv Detail & Related papers (2020-12-21T07:30:04Z) - Pose-Assisted Multi-Camera Collaboration for Active Object Tracking [42.57706021569103]
Active Object Tracking (AOT) is crucial to many vision-based applications, e.g., mobile robots and intelligent surveillance.
In this paper, we extend single-camera AOT to a multi-camera setting, where cameras track a target in a collaborative fashion.
We propose a novel Pose-Assisted Multi-Camera Collaboration System, which enables a camera to cooperate with the others by sharing camera poses for active object tracking.
arXiv Detail & Related papers (2020-01-15T07:49:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.