Related papers: Learning to View: Decision Transformers for Active Object Detection

URL: http://arxiv.org/abs/2301.09544v1
Date: Mon, 23 Jan 2023 17:00:48 GMT
Title: Learning to View: Decision Transformers for Active Object Detection
Authors: Wenhao Ding, Nathalie Majcherczyk, Mohit Deshpande, Xuewei Qi, Ding Zhao, Rajasimman Madhivanan, Arnie Sen
Abstract summary: In most robotic systems, perception is typically independent of motion planning. We use reinforcement learning (RL) methods to control the robot in order to obtain images that maximize the detection quality. We evaluate the performance of the proposed method on an interactive dataset collected from an indoor scenario simulator.
Score: 18.211691238072245
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Active perception describes a broad class of techniques that couple planning and perception systems to move the robot in a way that gives it more information about the environment. In most robotic systems, perception is typically independent of motion planning. For example, traditional object detection is passive: it operates only on the images it receives. However, we have a chance to improve the results if we allow planning to consume detection signals and move the robot to collect views that maximize the quality of the results. In this paper, we use reinforcement learning (RL) methods to control the robot in order to obtain images that maximize the detection quality. Specifically, we propose using a Decision Transformer with online fine-tuning, which first optimizes the policy with a pre-collected expert dataset and then improves the learned policy by exploring better solutions in the environment. We evaluate the performance of the proposed method on an interactive dataset collected from an indoor scenario simulator. Experimental results demonstrate that our method outperforms all baselines, including expert policy and pure offline RL methods. We also provide exhaustive analyses of the reward distribution and observation space.
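Illustrative note: Decision Transformers, as commonly described, predict actions conditioned on a return-to-go, the sum of rewards from the current step to the end of the trajectory (usually undiscounted). The following is a minimal sketch of that one quantity only; it is not the authors' implementation, and the function name and the per-step detection-quality rewards are hypothetical.

```python
from typing import List


def returns_to_go(rewards: List[float]) -> List[float]:
    """Undiscounted suffix sums: rtg[t] = rewards[t] + ... + rewards[-1]."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step adds its own reward to the future total.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


# Hypothetical detection-quality rewards for a three-step viewing trajectory.
rewards = [0.5, 0.25, 1.0]
print(returns_to_go(rewards))  # [1.75, 1.25, 1.0]
```

In an offline phase these values would be computed from the pre-collected expert trajectories; how the paper sets the target return during online fine-tuning is not stated in the abstract.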

Related papers

AntiGrounding: Lifting Robotic Actions into VLM Representation Space for Decision Making [35.83717913117858]
AntiGrounding is a new framework that reverses the instruction grounding process. It lifts candidate actions directly into the VLM representation space. It renders trajectories from multiple views, and uses structured visual question answering for instruction-based decision making.
arXiv Detail & Related papers (2025-06-14T07:11:44Z)
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors [59.31993241876335]
In this work, we explore grounding masks as an effective intermediate representation. We introduce RoboGround, a grounding-aware robotic manipulation system. To further explore and enhance generalization, we propose an automated pipeline for generating large-scale, simulated data.
arXiv Detail & Related papers (2025-04-30T11:26:40Z)
Next-Best-Trajectory Planning of Robot Manipulators for Effective Observation and Exploration [0.26999000177990923]
A Next-Best-Trajectory principle is developed for a robot manipulator operating in dynamic environments. We employ a voxel map for environment modeling and utilize raycasting from perspectives around a point of interest to estimate the information gain. A global ergodic trajectory planner provides an optional reference trajectory to the local planner, improving exploration and helping to avoid local minima.
arXiv Detail & Related papers (2025-03-28T16:34:29Z)
Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI). The experimental results demonstrate that MPI exhibits a remarkable improvement of 10% to 64% over the previous state of the art on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
Learning active tactile perception through belief-space control [21.708391958446274]
We propose a method that autonomously learns tactile exploration policies by developing a generative world model. We evaluate our method on three simulated tasks where the goal is to estimate a desired object property. We find that our method is able to discover policies that efficiently gather information about the desired property in an intuitive manner.
arXiv Detail & Related papers (2023-11-30T21:54:42Z)
Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning. Our insights are to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy. Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
Robust Visual Sim-to-Real Transfer for Robotic Manipulation [79.66851068682779]
Learning visuomotor policies in simulation is much safer and cheaper than in the real world. However, due to discrepancies between the simulated and real data, simulator-trained policies often fail when transferred to real robots. One common approach to bridge the visual sim-to-real domain gap is domain randomization (DR).
arXiv Detail & Related papers (2023-07-28T05:47:24Z)
Polybot: Training One Policy Across Robots While Embracing Variability [70.74462430582163]
We propose a set of key design decisions to train a single policy for deployment on multiple robotic platforms. Our framework first aligns the observation and action spaces of our policy across embodiments via utilizing wrist cameras. We evaluate our method on a dataset collected over 60 hours spanning 6 tasks and 3 robots with varying joint configurations and sizes.
arXiv Detail & Related papers (2023-07-07T17:21:16Z)
Active Exploration for Robotic Manipulation [40.39182660794481]
This paper proposes a model-based active exploration approach that enables efficient learning in sparse-reward robotic manipulation tasks. We evaluate our proposed algorithm in simulation and on a real robot, trained from scratch with our method.
arXiv Detail & Related papers (2022-10-23T18:07:51Z)
Verifying Learning-Based Robotic Navigation Systems [61.01217374879221]
We show how modern verification engines can be used for effective model selection. Specifically, we use verification to detect and rule out policies that may demonstrate suboptimal behavior. Our work is the first to demonstrate the use of verification backends for recognizing suboptimal DRL policies in real-world robots.
arXiv Detail & Related papers (2022-05-26T17:56:43Z)
Affordance Learning from Play for Sample-Efficient Policy Learning [30.701546777177555]
We use a self-supervised visual affordance model from human teleoperated play data to enable efficient policy learning and motion planning. We combine model-based planning with model-free deep reinforcement learning to learn policies that favor the same object regions favored by people. We find that our policies train 4x faster than the baselines and generalize better to novel objects because our visual affordance model can anticipate their affordance regions.
arXiv Detail & Related papers (2022-03-01T11:00:35Z)
Adaptive Informative Path Planning Using Deep Reinforcement Learning for UAV-based Active Sensing [2.6519061087638014]
We propose a new approach for informative path planning based on deep reinforcement learning (RL). Our method combines Monte Carlo tree search with an offline-learned neural network predicting informative sensing actions. By deploying the trained network during a mission, our method enables sample-efficient online replanning on physical platforms with limited computational resources.
arXiv Detail & Related papers (2021-09-28T09:00:55Z)
A Framework for Efficient Robotic Manipulation [79.10407063260473]
We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels.
arXiv Detail & Related papers (2020-12-14T22:18:39Z)
Real-Time Object Detection and Recognition on Low-Compute Humanoid Robots using Deep Learning [0.12599533416395764]
We describe a novel architecture that enables multiple low-compute NAO robots to perform real-time detection, recognition and localization of objects in their camera views. The proposed algorithm for object detection and localization is an empirical modification of YOLOv3, based on indoor experiments in multiple scenarios. The architecture also comprises an effective end-to-end pipeline to feed the real-time frames from the camera feed to the neural net and use its results for guiding the robot.
arXiv Detail & Related papers (2020-01-20T05:24:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.