Apple: Toward General Active Perception via Reinforcement Learning
- URL: http://arxiv.org/abs/2505.06182v3
- Date: Tue, 30 Sep 2025 16:27:14 GMT
- Title: Apple: Toward General Active Perception via Reinforcement Learning
- Authors: Tim Schneider, Cristiana de Farias, Roberto Calandra, Liming Chen, Jan Peters
- Abstract summary: APPLE (Active Perception Policy Learning) is a novel framework to address a range of different active perception problems. By design, APPLE is not limited to a specific task and can, in principle, be applied to a wide range of active perception problems. Experiments demonstrate the efficacy of APPLE, achieving high accuracies on both regression and classification tasks.
- Score: 17.92494758004686
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Active perception is a fundamental skill that enables us humans to deal with uncertainty in our inherently partially observable environment. For senses such as touch, where the information is sparse and local, active perception becomes crucial. In recent years, active perception has emerged as an important research domain in robotics. However, current methods are often bound to specific tasks or make strong assumptions, which limit their generality. To address this gap, this work introduces APPLE (Active Perception Policy Learning) - a novel framework that leverages reinforcement learning (RL) to address a range of different active perception problems. APPLE jointly trains a transformer-based perception module and decision-making policy with a unified optimization objective, learning how to actively gather information. By design, APPLE is not limited to a specific task and can, in principle, be applied to a wide range of active perception problems. We evaluate two variants of APPLE across different tasks, including tactile exploration problems from the Tactile MNIST benchmark. Experiments demonstrate the efficacy of APPLE, achieving high accuracies on both regression and classification tasks. These findings underscore the potential of APPLE as a versatile and general framework for advancing active perception in robotics.
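The abstract's core idea, jointly optimizing a perception module and an action policy under one objective, can be illustrated with a minimal numpy sketch. The dimensions, the single linear encoder standing in for the transformer, and the REINFORCE-style policy term are illustrative assumptions, not the paper's actual architecture or loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): 16-D tactile glimpse
# features, 8 object classes, 4 discrete sensor movements.
D, C, A = 16, 8, 4

W_enc = rng.normal(0, 0.1, (D, D))  # shared encoder (transformer stand-in)
W_cls = rng.normal(0, 0.1, (D, C))  # perception head
W_pol = rng.normal(0, 0.1, (D, A))  # action (policy) head

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def unified_loss(glimpse, label, action, advantage):
    """One term of a joint objective: a supervised perception loss plus a
    REINFORCE-style policy term, both computed from the same shared features."""
    h = np.tanh(glimpse @ W_enc)        # shared representation
    p_cls = softmax(h @ W_cls)          # class belief
    p_act = softmax(h @ W_pol)          # action distribution
    perception_loss = -np.log(p_cls[label] + 1e-12)
    policy_loss = -advantage * np.log(p_act[action] + 1e-12)
    return perception_loss + policy_loss

loss = unified_loss(rng.normal(size=D), label=3, action=1, advantage=0.5)
print(float(loss) > 0.0)
```

Because both heads read the same encoder output, gradients from the policy term shape the representation used for prediction, which is the sense in which the agent "learns how to actively gather information."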
Related papers
- Real-World Reinforcement Learning of Active Perception Behaviors [27.56548234738969]
A robot's instantaneous sensory observations do not always reveal task-relevant state information. We propose a simple real-world robot learning recipe to efficiently train active perception policies. Our approach, asymmetric advantage weighted regression, exploits access to "privileged" extra sensors at training time.
arXiv Detail & Related papers (2025-12-01T02:05:20Z)
- Learning to See and Act: Task-Aware View Planning for Robotic Manipulation [85.65102094981802]
Task-Aware View Planning (TAVP) is a framework designed to integrate active view planning with task-specific representation learning. Our proposed TAVP model achieves superior performance over state-of-the-art fixed-view approaches.
arXiv Detail & Related papers (2025-08-07T09:21:20Z)
- Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO [63.140883026848286]
Active vision refers to the process of actively selecting where and how to look in order to gather task-relevant information. Recently, the use of Multimodal Large Language Models (MLLMs) as central planning and decision-making modules in robotic systems has gained extensive attention.
arXiv Detail & Related papers (2025-05-27T17:29:31Z)
- Emergent Active Perception and Dexterity of Simulated Humanoids from Visual Reinforcement Learning [69.71072181304066]
We introduce Perceptive Dexterous Control (PDC), a framework for vision-driven whole-body control with simulated humanoids. PDC operates solely on egocentric vision for task specification, enabling object search, target placement, and skill selection through visual cues. We show that training from scratch with reinforcement learning can produce emergent behaviors such as active search.
arXiv Detail & Related papers (2025-05-18T07:33:31Z)
- Affordance-Guided Reinforcement Learning via Visual Prompting [51.361977466993345]
Keypoint-based Affordance Guidance for Improvements (KAGI) is a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL. On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 30K online fine-tuning steps.
arXiv Detail & Related papers (2024-07-14T21:41:29Z)
- The impact of Compositionality in Zero-shot Multi-label action recognition for Object-based tasks [4.971065912401385]
We propose Dual-VCLIP, a unified approach for zero-shot multi-label action recognition.
Dual-VCLIP enhances VCLIP, a zero-shot action recognition method, with the DualCoOp method for multi-label image classification.
We validate our method on the Charades dataset that includes a majority of object-based actions.
arXiv Detail & Related papers (2024-05-14T15:28:48Z)
- Deep Active Perception for Object Detection using Navigation Proposals [39.52573252842573]
We propose a generic supervised active perception pipeline for object detection.
It can be trained using existing off-the-shelf object detectors, while also leveraging advances in simulation environments.
The proposed method was evaluated on synthetic datasets, constructed within the Webots robotics simulator.
arXiv Detail & Related papers (2023-12-15T20:55:52Z)
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability on localizing the active objects by: learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
- AcTExplore: Active Tactile Exploration of Unknown Objects [17.755567328263847]
We present AcTExplore, an active tactile exploration method driven by reinforcement learning for object reconstruction at scale. Our algorithm incrementally collects tactile data and reconstructs 3D object shapes, which can serve as a representation for higher-level downstream tasks.
Our method achieves an average of 95.97% IoU coverage on unseen YCB objects while just being trained on primitive shapes.
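An IoU coverage figure of the kind reported above can be computed on voxelized shapes, for example as intersection-over-union between a reconstructed occupancy grid and the ground truth. The grid sizes below are arbitrary toy values, not the benchmark's actual resolution:

```python
import numpy as np

def voxel_iou(recon, gt):
    """Intersection-over-union of two boolean occupancy grids."""
    recon, gt = np.asarray(recon, bool), np.asarray(gt, bool)
    inter = np.logical_and(recon, gt).sum()
    union = np.logical_or(recon, gt).sum()
    return inter / union if union else 1.0

# Toy 3D grids: a ground-truth cube vs. a partial reconstruction.
gt = np.zeros((8, 8, 8), bool)
gt[2:6, 2:6, 2:6] = True            # 4x4x4 = 64 occupied voxels
recon = np.zeros_like(gt)
recon[2:6, 2:6, 2:5] = True         # missing one slab of the object
print(round(voxel_iou(recon, gt), 3))  # → 0.75
```

Here the reconstruction lies entirely inside the ground truth, so the IoU reduces to the fraction of the object covered (48/64), matching the "coverage" reading of the metric.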
arXiv Detail & Related papers (2023-10-12T22:15:06Z)
- Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
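The core idea, perturbing the policy's latent state with temporally correlated rather than i.i.d. noise, can be illustrated with an Ornstein-Uhlenbeck-style sketch. This is a simplification for intuition; the paper's actual noise model and network are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, steps, theta, sigma = 4, 200, 0.15, 0.2

def correlated_latent_noise(steps, dim):
    """Ornstein-Uhlenbeck-style noise: each sample decays toward zero but
    stays correlated with the previous one, unlike i.i.d. Gaussian noise.
    Added to the policy's latent state, it yields smooth exploration."""
    eps = np.zeros(dim)
    trace = []
    for _ in range(steps):
        eps = eps - theta * eps + sigma * rng.normal(size=dim)
        trace.append(eps.copy())
    return np.array(trace)

noise = correlated_latent_noise(steps, latent_dim)
# Lag-1 autocorrelation is clearly positive, unlike white noise.
x = noise[:, 0]
autocorr = np.corrcoef(x[:-1], x[1:])[0, 1]
print(autocorr > 0.5)  # → True
```

Because consecutive perturbations are similar, the resulting actions vary smoothly over time, which is what makes latent-space exploration gentler on physical systems than per-step action noise.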
arXiv Detail & Related papers (2023-05-31T17:40:43Z)
- Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task [50.72283841720014]
We propose a novel learning strategy that can improve reasoning about the effects of actions.
We demonstrate the effectiveness of our proposed approach and discuss its advantages over previous baselines in terms of performance, data efficiency, and generalization capability.
arXiv Detail & Related papers (2022-12-07T05:41:58Z)
- Active Visual Search in the Wild [12.354788629408933]
We propose a system where a user can enter target commands using free-form language. We call this system Active Visual Search in the Wild (AVSW). AVSW detects and plans to search for a target object specified by the user through a semantic grid map represented by static landmarks.
arXiv Detail & Related papers (2022-09-19T07:18:46Z)
- Denoised MDPs: Learning World Models Better Than the World Itself [94.74665254213588]
This work categorizes information out in the wild into four types based on controllability and relation with reward, and formulates useful information as that which is both controllable and reward-relevant.
Experiments on variants of DeepMind Control Suite and RoboDesk demonstrate superior performance of our denoised world model over using raw observations alone.
arXiv Detail & Related papers (2022-06-30T17:59:49Z)
- Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism [120.1998866178014]
We present a flexible framework for continual object detection via pRotOtypical taSk corrElaTion guided gaTing mechAnism (ROSETTA).
Concretely, a unified framework is shared by all tasks while task-aware gates are introduced to automatically select sub-models for specific tasks.
Experiments on COCO-VOC, KITTI-Kitchen, class-incremental detection on VOC and sequential learning of four tasks show that ROSETTA yields state-of-the-art performance.
arXiv Detail & Related papers (2022-05-06T07:31:28Z)
- Contrastive Learning for Cross-Domain Open World Recognition [17.660958043781154]
The ability to evolve is fundamental for any valuable autonomous agent whose knowledge cannot remain limited to that injected by the manufacturer.
We show how it learns a feature space perfectly suitable to incrementally include new classes and is able to capture knowledge which generalizes across a variety of visual domains.
Our method is endowed with a tailored effective stopping criterion for each learning episode and exploits a novel self-paced thresholding strategy.
arXiv Detail & Related papers (2022-03-17T11:23:53Z)
- Improving Object Permanence using Agent Actions and Reasoning [8.847502932609737]
Existing approaches learn object permanence from low-level perception.
We argue that object permanence can be improved when the robot uses knowledge about executed actions.
arXiv Detail & Related papers (2021-10-01T07:09:49Z)
- One-Shot Object Affordance Detection in the Wild [76.46484684007706]
Affordance detection refers to identifying the potential action possibilities of objects in an image.
We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images.
With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
arXiv Detail & Related papers (2021-08-08T14:53:10Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Object-Driven Active Mapping for More Accurate Object Pose Estimation and Robotic Grasping [5.385583891213281]
The framework is built on an object SLAM system integrated with a simultaneous multi-object pose estimation process.
By combining the mapping module and the exploration strategy, an accurate object map that is compatible with robotic grasping can be generated.
arXiv Detail & Related papers (2020-12-03T09:36:55Z)
- Dynamic Feature Integration for Simultaneous Detection of Salient Object, Edge and Skeleton [108.01007935498104]
In this paper, we solve three low-level pixel-wise vision problems, including salient object segmentation, edge detection, and skeleton extraction.
We first show some similarities shared by these tasks and then demonstrate how they can be leveraged for developing a unified framework.
arXiv Detail & Related papers (2020-04-18T11:10:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.