Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach
- URL: http://arxiv.org/abs/2505.06182v1
- Date: Fri, 09 May 2025 16:49:26 GMT
- Title: Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach
- Authors: Tim Schneider, Cristiana de Farias, Roberto Calandra, Liming Chen, Jan Peters
- Abstract summary: In robotics, active tactile perception has emerged as an important research domain. This work introduces TAP (Task-agnostic Active Perception) to address the challenges posed by partially observable environments. By design, TAP is completely task-agnostic and can, in principle, generalize to any active perception problem.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans make extensive use of haptic exploration to map and identify the properties of the objects that we touch. In robotics, active tactile perception has emerged as an important research domain that complements vision for tasks such as object classification, shape reconstruction, and manipulation. This work introduces TAP (Task-agnostic Active Perception) -- a novel framework that leverages reinforcement learning (RL) and transformer-based architectures to address the challenges posed by partially observable environments. TAP integrates Soft Actor-Critic (SAC) and CrossQ algorithms within a unified optimization objective, jointly training a perception module and decision-making policy. By design, TAP is completely task-agnostic and can, in principle, generalize to any active perception problem. We evaluate TAP across diverse tasks, including toy examples and realistic applications involving haptic exploration of 3D models from the Tactile MNIST benchmark. Experiments demonstrate the efficacy of TAP, achieving high accuracies on the Tactile MNIST haptic digit recognition task and a tactile pose estimation task. These findings underscore the potential of TAP as a versatile and generalizable framework for advancing active tactile perception in robotics.
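The interaction pattern the abstract describes (a policy that chooses the next tactile action while a perception module aggregates the observation history into a task prediction) can be illustrated with a toy loop. This is a minimal sketch, not the paper's implementation: the transformer perception module and the learned SAC/CrossQ policy are replaced with hand-written stand-ins, and the scalar-field estimation task plus all function names (`sense`, `aggregate`, `policy`) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sense(hidden_value, location):
    """Toy tactile sensor: noisy reading of a hidden scalar property,
    attenuated by distance from the informative point at 0.5."""
    return hidden_value * np.exp(-abs(location - 0.5)) + rng.normal(0, 0.01)

def aggregate(history):
    """Stand-in for the transformer perception module: invert the known
    attenuation model at each probed location and pool the estimates."""
    if not history:
        return 0.0
    return float(np.mean([r / np.exp(-abs(l - 0.5)) for l, r in history]))

def policy(history):
    """Stand-in for the learned policy: probe near the informative point.
    In TAP this choice would be learned jointly with the perception module
    under a unified RL objective, rather than hand-coded."""
    return 0.5 + rng.normal(0, 0.05)

hidden = 3.0  # latent object property the agent must estimate
history = []  # sequence of (action, observation) pairs, i.e. the agent's memory
for step in range(20):
    loc = policy(history)
    history.append((loc, sense(hidden, loc)))

estimate = aggregate(history)
print(estimate)
```

The key structural point this sketch preserves is that the environment is partially observable: no single touch identifies the hidden property, so the agent must act to gather observations and fuse them over time, which is exactly the coupling TAP trains end to end.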
Related papers
- Learning to See and Act: Task-Aware View Planning for Robotic Manipulation [85.65102094981802]
Task-Aware View Planning (TAVP) is a framework designed to integrate active view planning with task-specific representation learning. Our proposed TAVP model achieves superior performance over state-of-the-art fixed-view approaches.
arXiv Detail & Related papers (2025-08-07T09:21:20Z)
- Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO [63.140883026848286]
Active vision refers to the process of actively selecting where and how to look in order to gather task-relevant information. Recently, the use of Multimodal Large Language Models (MLLMs) as central planning and decision-making modules in robotic systems has gained extensive attention.
arXiv Detail & Related papers (2025-05-27T17:29:31Z)
- Emergent Active Perception and Dexterity of Simulated Humanoids from Visual Reinforcement Learning [69.71072181304066]
We introduce Perceptive Dexterous Control (PDC), a framework for vision-driven whole-body control with simulated humanoids. PDC operates solely on egocentric vision for task specification, enabling object search, target placement, and skill selection through visual cues. We show that training from scratch with reinforcement learning can produce emergent behaviors such as active search.
arXiv Detail & Related papers (2025-05-18T07:33:31Z)
- Affordance-Guided Reinforcement Learning via Visual Prompting [51.361977466993345]
Keypoint-based Affordance Guidance for Improvements (KAGI) is a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL. On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 30K online fine-tuning steps.
arXiv Detail & Related papers (2024-07-14T21:41:29Z)
- Deep Active Perception for Object Detection using Navigation Proposals [39.52573252842573]
We propose a generic supervised active perception pipeline for object detection.
It can be trained using existing off-the-shelf object detectors, while also leveraging advances in simulation environments.
The proposed method was evaluated on synthetic datasets, constructed within the Webots robotics simulator.
arXiv Detail & Related papers (2023-12-15T20:55:52Z)
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
- AcTExplore: Active Tactile Exploration of Unknown Objects [17.755567328263847]
We present AcTExplore, an active tactile exploration method driven by reinforcement learning for object reconstruction at scale.
Our algorithm incrementally collects tactile data and reconstructs the 3D shapes of objects, which can serve as a representation for higher-level downstream tasks.
Our method achieves an average of 95.97% IoU coverage on unseen YCB objects while just being trained on primitive shapes.
arXiv Detail & Related papers (2023-10-12T22:15:06Z)
- Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task [50.72283841720014]
We propose a novel learning strategy that can improve reasoning about the effects of actions.
We demonstrate the effectiveness of our proposed approach and discuss its advantages over previous baselines in terms of performance, data efficiency, and generalization capability.
arXiv Detail & Related papers (2022-12-07T05:41:58Z)
- Active Visual Search in the Wild [12.354788629408933]
We propose a system where a user can enter target commands using free-form language.
We call this system Active Visual Search in the Wild (AVSW).
AVSW detects and plans to search for a target object inputted by a user through a semantic grid map represented by static landmarks.
arXiv Detail & Related papers (2022-09-19T07:18:46Z)
- Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism [120.1998866178014]
We present a flexible framework for continual object detection via pRotOtypical taSk corrElaTion guided gaTing mechAnIsm (ROSETTA).
Concretely, a unified framework is shared by all tasks while task-aware gates are introduced to automatically select sub-models for specific tasks.
Experiments on COCO-VOC, KITTI-Kitchen, class-incremental detection on VOC and sequential learning of four tasks show that ROSETTA yields state-of-the-art performance.
arXiv Detail & Related papers (2022-05-06T07:31:28Z)
- One-Shot Object Affordance Detection in the Wild [76.46484684007706]
Affordance detection refers to identifying the potential action possibilities of objects in an image.
We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images.
With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
arXiv Detail & Related papers (2021-08-08T14:53:10Z)
- Object-Driven Active Mapping for More Accurate Object Pose Estimation and Robotic Grasping [5.385583891213281]
The framework is built on an object SLAM system integrated with a simultaneous multi-object pose estimation process.
By combining the mapping module and the exploration strategy, an accurate object map that is compatible with robotic grasping can be generated.
arXiv Detail & Related papers (2020-12-03T09:36:55Z)
- Dynamic Feature Integration for Simultaneous Detection of Salient Object, Edge and Skeleton [108.01007935498104]
In this paper, we solve three low-level pixel-wise vision problems, including salient object segmentation, edge detection, and skeleton extraction.
We first show some similarities shared by these tasks and then demonstrate how they can be leveraged for developing a unified framework.
arXiv Detail & Related papers (2020-04-18T11:10:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.