Active Visual Exploration Based on Attention-Map Entropy
- URL: http://arxiv.org/abs/2303.06457v3
- Date: Tue, 8 Aug 2023 21:00:21 GMT
- Title: Active Visual Exploration Based on Attention-Map Entropy
- Authors: Adam Pardyl, Grzegorz Rypeść, Grzegorz Kurzejamski, Bartosz Zieliński, Tomasz Trzciński
- Abstract summary: We introduce a new technique called Attention-Map Entropy (AME) to determine the most informative observations.
AME does not require additional loss components, which simplifies the training.
We show that such simplified training significantly improves the performance of reconstruction, segmentation and classification on publicly available datasets.
- Score: 13.064016215754163
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Active visual exploration addresses the issue of limited sensor capabilities
in real-world scenarios, where successive observations are actively chosen
based on the environment. To tackle this problem, we introduce a new technique
called Attention-Map Entropy (AME). It leverages the internal uncertainty of
the transformer-based model to determine the most informative observations. In
contrast to existing solutions, it does not require additional loss components,
which simplifies the training. Through experiments, which also mimic
retina-like sensors, we show that such simplified training significantly
improves the performance of reconstruction, segmentation and classification on
publicly available datasets.
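A minimal sketch of the idea, assuming a ViT-style model that exposes its softmax attention weights; the tensor shapes and the helper names (`attention_entropy`, `select_next_glimpse`) are illustrative, not taken from the paper:

```python
import torch

def attention_entropy(attn: torch.Tensor, eps: float = 1e-9) -> torch.Tensor:
    """Shannon entropy of each query token's attention distribution.

    attn: (heads, queries, keys) softmax attention weights.
    Returns: (queries,) entropy averaged over heads.
    """
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (heads, queries)
    return ent.mean(dim=0)                          # (queries,)

def select_next_glimpse(attn: torch.Tensor, observed: torch.Tensor) -> int:
    """Pick the unobserved patch whose attention map is most uncertain."""
    scores = attention_entropy(attn)
    scores[observed] = float("-inf")   # never revisit observed patches
    return int(scores.argmax())

# Toy usage: 4 heads, 16 patches, patches 0 and 5 already observed.
attn = torch.softmax(torch.randn(4, 16, 16), dim=-1)
observed = torch.tensor([0, 5])
print(select_next_glimpse(attn, observed))
```

The key property claimed by the abstract is visible here: the selection signal comes from quantities the transformer already computes, so no extra loss term is needed during training.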
Related papers
- Kriformer: A Novel Spatiotemporal Kriging Approach Based on Graph Transformers [5.4381914710364665]
This study addresses the challenges posed by sparse sensor deployment and unreliable data by framing the problem as an environmental data estimation task.
The graph transformer-based model, Kriformer, estimates data at locations without sensors by mining spatial and temporal correlations, even with limited resources.
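As a loose illustration of attention-based spatial interpolation (not Kriformer itself), the sketch below estimates values at unsensed locations as a distance-kernel attention average over observed sensors; the kernel form and `bandwidth` are placeholder choices:

```python
import torch

def attention_kriging(sensor_xy: torch.Tensor, sensor_val: torch.Tensor,
                      target_xy: torch.Tensor, bandwidth: float = 1.0) -> torch.Tensor:
    """Estimate values at unsensed locations as an attention-weighted
    average of observed sensor readings (distance-kernel attention)."""
    d = torch.cdist(target_xy, sensor_xy)         # (T, S) pairwise distances
    w = torch.softmax(-d / bandwidth, dim=-1)     # closer sensors get more weight
    return w @ sensor_val                         # (T,) interpolated values

sensors = torch.rand(10, 2)   # 10 sensors in the unit square
values = torch.rand(10)       # their readings
targets = torch.rand(3, 2)    # 3 unsensed locations
print(attention_kriging(sensors, values, targets))
```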
arXiv Detail & Related papers (2024-09-23T11:01:18Z)
- Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.
We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
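A toy version of the fixed-weight idea, assuming attention weights given by a Gaussian of pairwise point distance rather than learned query-key dot products; `sigma` and the tensor shapes are assumptions, not the paper's configuration:

```python
import torch

def gaussian_attention(points: torch.Tensor, values: torch.Tensor,
                       sigma: float = 0.5) -> torch.Tensor:
    """Self-attention whose weights are a fixed Gaussian of pairwise
    point distance instead of learned query-key dot products."""
    d2 = torch.cdist(points, points).pow(2)            # (N, N) squared distances
    attn = torch.softmax(-d2 / (2 * sigma ** 2), dim=-1)
    return attn @ values                               # (N, C) mixed features

pts = torch.rand(32, 3)      # a toy point cloud
feats = torch.rand(32, 16)   # per-point features
print(gaussian_attention(pts, feats).shape)  # torch.Size([32, 16])
```

Because the weights are fixed, no attention parameters need gradients, which is consistent with the reported faster, more stable optimization.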
arXiv Detail & Related papers (2024-09-20T07:41:47Z)
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
- Zero-shot Degree of Ill-posedness Estimation for Active Small Object Change Detection [8.977792536037956]
In everyday indoor navigation, robots often need to detect non-distinctive, small-change objects.
Existing techniques rely on high-quality class-specific object priors to regularize a change detector model.
In this study, we explore the concept of degree-of-ill-posedness (DoI) to improve both passive and active vision.
arXiv Detail & Related papers (2024-05-10T01:56:39Z)
- Exploring Predicate Visual Context in Detecting Human-Object Interactions [44.937383506126274]
We study how best to re-introduce image features via cross-attention.
Our model with enhanced predicate visual context (PViC) outperforms state-of-the-art methods on the HICO-DET and V-COCO benchmarks.
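A generic cross-attention sketch of this step, in which interaction (pair) tokens query backbone image features; the shapes and variable names are hypothetical and not the PViC implementation:

```python
import torch
from torch import nn

# Hypothetical shapes: 8 human-object pair tokens, 196 image patches.
pair_tokens = torch.rand(1, 8, 256)     # queries: candidate interactions
image_feats = torch.rand(1, 196, 256)   # keys/values: backbone features

cross_attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
enriched, _ = cross_attn(query=pair_tokens, key=image_feats, value=image_feats)
print(enriched.shape)  # torch.Size([1, 8, 256]) -- pair tokens with visual context
```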
arXiv Detail & Related papers (2023-08-11T15:57:45Z)
- Active Sensing with Predictive Coding and Uncertainty Minimization [0.0]
We present an end-to-end procedure for embodied exploration inspired by two biological computations.
We first demonstrate our approach in a maze navigation task and show that it can discover the underlying transition distributions and spatial features of the environment.
We show that our model builds unsupervised representations through exploration that allow it to efficiently categorize visual scenes.
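The paper's predictive-coding model is not reproduced here; as a stand-in, the sketch below shows a common uncertainty-minimizing action rule, picking the action on which an ensemble of toy world models disagrees most:

```python
import torch

def pick_action(ensemble, state: torch.Tensor, actions: torch.Tensor) -> int:
    """Choose the action whose predicted observation the ensemble
    disagrees on most, i.e. the most informative one to take."""
    preds = torch.stack([m(state, actions) for m in ensemble])  # (M, A, D)
    disagreement = preds.var(dim=0).mean(dim=-1)                # (A,)
    return int(disagreement.argmax())

# Toy ensemble: random linear "world models" mapping (state, action) -> obs.
def make_model():
    W = torch.randn(4 + 2, 8)
    return lambda s, a: torch.cat([s.expand(len(a), -1), a], dim=-1) @ W

ensemble = [make_model() for _ in range(5)]
state = torch.rand(4)
actions = torch.rand(6, 2)   # 6 candidate actions
print(pick_action(ensemble, state, actions))
```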
arXiv Detail & Related papers (2023-07-02T21:14:49Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
- Looking Beyond Corners: Contrastive Learning of Visual Representations for Keypoint Detection and Description Extraction [1.5749416770494706]
Learnable keypoint detectors and descriptors are beginning to outperform classical hand-crafted feature extraction methods.
Recent studies on self-supervised learning of visual representations have driven the increasing performance of learnable models based on deep networks.
We propose the Correspondence Network (CorrNet) that learns to detect repeatable keypoints and to extract discriminative descriptions.
arXiv Detail & Related papers (2021-12-22T16:27:11Z)
- Cycle and Semantic Consistent Adversarial Domain Adaptation for Reducing Simulation-to-Real Domain Shift in LiDAR Bird's Eye View [110.83289076967895]
We present a BEV domain adaptation method based on CycleGAN that uses prior semantic classification in order to preserve the information of small objects of interest during the domain adaptation process.
The quality of the generated BEVs has been evaluated using a state-of-the-art 3D object detection framework on the KITTI 3D Object Detection Benchmark.
arXiv Detail & Related papers (2021-04-22T12:47:37Z)
- Unsupervised Metric Relocalization Using Transform Consistency Loss [66.19479868638925]
Training networks to perform metric relocalization traditionally requires accurate image correspondences.
We propose a self-supervised solution, which exploits a key insight: localizing a query image within a map should yield the same absolute pose, regardless of the reference image used for registration.
We evaluate our framework on synthetic and real-world data, showing our approach outperforms other supervised methods when a limited amount of ground-truth information is available.
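A minimal rendering of that insight as a loss, assuming poses are batched 4x4 homogeneous matrices; this is a sketch of the consistency idea, not the authors' exact formulation:

```python
import torch

def transform_consistency_loss(T_a: torch.Tensor, T_b: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between two absolute-pose estimates of the
    same query image obtained via different reference images.

    T_a, T_b: (B, 4, 4) homogeneous world-from-query transforms.
    """
    delta = torch.linalg.inv(T_a) @ T_b        # identity if the estimates agree
    eye = torch.eye(4, device=delta.device).expand_as(delta)
    return (delta - eye).pow(2).mean()

# Toy check: identical estimates give zero loss.
T = torch.eye(4).unsqueeze(0)
print(transform_consistency_loss(T, T.clone()))  # tensor(0.)
```

Because the target is agreement between the model's own predictions rather than ground-truth poses, this objective can be optimized without accurate image correspondences.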
arXiv Detail & Related papers (2020-11-01T19:24:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.