Coarse-to-fine Q-attention with Tree Expansion
- URL: http://arxiv.org/abs/2204.12471v1
- Date: Tue, 26 Apr 2022 17:41:28 GMT
- Title: Coarse-to-fine Q-attention with Tree Expansion
- Authors: Stephen James and Pieter Abbeel
- Abstract summary: Coarse-to-fine Q-attention enables sample-efficient robot manipulation by discretizing the translation space in a coarse-to-fine manner.
Q-attention suffers from "coarse ambiguity" - when voxelization is significantly coarse, it is not feasible to distinguish similar-looking objects without first inspecting at a finer resolution.
We propose to envision Q-attention as a tree that can be expanded and used to accumulate value estimates across the top-k voxels at each Q-attention depth.
- Score: 95.00518278458908
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Coarse-to-fine Q-attention enables sample-efficient robot manipulation by
discretizing the translation space in a coarse-to-fine manner, where the
resolution gradually increases at each layer in the hierarchy. Although
effective, Q-attention suffers from "coarse ambiguity" - when voxelization is
significantly coarse, it is not feasible to distinguish similar-looking objects
without first inspecting at a finer resolution. To combat this, we propose to
envision Q-attention as a tree that can be expanded and used to accumulate
value estimates across the top-k voxels at each Q-attention depth. When our
extension, Q-attention with Tree Expansion (QTE), replaces standard Q-attention
in the Attention-driven Robot Manipulation (ARM) system, we are able to
accomplish a larger set of tasks; especially on those that suffer from "coarse
ambiguity". In addition to evaluating our approach across 12 RLBench tasks, we
also show that the improved performance is visible in a real-world task
involving small objects.
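The core mechanism described above — expanding the top-k voxels at each Q-attention depth and accumulating value estimates along each path — can be illustrated with a toy sketch. Everything here is an assumption for illustration: `toy_q` stands in for the learned Q-network, and the grid size, depth count, and top-k width are made-up constants, not the paper's settings.

```python
import numpy as np

GRID = 4     # voxels per axis at each depth (hypothetical; 4^3 cells per node)
DEPTHS = 3   # number of coarse-to-fine Q-attention levels (hypothetical)
TOP_K = 2    # voxels expanded per level, as in the paper's top-k tree expansion

def toy_q(depth, center, size, rng):
    """Stand-in for a learned Q-network: scores every voxel in a
    GRID^3 subdivision of the cube at `center` with edge length `size`."""
    return rng.normal(size=(GRID, GRID, GRID))

def expand_tree(rng):
    # Each frontier entry: (accumulated value, voxel center, voxel edge length)
    frontier = [(0.0, np.zeros(3), 1.0)]
    for depth in range(DEPTHS):
        children = []
        for acc, center, size in frontier:
            q = toy_q(depth, center, size, rng)
            # take the TOP_K highest-valued voxels in this node's sub-grid
            for idx in np.argsort(q, axis=None)[-TOP_K:]:
                i, j, k = np.unravel_index(idx, q.shape)
                offset = (np.array([i, j, k]) + 0.5) / GRID - 0.5
                children.append((acc + q[i, j, k],      # accumulate value
                                 center + offset * size,
                                 size / GRID))
        # keep only the TOP_K best paths by accumulated value
        frontier = sorted(children, key=lambda c: c[0], reverse=True)[:TOP_K]
    return max(frontier, key=lambda c: c[0])  # best leaf voxel

rng = np.random.default_rng(0)
value, center, size = expand_tree(rng)
```

By deferring the commitment to a single voxel until values from finer resolutions have been accumulated, the search can recover from a coarse level that cannot yet distinguish similar-looking objects — the "coarse ambiguity" the paper targets.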
Related papers
- Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity [55.399230250413986]
We propose a Quality-Aware Feature Matching IQA Metric (QFM-IQM) to remove harmful semantic noise features from the upstream task.
Our approach achieves superior performance to the state-of-the-art NR-IQA methods on eight standard IQA datasets.
arXiv Detail & Related papers (2023-12-11T06:50:27Z) - Top-Down Visual Attention from Analysis by Synthesis [87.47527557366593]
We consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision.
We propose the Analysis-by-Synthesis Vision Transformer (AbSViT), a top-down modulated ViT model that variationally approximates AbS and achieves controllable top-down attention.

arXiv Detail & Related papers (2023-03-23T05:17:05Z) - From Pixels to Objects: Cubic Visual Attention for Visual Question Answering [132.95819467484517]
Recently, attention-based Visual Question Answering (VQA) has achieved great success by utilizing question to target different visual areas that are related to the answer.
We propose a Cubic Visual Attention (CVA) model by successfully applying a novel channel and spatial attention on object regions to improve VQA task.
Experimental results show that our proposed method significantly outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-06-04T07:03:18Z) - HAN: Higher-order Attention Network for Spoken Language Understanding [31.326152465734747]
We propose to replace the conventional attention with our proposed Bilinear attention block.
We conduct an extensive analysis to explore the effectiveness brought by the higher-order attention.
arXiv Detail & Related papers (2021-08-26T17:13:08Z) - Efficient measure for the expressivity of variational quantum algorithms [72.59790225766777]
We exploit an advanced tool in statistical learning theory, i.e., covering number, to study the expressivity of variational quantum algorithms.
We first exhibit how the expressivity of VQAs with arbitrary ansätze is upper bounded by the number of quantum gates and the measurement observable.
We then explore the expressivity of VQAs on near-term quantum chips, where the system noise is considered.
arXiv Detail & Related papers (2021-04-20T13:51:08Z) - Capturing Multi-Resolution Context by Dilated Self-Attention [58.69803243323346]
We propose a combination of restricted self-attention and a dilation mechanism, which we refer to as dilated self-attention.
The restricted self-attention allows attention to neighboring frames of the query at a high resolution, and the dilation mechanism summarizes distant information to allow attending to it with a lower resolution.
ASR results demonstrate substantial improvements compared to restricted self-attention alone, achieving similar results compared to full-sequence based self-attention with a fraction of the computational costs.
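The combination described above can be sketched in a minimal form. This is an illustrative assumption, not the paper's implementation: `dilated_self_attention` attends to a full-resolution window of neighboring frames plus a strided subsample of distant frames (plain subsampling stands in for the paper's learned summarization), and the window/dilation values are invented for the example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dilated_self_attention(x, window=2, dilation=4):
    """x: (T, D) frame sequence. Each query frame attends to frames within
    `window` at full resolution, plus every `dilation`-th distant frame as a
    coarse stand-in for the summarized distant context."""
    T, D = x.shape
    out = np.empty_like(x)
    for t in range(T):
        near = list(range(max(0, t - window), min(T, t + window + 1)))
        far = [i for i in range(0, T, dilation) if i not in near]
        keys = x[near + far]                  # (K, D), K << T for long sequences
        scores = keys @ x[t] / np.sqrt(D)     # scaled dot-product attention
        out[t] = softmax(scores) @ keys       # values == keys in this toy sketch
    return out
```

The key saving is that each query touches `2*window + T/dilation` frames instead of all `T`, which is how the paper approaches full-sequence accuracy at a fraction of the cost.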
arXiv Detail & Related papers (2021-04-07T02:04:18Z) - Attention or memory? Neurointerpretable agents in space and time [0.0]
We design a model incorporating a self-attention mechanism that implements task-state representations in semantic feature-space.
To evaluate the agent's selective properties, we add a large volume of task-irrelevant features to observations.
In line with neuroscience predictions, self-attention leads to increased robustness to noise compared to benchmark models.
arXiv Detail & Related papers (2020-07-09T15:04:26Z) - Neuroevolution of Self-Interpretable Agents [11.171154483167514]
Inattentional blindness is the psychological phenomenon that causes one to miss things in plain sight.
Motivated by selective attention, we study the properties of artificial agents that perceive the world through the lens of a self-attention bottleneck.
arXiv Detail & Related papers (2020-03-18T11:40:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.