TDAF: Top-Down Attention Framework for Vision Tasks
- URL: http://arxiv.org/abs/2012.07248v1
- Date: Mon, 14 Dec 2020 04:19:13 GMT
- Title: TDAF: Top-Down Attention Framework for Vision Tasks
- Authors: Bo Pang, Yizhuo Li, Jiefeng Li, Muchen Li, Hanwen Cao, Cewu Lu
- Abstract summary: We propose the Top-Down Attention Framework (TDAF) to capture top-down attentions.
Empirical evidence shows that our TDAF can capture effective stratified attention information and boost performance.
- Score: 46.14128665926765
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Human attention mechanisms often work in a top-down manner, yet this is not
well explored in vision research. Here, we propose the Top-Down Attention
Framework (TDAF) to capture top-down attentions, which can be easily adopted in
most existing models. The designed Recursive Dual-Directional Nested Structure
in it forms two sets of orthogonal paths, recursive and structural ones, where
bottom-up spatial features and top-down attention features are extracted
respectively. Such spatial and attention features are deeply nested; therefore,
the proposed framework works in a mixed top-down and bottom-up manner.
Empirical evidence shows that our TDAF can capture effective stratified
attention information and boost performance. ResNet with TDAF achieves a 2.0%
improvement on ImageNet. For object detection, performance improves by
2.7% AP over FCOS. For pose estimation, TDAF improves the baseline by 1.6%. And
for action recognition, 3D-ResNet with TDAF achieves a 1.7% accuracy improvement.
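The abstract does not spell out the framework in code, but the gist of mixing a bottom-up spatial path with a top-down attention path can be sketched roughly. The minimal PyTorch block below is an assumption-laden illustration: the class name, the sigmoid gating, and the residual mixing are invented for clarity and do not reproduce the paper's actual Recursive Dual-Directional Nested Structure.

```python
import torch
import torch.nn as nn


class TopDownAttentionBlock(nn.Module):
    """Toy block mixing a bottom-up spatial path with a top-down attention path.

    Purely illustrative; not the paper's Recursive Dual-Directional Nested Structure.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Bottom-up path: ordinary spatial feature extraction.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Top-down path: an attention map derived from a deeper-stage signal.
        self.attention = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor, top_down: torch.Tensor) -> torch.Tensor:
        feat = self.spatial(x)            # bottom-up spatial features
        gate = self.attention(top_down)   # top-down attention features
        # The attention gates the spatial features; the residual keeps the
        # bottom-up signal intact, so the two paths stay mixed but separable.
        return feat + feat * gate


block = TopDownAttentionBlock(channels=64)
x = torch.randn(1, 64, 56, 56)         # bottom-up input from the current stage
top_down = torch.randn(1, 64, 56, 56)  # assumed signal fed back from a deeper stage
print(block(x, top_down).shape)        # torch.Size([1, 64, 56, 56])
```

In this reading, attention produced at deeper stages would recursively gate shallower spatial features, which is one way to realize the abstract's description of nested top-down and bottom-up paths.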
Related papers
- OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels [50.42092879252807]
We present OverLoCK, the first pure ConvNet backbone architecture that explicitly incorporates a top-down attention mechanism.
To fully unleash the power of top-down attention, we propose a novel context-mixing dynamic convolution (ContMix).
arXiv Detail & Related papers (2025-02-27T13:45:15Z)
- Learning 1D Causal Visual Representation with De-focus Attention Networks [108.72931590504406]
This paper explores the feasibility of representing images using 1D causal modeling.
We propose De-focus Attention Networks, which employ learnable bandpass filters to create varied attention patterns.
arXiv Detail & Related papers (2024-06-06T17:59:56Z)
- VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness [56.87603097348203]
VeCAF uses labels and natural language annotations to perform parametric data selection for PVM finetuning.
VeCAF incorporates the finetuning objective to select significant data points that effectively guide the PVM towards faster convergence.
On ImageNet, VeCAF uses up to 3.3x fewer training batches to reach the target performance than full finetuning.
arXiv Detail & Related papers (2024-01-15T17:28:37Z)
- VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention [32.44687996180621]
We propose to adaptively fuse multi-view features in a global spatial context via Dual Cross-VIew SpaTial Attention (VISTA).
The proposed VISTA is a novel plug-and-play fusion module, wherein the multi-layer perceptron widely adopted in standard attention modules is replaced with a convolutional one (see the sketch after this entry).
At the time of submission, our method achieves 63.0% in overall mAP and 69.8% in NDS on the nuScenes benchmark, outperforming all published methods by up to 24% in safety-crucial categories such as cyclist.
arXiv Detail & Related papers (2022-03-18T02:34:59Z)
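As a rough illustration of the convolution-for-MLP substitution mentioned above, the hypothetical PyTorch module below computes cross-attention between two view feature maps using convolutional query/key/value projections instead of linear layers. The class name, kernel size, and residual fusion are illustrative assumptions, not VISTA's actual design.

```python
import torch
import torch.nn as nn


class ConvCrossViewAttention(nn.Module):
    """Toy cross-view attention with convolutional (not linear/MLP) projections.

    A generic sketch of the idea; not VISTA's actual module.
    """

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # Convolutional projections replace the usual linear/MLP projections.
        self.q_proj = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        self.k_proj = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        self.v_proj = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        self.scale = channels ** -0.5

    def forward(self, query_view: torch.Tensor, key_view: torch.Tensor) -> torch.Tensor:
        b, c, h, w = query_view.shape
        q = self.q_proj(query_view).flatten(2).transpose(1, 2)  # (B, HW, C)
        k = self.k_proj(key_view).flatten(2)                    # (B, C, H'W')
        v = self.v_proj(key_view).flatten(2).transpose(1, 2)    # (B, H'W', C)
        attn = torch.softmax((q @ k) * self.scale, dim=-1)      # (B, HW, H'W')
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return query_view + out  # residual fusion of the two views


fuse = ConvCrossViewAttention(channels=64)
bev = torch.randn(1, 64, 32, 32)    # e.g. bird's-eye-view features (assumed shape)
other = torch.randn(1, 64, 32, 32)  # e.g. range-view features (assumed shape)
print(fuse(bev, other).shape)       # torch.Size([1, 64, 32, 32])
```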
- On Evolving Attention Towards Domain Adaptation [110.57454902557767]
This paper proposes EvoADA, a novel framework to evolve the attention configuration for a given UDA task without human intervention.
Experiments on various kinds of cross-domain benchmarks, i.e., Office-31, Office-Home, CUB-Paintings, and Duke-Market-1501, reveal that the proposed EvoADA consistently boosts multiple state-of-the-art domain adaptation approaches.
arXiv Detail & Related papers (2021-03-25T01:50:28Z)
- Should I Look at the Head or the Tail? Dual-awareness Attention for Few-Shot Object Detection [20.439719842851744]
We propose a novel Dual-Awareness-Attention (DAnA), which captures the pairwise spatial relationship across the support and query images.
Our DAnA component is adaptable to various existing object detection networks and boosts FSOD performance by paying attention to specific semantics.
Experimental results demonstrate that DAnA significantly boosts object detection performance on the COCO benchmark (by 48% and 125% in relative terms).
arXiv Detail & Related papers (2021-02-24T09:17:27Z)
- PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object Detection [57.49788100647103]
LiDAR-based 3D object detection is an important task for autonomous driving.
Current approaches suffer from sparse and partial point clouds of distant and occluded objects.
In this paper, we propose a novel two-stage approach, namely PC-RGNN, dealing with such challenges by two specific solutions.
arXiv Detail & Related papers (2020-12-18T18:06:43Z)
- On estimating gaze by self-attention augmented convolutions [6.015556590955813]
We propose a novel network architecture grounded on self-attention augmented convolutions to improve the quality of the learned features.
We dub our framework ARes-gaze, which explores our Attention-augmented ResNet (ARes-14) as twin convolutional backbones.
Results show a 2.38% decrease in average angular error compared to state-of-the-art methods on the MPIIFaceGaze data set, and second place on the EyeDiap data set.
arXiv Detail & Related papers (2020-08-25T14:29:05Z)
- Cyclic Differentiable Architecture Search [99.12381460261841]
Differentiable ARchiTecture Search, i.e., DARTS, has drawn great attention in neural architecture search.
We propose new joint objectives and a novel Cyclic Differentiable ARchiTecture Search framework, dubbed CDARTS.
In the DARTS search space, we achieve 97.52% top-1 accuracy on CIFAR10 and 76.3% top-1 accuracy on ImageNet.
arXiv Detail & Related papers (2020-06-18T17:55:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.