SpotNet: Self-Attention Multi-Task Network for Object Detection
- URL: http://arxiv.org/abs/2002.05540v2
- Date: Thu, 11 Jun 2020 14:22:49 GMT
- Title: SpotNet: Self-Attention Multi-Task Network for Object Detection
- Authors: Hughes Perreault and Guillaume-Alexandre Bilodeau and Nicolas Saunier and Maguelonne Héritier
- Abstract summary: We produce foreground/background segmentation labels in a semi-supervised way, using background subtraction or optical flow.
We use those segmentation maps inside the network as a self-attention mechanism to weight the feature map used to produce the bounding boxes.
We show that by using this method, we obtain a significant mAP improvement on two traffic surveillance datasets.
- Score: 11.444576186559487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans are very good at directing their visual attention toward relevant
areas when they search for different types of objects. For instance, when we
search for cars, we will look at the streets, not at the top of buildings. The
motivation of this paper is to train a network to do the same via a multi-task
learning approach. To train visual attention, we produce foreground/background
segmentation labels in a semi-supervised way, using background subtraction or
optical flow. Using these labels, we train an object detection model to produce
foreground/background segmentation maps as well as bounding boxes while sharing
most model parameters. We use those segmentation maps inside the network as a
self-attention mechanism to weight the feature map used to produce the bounding
boxes, decreasing the signal of non-relevant areas. We show that by using this
method, we obtain a significant mAP improvement on two traffic surveillance
datasets, with state-of-the-art results on both UA-DETRAC and UAVDT.
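The core mechanism described above is to multiply the detection feature map, element-wise, by the predicted foreground probability so that background activations are suppressed before bounding boxes are regressed. The following is a minimal sketch of that gating idea in plain Python with hypothetical names (`attend`, `seg_logits`); the paper applies it per-channel to CNN feature maps inside the network, not to raw lists.

```python
import math

def sigmoid(x):
    """Logistic function mapping a segmentation logit to a foreground probability."""
    return 1.0 / (1.0 + math.exp(-x))

def attend(feature_map, seg_logits):
    """Weight each spatial location of a feature map by its predicted
    foreground probability, decreasing the signal of background areas."""
    return [[f * sigmoid(s) for f, s in zip(f_row, s_row)]
            for f_row, s_row in zip(feature_map, seg_logits)]

# Toy 2x2 example: left column is confidently foreground, right is background.
features = [[1.0, 2.0], [3.0, 4.0]]
logits = [[10.0, -10.0], [10.0, -10.0]]
attended = attend(features, logits)
```

After gating, foreground locations retain their activations almost unchanged while background locations are driven toward zero, which is the self-attention effect the abstract describes.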
Related papers
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net)
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z) - DisPlacing Objects: Improving Dynamic Vehicle Detection via Visual Place Recognition under Adverse Conditions [29.828201168816243]
We investigate whether a prior map can be leveraged to aid in the detection of dynamic objects in a scene without the need for a 3D map.
We contribute an algorithm which refines an initial set of candidate object detections and produces a refined subset of highly accurate detections using a prior map.
arXiv Detail & Related papers (2023-06-30T10:46:51Z) - Improving Fine-Grained Visual Recognition in Low Data Regimes via Self-Boosting Attention Mechanism [27.628260249895973]
Self-boosting attention mechanism (SAM) is a novel method for regularizing the network to focus on the key regions shared across samples and classes.
We develop a variant that uses SAM to create multiple attention maps for pooling convolutional maps in the style of bilinear pooling.
arXiv Detail & Related papers (2022-08-01T05:36:27Z) - LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z) - Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation [32.76127086403596]
We propose Contrastive learning for Class-agnostic Activation Map (C$^2$AM) generation using unlabeled image data.
We form the positive and negative pairs based on the above relations and force the network to disentangle foreground and background.
As the network is guided to discriminate cross-image foreground-background, the class-agnostic activation maps learned by our approach generate more complete object regions.
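The pairing scheme summarized above treats foreground-foreground and background-background feature pairs as positives and foreground-background pairs as negatives. A minimal sketch of forming those pairs from pooled feature vectors, using hypothetical names (`contrastive_pairs`, `fg_feats`, `bg_feats`) and cosine similarity as the pair score:

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_pairs(fg_feats, bg_feats):
    """Positives: fg-fg and bg-bg similarities; negatives: fg-bg similarities.
    A contrastive loss would pull positives up and push negatives down,
    disentangling foreground from background."""
    positives = [cosine(a, b) for i, a in enumerate(fg_feats)
                 for b in fg_feats[i + 1:]]
    positives += [cosine(a, b) for i, a in enumerate(bg_feats)
                  for b in bg_feats[i + 1:]]
    negatives = [cosine(a, b) for a in fg_feats for b in bg_feats]
    return positives, negatives

# Toy features: foreground vectors point one way, background vectors another.
fg = [[1.0, 0.0], [1.0, 0.1]]
bg = [[0.0, 1.0], [0.1, 1.0]]
pos, neg = contrastive_pairs(fg, bg)
```

With well-separated features, every positive similarity exceeds every negative one, which is the property the contrastive objective enforces during training.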
arXiv Detail & Related papers (2022-03-25T08:46:24Z) - Correlation-Aware Deep Tracking [83.51092789908677]
We propose a novel target-dependent feature network inspired by the self-/cross-attention scheme.
Our network deeply embeds cross-image feature correlation in multiple layers of the feature network.
Our model can be flexibly pre-trained on abundant unpaired images, leading to notably faster convergence than the existing methods.
arXiv Detail & Related papers (2022-03-03T11:53:54Z) - Graph Attention Tracking [76.19829750144564]
We propose a simple target-aware Siamese graph attention network for general object tracking.
Experiments on challenging benchmarks including GOT-10k, UAV123, OTB-100 and LaSOT demonstrate that the proposed SiamGAT outperforms many state-of-the-art trackers.
arXiv Detail & Related papers (2020-11-23T04:26:45Z) - Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking [79.80401607146987]
Existing object trackers usually learn a bounding-box based template to match visual targets across frames, which cannot accurately learn a pixel-wise representation.
This paper presents a novel segmentation-based tracking architecture equipped with a memory network to learn accurate spatio-temporal correspondence.
arXiv Detail & Related papers (2020-09-21T08:12:02Z) - Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scale pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z) - Utilising Visual Attention Cues for Vehicle Detection and Tracking [13.2351348789193]
We explore possible ways to use visual attention (saliency) for object detection and tracking.
We propose a neural network that can simultaneously detect objects and generate objectness and subjectness maps to save computational power.
The experiments are conducted on KITTI and DETRAC datasets.
arXiv Detail & Related papers (2020-07-31T23:00:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.