WW-Nets: Dual Neural Networks for Object Detection
- URL: http://arxiv.org/abs/2005.07787v1
- Date: Fri, 15 May 2020 21:16:22 GMT
- Title: WW-Nets: Dual Neural Networks for Object Detection
- Authors: Mohammad K. Ebrahimpour, J. Ben Falandays, Samuel Spevack, Ming-Hsuan
Yang, and David C. Noelle
- Abstract summary: We propose a new deep convolutional neural network framework that uses object location knowledge implicit in network connection weights to guide selective attention in object detection tasks.
Our approach is called What-Where Nets (WW-Nets), and it is inspired by the structure of human visual pathways.
- Score: 48.67090730174743
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new deep convolutional neural network framework that uses object
location knowledge implicit in network connection weights to guide selective
attention in object detection tasks. Our approach is called What-Where Nets
(WW-Nets), and it is inspired by the structure of human visual pathways. In the
brain, vision incorporates two separate streams, one in the temporal lobe and
the other in the parietal lobe, called the ventral stream and the dorsal
stream, respectively. The ventral pathway from primary visual cortex is
dominated by "what" information, while the dorsal pathway is dominated by
"where" information. Inspired by this structure, we have proposed an object
detection framework involving the integration of a "What Network" and a "Where
Network". The aim of the What Network is to provide selective attention to the
relevant parts of the input image. The Where Network uses this information to
locate and classify objects of interest. In this paper, we compare this
approach to state-of-the-art algorithms on the PASCAL VOC 2007, PASCAL VOC 2012, and
COCO object detection challenge datasets. Also, we compare our approach to
human "ground-truth" attention. We report the results of an eye-tracking
experiment on human subjects using images from PASCAL VOC 2007, and we
demonstrate interesting relationships between human overt attention and
information processing in our WW-Nets. Finally, we provide evidence that our
proposed method performs favorably in comparison to other object detection
approaches, often by a large margin. The code and the eye-tracking ground-truth
dataset can be found at: https://github.com/mkebrahimpour.
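As a minimal, hypothetical sketch of the two-stream idea described in the abstract: one stream produces a spatial attention map, and that map gates the features fed to the localization/classification stream. This is only an illustrative stand-in; the actual WW-Nets derive attention from object location knowledge implicit in network connection weights, and the function and variable names below are assumptions, not the authors' API.

```python
import numpy as np

def what_attention(features):
    """Hypothetical "What Network" output: collapse the channel axis of a
    (C, H, W) feature map into a spatial saliency map normalized to [0, 1]."""
    saliency = np.abs(features).sum(axis=0)
    span = saliency.max() - saliency.min()
    return (saliency - saliency.min()) / (span + 1e-8)

def where_input(features, attention):
    """Gate the "Where Network" input: broadcast the (H, W) attention map
    over channels so attended regions dominate downstream detection."""
    return features * attention[None, :, :]

# Toy (C, H, W) feature map standing in for a convolutional backbone's output.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4, 4))
attn = what_attention(feats)
gated = where_input(feats, attn)
```

The key design point the abstract suggests is the asymmetry: the "what" stream only decides *where to look*, while the gated features carry the information used to locate and classify objects.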
Related papers
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z)
- Capturing the objects of vision with neural networks [0.0]
Human visual perception carves a scene at its physical joints, decomposing the world into objects.
Deep neural network (DNN) models of visual object recognition, by contrast, remain largely tethered to the sensory input.
We review related work in both fields and examine how these fields can help each other.
arXiv Detail & Related papers (2021-09-07T21:49:53Z)
- GTNet: Guided Transformer Network for Detecting Human-Object Interactions [10.809778265707916]
The human-object interaction (HOI) detection task refers to localizing humans, localizing objects, and predicting the interactions between each human-object pair.
For detecting HOI, it is important to utilize relative spatial configurations and object semantics to find salient spatial regions of images.
This issue is addressed by GTNet, a novel self-attention-based guided transformer network.
arXiv Detail & Related papers (2021-08-02T02:06:33Z)
- Location-Sensitive Visual Recognition with Cross-IOU Loss [177.86369890708457]
This paper proposes a unified solution named location-sensitive network (LSNet) for object detection, instance segmentation, and pose estimation.
Based on a deep neural network as the backbone, LSNet predicts an anchor point and a set of landmarks which together define the shape of the target object.
arXiv Detail & Related papers (2021-04-11T02:17:14Z)
- Where2Act: From Pixels to Actions for Articulated 3D Objects [54.19638599501286]
We extract highly localized actionable information related to elementary actions such as pushing or pulling for articulated objects with movable parts.
We propose a learning-from-interaction framework with an online data sampling strategy that allows us to train the network in simulation.
Our learned models even transfer to real-world data.
arXiv Detail & Related papers (2021-01-07T18:56:38Z)
- Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
arXiv Detail & Related papers (2020-09-10T17:59:10Z)
- Ventral-Dorsal Neural Networks: Object Detection via Selective Attention [51.79577908317031]
We propose a new framework called Ventral-Dorsal Networks (VDNets).
Inspired by the structure of the human visual system, we propose the integration of a "Ventral Network" and a "Dorsal Network".
Our experimental results reveal that the proposed method outperforms state-of-the-art object detection approaches.
arXiv Detail & Related papers (2020-05-15T23:57:36Z)
- SpotNet: Self-Attention Multi-Task Network for Object Detection [11.444576186559487]
We produce foreground/background segmentation labels in a semi-supervised way, using background subtraction or optical flow.
We use those segmentation maps inside the network as a self-attention mechanism to weight the feature map used to produce the bounding boxes.
We show that by using this method, we obtain a significant mAP improvement on two traffic surveillance datasets.
arXiv Detail & Related papers (2020-02-13T14:43:24Z)
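The SpotNet summary mentions producing segmentation labels semi-supervisedly via background subtraction. A minimal sketch of that idea, assuming a static camera (as in traffic surveillance): the per-pixel median over a clip serves as the background model, and large deviations are marked foreground. The threshold and function name below are illustrative assumptions, not SpotNet's actual pipeline.

```python
import numpy as np

def background_subtraction_labels(frames, threshold=0.2):
    """Generate binary foreground masks for a (T, H, W) clip by
    thresholding each frame's deviation from the per-pixel median."""
    background = np.median(frames, axis=0)   # (H, W) static background estimate
    return (np.abs(frames - background) > threshold).astype(np.uint8)

# Toy clip: an all-zero background with a bright 2x2 "object" in frame 2.
clip = np.zeros((3, 4, 4))
clip[2, 1:3, 1:3] = 1.0
masks = background_subtraction_labels(clip)
```

Masks like these can then serve as weak supervision for an attention head that weights the detector's feature maps, which is the multi-task coupling the summary describes.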
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.