WW-Nets: Dual Neural Networks for Object Detection
- URL: http://arxiv.org/abs/2005.07787v1
- Date: Fri, 15 May 2020 21:16:22 GMT
- Title: WW-Nets: Dual Neural Networks for Object Detection
- Authors: Mohammad K. Ebrahimpour, J. Ben Falandays, Samuel Spevack, Ming-Hsuan
Yang, and David C. Noelle
- Abstract summary: We propose a new deep convolutional neural network framework that uses object location knowledge implicit in network connection weights to guide selective attention in object detection tasks.
Our approach is called What-Where Nets (WW-Nets), and it is inspired by the structure of human visual pathways.
- Score: 48.67090730174743
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new deep convolutional neural network framework that uses object
location knowledge implicit in network connection weights to guide selective
attention in object detection tasks. Our approach is called What-Where Nets
(WW-Nets), and it is inspired by the structure of human visual pathways. In the
brain, vision incorporates two separate streams, one in the temporal lobe and
the other in the parietal lobe, called the ventral stream and the dorsal
stream, respectively. The ventral pathway from primary visual cortex is
dominated by "what" information, while the dorsal pathway is dominated by
"where" information. Inspired by this structure, we have proposed an object
detection framework involving the integration of a "What Network" and a "Where
Network". The aim of the What Network is to provide selective attention to the
relevant parts of the input image. The Where Network uses this information to
locate and classify objects of interest. In this paper, we compare this
approach to state-of-the-art algorithms on the PASCAL VOC 2007, PASCAL VOC 2012, and
COCO object detection challenge datasets. Also, we compare our approach to
human "ground-truth" attention. We report the results of an eye-tracking
experiment on human subjects using images from PASCAL VOC 2007, and we
demonstrate interesting relationships between human overt attention and
information processing in our WW-Nets. Finally, we provide evidence that our
proposed method performs favorably in comparison to other object detection
approaches, often by a large margin. The code and the eye-tracking ground-truth
dataset can be found at: https://github.com/mkebrahimpour.
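As a minimal, hypothetical sketch of the two-stream idea described in the abstract: one stream produces a spatial attention map, and that map gates the features fed to the localization/classification stream. This is only an illustrative stand-in; the actual WW-Nets derive attention from object location knowledge implicit in network connection weights, and the function and variable names below are assumptions, not the authors' API.

```python
import numpy as np

def what_attention(features):
    """Hypothetical "What Network" output: collapse the channel axis of a
    (C, H, W) feature map into a spatial saliency map normalized to [0, 1]."""
    saliency = np.abs(features).sum(axis=0)
    span = saliency.max() - saliency.min()
    return (saliency - saliency.min()) / (span + 1e-8)

def where_input(features, attention):
    """Gate the "Where Network" input: broadcast the (H, W) attention map
    over channels so attended regions dominate downstream detection."""
    return features * attention[None, :, :]

# Toy (C, H, W) feature map standing in for a convolutional backbone's output.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4, 4))
attn = what_attention(feats)
gated = where_input(feats, attn)
```

The key design point the abstract suggests is the asymmetry: the "what" stream only decides *where to look*, while the gated features carry the information used to locate and classify objects.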
Related papers
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z)
- Capturing the objects of vision with neural networks [0.0]
Human visual perception carves a scene at its physical joints, decomposing the world into objects.
Deep neural network (DNN) models of visual object recognition, by contrast, remain largely tethered to the sensory input.
We review related work in both fields and examine how these fields can help each other.
arXiv Detail & Related papers (2021-09-07T21:49:53Z)
- GTNet: Guided Transformer Network for Detecting Human-Object Interactions [10.809778265707916]
The human-object interaction (HOI) detection task refers to localizing humans, localizing objects, and predicting the interactions between each human-object pair.
For detecting HOI, it is important to utilize relative spatial configurations and object semantics to find salient spatial regions of images.
This issue is addressed by GTNet, a novel self-attention-based guided transformer network.
arXiv Detail & Related papers (2021-08-02T02:06:33Z)
- Location-Sensitive Visual Recognition with Cross-IOU Loss [177.86369890708457]
This paper proposes a unified solution named location-sensitive network (LSNet) for object detection, instance segmentation, and pose estimation.
Based on a deep neural network as the backbone, LSNet predicts an anchor point and a set of landmarks which together define the shape of the target object.
arXiv Detail & Related papers (2021-04-11T02:17:14Z)
- Where2Act: From Pixels to Actions for Articulated 3D Objects [54.19638599501286]
We extract highly localized actionable information related to elementary actions such as pushing or pulling for articulated objects with movable parts.
We propose a learning-from-interaction framework with an online data sampling strategy that allows us to train the network in simulation.
Our learned models even transfer to real-world data.
arXiv Detail & Related papers (2021-01-07T18:56:38Z)
- Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
arXiv Detail & Related papers (2020-09-10T17:59:10Z)
- Ventral-Dorsal Neural Networks: Object Detection via Selective Attention [51.79577908317031]
We propose a new framework called Ventral-Dorsal Networks (VDNets).
Inspired by the structure of the human visual system, we propose the integration of a "Ventral Network" and a "Dorsal Network".
Our experimental results reveal that the proposed method outperforms state-of-the-art object detection approaches.
arXiv Detail & Related papers (2020-05-15T23:57:36Z)
- SpotNet: Self-Attention Multi-Task Network for Object Detection [11.444576186559487]
We produce foreground/background segmentation labels in a semi-supervised way, using background subtraction or optical flow.
We use those segmentation maps inside the network as a self-attention mechanism to weight the feature map used to produce the bounding boxes.
We show that by using this method, we obtain a significant mAP improvement on two traffic surveillance datasets.
arXiv Detail & Related papers (2020-02-13T14:43:24Z)
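The SpotNet summary mentions producing segmentation labels semi-supervisedly via background subtraction. A minimal sketch of that idea, assuming a static camera (as in traffic surveillance): the per-pixel median over a clip serves as the background model, and large deviations are marked foreground. The threshold and function name below are illustrative assumptions, not SpotNet's actual pipeline.

```python
import numpy as np

def background_subtraction_labels(frames, threshold=0.2):
    """Generate binary foreground masks for a (T, H, W) clip by
    thresholding each frame's deviation from the per-pixel median."""
    background = np.median(frames, axis=0)   # (H, W) static background estimate
    return (np.abs(frames - background) > threshold).astype(np.uint8)

# Toy clip: an all-zero background with a bright 2x2 "object" in frame 2.
clip = np.zeros((3, 4, 4))
clip[2, 1:3, 1:3] = 1.0
masks = background_subtraction_labels(clip)
```

Masks like these can then serve as weak supervision for an attention head that weights the detector's feature maps, which is the multi-task coupling the summary describes.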
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.