Connectivity-Inspired Network for Context-Aware Recognition
- URL: http://arxiv.org/abs/2409.04360v1
- Date: Fri, 6 Sep 2024 15:42:10 GMT
- Title: Connectivity-Inspired Network for Context-Aware Recognition
- Authors: Gianluca Carloni, Sara Colantonio
- Abstract summary: We focus on the effect of incorporating circuit motifs found in biological brains to address visual recognition.
Our convolutional architecture is inspired by the connectivity of human cortical and subcortical streams.
We present a new plug-and-play module to model context awareness.
- Score: 1.049712834719005
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The aim of this paper is threefold. We inform the AI practitioner about the human visual system with an extensive literature review; we propose a novel biologically motivated neural network for image classification; and, finally, we present a new plug-and-play module to model context awareness. We focus on the effect of incorporating circuit motifs found in biological brains to address visual recognition. Our convolutional architecture is inspired by the connectivity of human cortical and subcortical streams, and we implement bottom-up and top-down modulations that mimic the extensive afferent and efferent connections between visual and cognitive areas. Our Contextual Attention Block is simple and effective and can be integrated with any feed-forward neural network. It infers weights that multiply the feature maps according to their causal influence on the scene, modeling the co-occurrence of different objects in the image. We place our module at different bottlenecks to infuse a hierarchical context awareness into the model. We validated our proposals through image classification experiments on benchmark data and found a consistent improvement in performance and the robustness of the produced explanations via class activation. Our code is available at https://github.com/gianlucarloni/CoCoReco.
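The reweighting idea in the abstract (weights that multiply the feature maps, derived from how the maps co-occur) can be illustrated with a minimal sketch. This is not the authors' Contextual Attention Block: the names `peak`, `context_weights`, and `apply_context_attention` are hypothetical, and the scoring rule (product of peak activations, used as a crude co-occurrence proxy) is an assumption chosen for simplicity. The actual implementation is in the linked repository.

```python
from typing import List

FeatureMap = List[List[float]]  # one H x W map of non-negative activations


def peak(fmap: FeatureMap) -> float:
    """Strongest activation in a map, used here as a crude 'feature present' signal."""
    return max(max(row) for row in fmap)


def context_weights(feature_maps: List[FeatureMap]) -> List[float]:
    """Assumed scoring rule (not from the paper): a map scores high when its
    peak is strong AND the other maps also have strong peaks, i.e. when the
    features co-occur in the same image."""
    peaks = [peak(f) for f in feature_maps]
    total = sum(peaks)
    # p * (total - p) equals the sum over all other maps j of peak_i * peak_j.
    scores = [p * (total - p) for p in peaks]
    top = max(scores)
    if top <= 0.0:
        # No co-occurrence signal at all: leave every map unchanged.
        return [1.0] * len(feature_maps)
    # Scale so that the most context-supported map keeps weight 1.0.
    return [s / top for s in scores]


def apply_context_attention(feature_maps: List[FeatureMap]) -> List[FeatureMap]:
    """Multiply each feature map by its scalar context weight."""
    weights = context_weights(feature_maps)
    return [
        [[w * v for v in row] for row in fmap]
        for w, fmap in zip(weights, feature_maps)
    ]


if __name__ == "__main__":
    # Three toy 2 x 2 feature maps with peaks 1.0, 0.5, and 0.25.
    maps = [
        [[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.5], [0.0, 0.0]],
        [[0.0, 0.0], [0.25, 0.0]],
    ]
    print(context_weights(maps))
    print(apply_context_attention(maps))
```

Because the function takes a stack of maps and returns a stack of the same shape, a block of this kind can in principle sit between any two layers of a feed-forward network, which is the plug-and-play property the abstract describes.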
Related papers
- Understanding the Role of Pathways in a Deep Neural Network [4.456675543894722]
We analyze a convolutional neural network (CNN) trained in the classification task and present an algorithm to extract the diffusion pathways of individual pixels.
We find that the few largest pathways of an individual pixel from an image tend to cross the feature maps in each layer that are important for classification.
arXiv Detail & Related papers (2024-02-28T07:53:19Z)
- Connecting metrics for shape-texture knowledge in computer vision [1.7785095623975342]
Deep neural networks remain brittle and susceptible to many changes in the image that do not cause humans to misclassify images.
Part of this different behavior may be explained by the type of features humans and deep neural networks use in vision tasks.
arXiv Detail & Related papers (2023-01-25T14:37:42Z)
- Semantic Brain Decoding: from fMRI to conceptually similar image reconstruction of visual stimuli [0.29005223064604074]
We propose a novel approach to brain decoding that also relies on semantic and contextual similarity.
We employ an fMRI dataset of natural image vision and create a deep learning decoding pipeline inspired by the existence of both bottom-up and top-down processes in human vision.
We produce reconstructions of visual stimuli that match the original content very well on a semantic level, surpassing the state of the art in previous literature.
arXiv Detail & Related papers (2022-12-13T16:54:08Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Learning What and Where -- Unsupervised Disentangling Location and Identity Tracking [0.44040106718326594]
We introduce an unsupervised LOCation and Identity tracking system (Loci).
Inspired by the dorsal-ventral pathways in the brain, Loci tackles the what-and-where binding problem by means of a self-supervised segregation mechanism.
Loci may set the stage for deeper, explanation-oriented video processing.
arXiv Detail & Related papers (2022-05-26T13:30:14Z)
- Functional2Structural: Cross-Modality Brain Networks Representation Learning [55.24969686433101]
Graph mining on brain networks may facilitate the discovery of novel biomarkers for clinical phenotypes and neurodegenerative diseases.
We propose a novel graph learning framework, known as Deep Signed Brain Networks (DSBN), with a signed graph encoder.
We validate our framework on clinical phenotype and neurodegenerative disease prediction tasks using two independent, publicly available datasets.
arXiv Detail & Related papers (2022-05-06T03:45:36Z)
- Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
arXiv Detail & Related papers (2020-09-10T17:59:10Z)
- Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations [103.00383924074585]
Visual relationship detection aims to reason over relationships among salient objects in images.
We propose a novel approach named Visual-Linguistic Representations from Transformers (RVL-BERT).
RVL-BERT performs spatial reasoning with both visual and language commonsense knowledge learned via self-supervised pre-training.
arXiv Detail & Related papers (2020-09-10T16:15:09Z)
- AssembleNet++: Assembling Modality Representations via Attention Connections [83.50084190050093]
We create a family of powerful video models which are able to: (i) learn interactions between semantic object information and raw appearance and motion features, and (ii) deploy attention in order to better learn the importance of features at each convolutional block of the network.
A new network component named peer-attention is introduced, which dynamically learns the attention weights using another block or input modality.
arXiv Detail & Related papers (2020-08-18T17:54:08Z)
- Ventral-Dorsal Neural Networks: Object Detection via Selective Attention [51.79577908317031]
We propose a new framework called Ventral-Dorsal Networks (VDNets).
Inspired by the structure of the human visual system, we propose the integration of a "Ventral Network" and a "Dorsal Network".
Our experimental results reveal that the proposed method outperforms state-of-the-art object detection approaches.
arXiv Detail & Related papers (2020-05-15T23:57:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.