Connecting metrics for shape-texture knowledge in computer vision
- URL: http://arxiv.org/abs/2301.10608v1
- Date: Wed, 25 Jan 2023 14:37:42 GMT
- Title: Connecting metrics for shape-texture knowledge in computer vision
- Authors: Tiago Oliveira, Tiago Marques, Arlindo L. Oliveira
- Abstract summary: Deep neural networks remain brittle and susceptible to many image changes that do not cause humans to misclassify.
Part of this different behavior may be explained by the type of features humans and deep neural networks use in vision tasks.
- Score: 1.7785095623975342
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern artificial neural networks, including convolutional neural networks
and vision transformers, have mastered several computer vision tasks, including
object recognition. However, there are many significant differences between the
behavior and robustness of these systems and those of the human visual system. Deep
neural networks remain brittle and susceptible to many image changes that do not
cause humans to misclassify. Part of this different behavior
may be explained by the type of features humans and deep neural networks use in
vision tasks. Humans tend to classify objects according to their shape while
deep neural networks seem to rely mostly on texture. Exploring this question is
relevant, since it may lead to better performing neural network architectures
and to a better understanding of the workings of the vision system of primates.
In this work, we advance the state of the art in our understanding of this
phenomenon, by extending previous analyses to a much larger set of deep neural
network architectures. We found that the performance of models in image
classification tasks is highly correlated with their shape bias measured at the
output and penultimate layers. Furthermore, our results showed that the numbers
of neurons that represent shape and texture are strongly anti-correlated, thus
providing evidence that there is competition between these two types of
features. Finally, we observed that while in general there is a correlation
between performance and shape bias, there are significant variations between
architecture families.
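The abstract does not spell out how shape bias is computed; a minimal sketch, assuming the standard cue-conflict formulation (images whose shape comes from one class and texture from another), with all labels and numbers made up for illustration:

```python
import numpy as np
from scipy.stats import spearmanr

def shape_bias(preds, shape_labels, texture_labels):
    # Among predictions that match either the shape class or the texture
    # class of a cue-conflict image, return the fraction decided by shape.
    preds = np.asarray(preds)
    shape_hits = preds == np.asarray(shape_labels)
    texture_hits = preds == np.asarray(texture_labels)
    decided = shape_hits | texture_hits
    return shape_hits.sum() / decided.sum() if decided.any() else float("nan")

# Hypothetical predictions on six cue-conflict images:
preds    = ["cat", "elephant", "cat", "dog", "car", "cat"]
shapes   = ["cat", "cat",      "cat", "dog", "cat", "dog"]
textures = ["elephant", "elephant", "dog", "car", "car", "cat"]
print(shape_bias(preds, shapes, textures))  # 3 shape vs 3 texture hits -> 0.5

# The paper's headline finding is a correlation between shape bias and
# classification performance; with hypothetical per-model values:
biases = [0.21, 0.35, 0.42, 0.57]  # shape bias per architecture
top1 = [71.2, 74.9, 76.1, 79.3]    # ImageNet top-1 accuracy (%)
print(spearmanr(biases, top1))     # high positive rank correlation
```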
Related papers
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Unveiling the Unseen: Identifiable Clusters in Trained Depthwise Convolutional Kernels [56.69755544814834]
Recent advances in depthwise-separable convolutional neural networks (DS-CNNs) have led to novel architectures.
This paper reveals another striking property of DS-CNN architectures: discernible and explainable patterns emerge in their trained depthwise convolutional kernels in all layers.
arXiv Detail & Related papers (2024-01-25T19:05:53Z)
- Evaluating alignment between humans and neural network representations in image-based learning tasks [5.657101730705275]
We tested how well the representations of 86 pretrained neural network models mapped to human learning trajectories.
We found that while training dataset size was a core determinant of alignment with human choices, contrastive training with multi-modal data (text and imagery) was a common feature of currently publicly available models that predicted human generalisation.
In conclusion, pretrained neural networks can serve to extract representations for cognitive models, as they appear to capture some fundamental aspects of cognition that are transferable across tasks.
arXiv Detail & Related papers (2023-06-15T08:18:29Z)
- Degraded Polygons Raise Fundamental Questions of Neural Network Perception [5.423100066629618]
We revisit the task of recovering images under degradation, first introduced over 30 years ago in the Recognition-by-Components theory of human vision.
We implement the Automated Shape Recoverability Test for rapidly generating large-scale datasets of perimeter-degraded regular polygons.
We find that neural networks' behavior on this simple task conflicts with human behavior.
arXiv Detail & Related papers (2023-06-08T06:02:39Z)
- Human alignment of neural network representations [22.671101285994013]
We investigate the factors that affect the alignment between the representations learned by neural networks and human mental representations inferred from behavioral responses.
We find that model scale and architecture have essentially no effect on the alignment with human behavioral responses.
We find that some human concepts such as food and animals are well-represented by neural networks whereas others such as royal or sports-related objects are not.
arXiv Detail & Related papers (2022-11-02T15:23:16Z)
- Prune and distill: similar reformatting of image information along rat visual cortex and deep neural networks [61.60177890353585]
Deep convolutional neural networks (CNNs) have been shown to provide excellent models for their functional analogue in the brain, the ventral stream in visual cortex.
Here we consider some prominent statistical patterns that are known to exist in the internal representations of either CNNs or the visual cortex.
We show that CNNs and visual cortex share a similarly tight relationship between dimensionality expansion/reduction of object representations and reformatting of image information.
arXiv Detail & Related papers (2022-05-27T08:06:40Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Are Convolutional Neural Networks or Transformers more like human vision? [9.83454308668432]
We show that attention-based networks can achieve higher accuracy than CNNs on vision tasks.
These results have implications both for building more human-like vision models, as well as for understanding visual object recognition in humans.
arXiv Detail & Related papers (2021-05-15T10:33:35Z)
- Learning Contact Dynamics using Physically Structured Neural Networks [81.73947303886753]
We use connections between deep neural networks and differential equations to design a family of deep network architectures for representing contact dynamics between objects.
We show that these networks can learn discontinuous contact events in a data-efficient manner from noisy observations.
Our results indicate that an idealised form of touch feedback is a key component of making this learning problem tractable.
arXiv Detail & Related papers (2021-02-22T17:33:51Z)
- Graph Structure of Neural Networks [104.33754950606298]
We show how the graph structure of neural networks affects their predictive performance.
A "sweet spot" of relational graphs leads to neural networks with significantly improved predictive performance.
Top-performing neural networks have graph structure surprisingly similar to those of real biological neural networks.
arXiv Detail & Related papers (2020-07-13T17:59:31Z)
- Seeing eye-to-eye? A comparison of object recognition performance in humans and deep convolutional neural networks under image manipulation [0.0]
This study presents a behavioral comparison of core visual object recognition performance between humans and feedforward neural networks.
Analyses of accuracy revealed that humans not only outperform DCNNs under all conditions, but also display significantly greater robustness to shape and, most notably, color alterations.
arXiv Detail & Related papers (2020-07-13T10:26:30Z)
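As a toy version of the image-manipulation protocol in the last entry above, the sketch below probes whether a pretrained classifier's prediction survives color removal (torchvision is assumed; the image path is a placeholder, not from the paper):

```python
import torch
from torchvision import models
from torchvision.transforms import functional as F
from PIL import Image

# Hypothetical color-robustness probe: humans are largely unaffected by
# grayscale conversion, so a flipped model prediction signals a
# human-machine divergence of the kind the study quantifies.
weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()

img = Image.open("example.jpg").convert("RGB")     # placeholder path
gray = F.to_grayscale(img, num_output_channels=3)  # drop color, keep 3 channels

with torch.no_grad():
    for name, x in [("original", img), ("grayscale", gray)]:
        logits = model(preprocess(x).unsqueeze(0))
        idx = logits.argmax(dim=1).item()
        print(name, weights.meta["categories"][idx])
```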