On the Influence of Shape, Texture and Color for Learning Semantic Segmentation
- URL: http://arxiv.org/abs/2410.14878v1
- Date: Fri, 18 Oct 2024 21:52:02 GMT
- Title: On the Influence of Shape, Texture and Color for Learning Semantic Segmentation
- Authors: Annika Mütze, Natalie Grabowsky, Edgar Heinert, Matthias Rottmann, Hanno Gottschalk
- Abstract summary: In recent years, a body of work has emerged studying the shape and texture biases of off-the-shelf pre-trained deep neural networks (DNNs) for image classification.
We study these questions on semantic segmentation, which allows us to address them at the pixel level.
Our study on three datasets reveals that neither texture nor shape clearly dominates the learning success; however, a combination of shape and color, but without texture, achieves surprisingly strong results.
- Score: 5.172964916120902
- License:
- Abstract: In recent years, a body of work has emerged studying the shape and texture biases of off-the-shelf pre-trained deep neural networks (DNNs) for image classification. These works study how much a trained DNN relies on image cues, predominantly shape and texture. In this work, we switch the perspective, posing the following questions: What can a DNN learn from each of the image cues, i.e., shape, texture, and color, respectively? How much does each cue influence the learning success? And what are the synergy effects between different cues? Studying these questions sheds light on cue influences on learning and thus on the learning capabilities of DNNs. We study these questions on semantic segmentation, which allows us to address them at the pixel level. To conduct this study, we develop a generic procedure to decompose a given dataset into multiple ones, each of them containing only a single cue or a chosen mixture. This framework is then applied to two real-world datasets, Cityscapes and PASCAL Context, and a synthetic dataset based on the CARLA simulator. We learn the given semantic segmentation task from these cue datasets, creating cue experts. Early fusion of cues is performed by constructing appropriate datasets. This is complemented by a late fusion of experts, which allows us to study cue influence location-dependently at the pixel level. Our study on three datasets reveals that neither texture nor shape clearly dominates the learning success; however, a combination of shape and color, but without texture, achieves surprisingly strong results. Our findings hold for convolutional and transformer backbones. In particular, there is qualitatively almost no difference in how the two architecture types extract information from the different cues.
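As a rough illustration of how such a cue decomposition might look (not the authors' exact procedure): color can be stripped by grayscale conversion, shape isolated via the boundaries of ground-truth segments, and shape-plus-color obtained by filling each segment with its mean color. A minimal sketch using OpenCV and NumPy; the function names and the mean-color fill are illustrative assumptions:

```python
import cv2
import numpy as np

def remove_color(image: np.ndarray) -> np.ndarray:
    """Texture + shape, no color: convert to grayscale, keep 3 channels."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)

def shape_only(label_map: np.ndarray) -> np.ndarray:
    """Shape, no texture/color: edges of the ground-truth segment boundaries."""
    edges = cv2.Canny(label_map.astype(np.uint8), 1, 1)
    return cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR)

def shape_and_color(image: np.ndarray, label_map: np.ndarray) -> np.ndarray:
    """Shape + color, no texture: fill each segment with its mean color."""
    out = np.zeros_like(image)
    for seg_id in np.unique(label_map):
        mask = label_map == seg_id
        out[mask] = image[mask].mean(axis=0)
    return out
```

Training one segmentation network per such derived dataset would yield the "cue experts" the abstract refers to.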
Related papers
- Effect of Rotation Angle in Self-Supervised Pre-training is Dataset-Dependent [3.434553688053531]
Self-supervised learning for pre-training can help the network learn better low-level features.
In contrastive pre-training, the network is pre-trained to distinguish between different versions of the input.
We show that, when pre-training contrastively in this way, the angle $\theta$ and the dataset interact in interesting ways; a rough sketch of rotation-based pair generation follows this entry.
arXiv Detail & Related papers (2024-06-21T12:25:07Z)
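A minimal sketch of generating rotated views for contrastive pre-training; the fixed angle and the use of torchvision's functional rotate are assumptions for illustration, not the paper's exact setup:

```python
import torch
import torchvision.transforms.functional as TF

def rotated_views(images: torch.Tensor, theta: float) -> tuple[torch.Tensor, torch.Tensor]:
    """Return two views of a batch: original and rotated by theta degrees.
    A contrastive loss would then pull matching views together."""
    return images, TF.rotate(images, angle=theta)

# Usage: views for a batch of 8 RGB images, rotated by 90 degrees.
batch = torch.rand(8, 3, 224, 224)
anchor, positive = rotated_views(batch, theta=90.0)
```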
- Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration [107.61458720202984]
This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes.
We propose a learnable transformation alignment to bridge the domain gap between image and point cloud data.
We establish dense 2D-3D correspondences to estimate the rigid pose; a sketch of pose estimation from such correspondences follows this entry.
arXiv Detail & Related papers (2024-01-23T02:41:06Z)
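Given dense 2D-3D correspondences, a rigid pose can be estimated with a standard PnP solver; a minimal sketch using OpenCV, where the correspondences, camera intrinsics, and RANSAC choice are assumptions, not the paper's method:

```python
import cv2
import numpy as np

# Hypothetical dense correspondences: N 3D points and their 2D projections.
points_3d = np.random.rand(100, 3).astype(np.float32)
points_2d = np.random.rand(100, 2).astype(np.float32)
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # assumed intrinsics

# Robust PnP: recovers rotation (as a Rodrigues vector) and translation.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(points_3d, points_2d, K, None)
R, _ = cv2.Rodrigues(rvec)  # 3x3 rotation matrix of the rigid pose
```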
- Contrastive Learning of Features between Images and LiDAR [18.211513930388417]
This work treats learning cross-modal features as a dense contrastive learning problem.
To learn good features without losing generality, we developed a variant of the widely used PointNet++ architecture for images.
We show that our models indeed learn information from both images and LiDAR by visualizing the features; a sketch of a dense contrastive loss follows this entry.
arXiv Detail & Related papers (2022-06-24T04:35:23Z)
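A minimal sketch of a dense contrastive (InfoNCE-style) loss between matched image and LiDAR point features; the temperature and the assumption of precomputed matched pairs are illustrative, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def dense_info_nce(img_feats: torch.Tensor, pc_feats: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    """img_feats, pc_feats: (N, D) features of N matched image/LiDAR points.
    Matched pairs are positives; all other cross-modal pairs are negatives."""
    img_feats = F.normalize(img_feats, dim=1)
    pc_feats = F.normalize(pc_feats, dim=1)
    logits = img_feats @ pc_feats.t() / temperature   # (N, N) similarities
    targets = torch.arange(img_feats.size(0))         # diagonal = positives
    return F.cross_entropy(logits, targets)
```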
- Investigating Neural Architectures by Synthetic Dataset Design [14.317837518705302]
Recent years have seen the emergence of many new neural network structures (architectures and layers).
We sketch a methodology to measure the effect of each structure on a network's ability, by designing ad hoc synthetic datasets.
We illustrate our methodology by building three datasets, each evaluating a different network property; a toy example of such a synthetic dataset follows this entry.
arXiv Detail & Related papers (2022-04-23T10:50:52Z)
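As a toy illustration of probing a network with an ad hoc synthetic dataset (the square-vs-disk task here is an assumption for illustration, not one of the paper's three datasets):

```python
import numpy as np

def make_shape_dataset(n: int = 1000, size: int = 32, seed: int = 0):
    """Toy synthetic dataset: images containing a square (label 0) or disk (label 1).
    Controlling a single factor lets one probe one network property at a time."""
    rng = np.random.default_rng(seed)
    images = np.zeros((n, size, size), dtype=np.float32)
    labels = rng.integers(0, 2, size=n)
    yy, xx = np.mgrid[:size, :size]
    for i, label in enumerate(labels):
        cy, cx = rng.integers(8, size - 8, size=2)
        r = rng.integers(3, 6)
        if label == 0:  # square
            images[i, cy - r:cy + r, cx - r:cx + r] = 1.0
        else:           # disk
            images[i][(yy - cy) ** 2 + (xx - cx) ** 2 <= r * r] = 1.0
    return images, labels
```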
- Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image.
We show that our approach provides clear improvements for artwork detail retrieval on the Brueghel dataset; a sketch of the segment copy-paste idea follows this entry.
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
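A minimal sketch of the copy-paste idea: paste a masked segment from a source image into a target image. The hard paste (no blending or augmentation) is a simplification for illustration, not necessarily what the paper does:

```python
import numpy as np

def paste_segment(src: np.ndarray, src_mask: np.ndarray,
                  dst: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Copy the masked segment of `src` into `dst` (same spatial size).
    Returns the composited image and the pasted-region mask, which serves
    as co-segmentation ground truth for the synthetic training pair."""
    out = dst.copy()
    out[src_mask] = src[src_mask]
    return out, src_mask
```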
- Shape or Texture: Understanding Discriminative Features in CNNs [28.513300496205044]
Recent studies have shown that CNNs actually exhibit a 'texture bias'.
We show that a network learns the majority of overall shape information during the first few epochs of training.
We also show that the encoding of shape does not imply the encoding of localized per-pixel semantic information.
arXiv Detail & Related papers (2021-01-27T18:54:00Z)
- Assessing The Importance Of Colours For CNNs In Object Recognition [70.70151719764021]
Convolutional neural networks (CNNs) have been shown to exhibit conflicting properties.
We demonstrate that CNNs often rely heavily on colour information while making a prediction.
We evaluate a model trained with congruent images on congruent, greyscale, and incongruent images; a sketch of generating such test variants follows this entry.
arXiv Detail & Related papers (2020-12-12T22:55:06Z)
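A minimal sketch of building greyscale and colour-incongruent test variants from a congruent image; using hue rotation as the incongruency operation is an assumption for illustration:

```python
import numpy as np
from PIL import Image

def test_variants(image: Image.Image) -> dict[str, Image.Image]:
    """Congruent (original), greyscale, and incongruent (hue-rotated) variants."""
    grey = image.convert("L").convert("RGB")
    hsv = np.array(image.convert("HSV"), dtype=np.uint8)
    hsv[..., 0] = (hsv[..., 0].astype(int) + 128) % 256  # rotate hue by a half turn
    incongruent = Image.fromarray(hsv, mode="HSV").convert("RGB")
    return {"congruent": image, "greyscale": grey, "incongruent": incongruent}
```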
- Informative Dropout for Robust Representation Learning: A Shape-bias Perspective [84.30946377024297]
We propose a lightweight, model-agnostic method, namely Informative Dropout (InfoDrop), to improve interpretability and reduce texture bias.
Specifically, we discriminate texture from shape based on local self-information in an image and adopt a Dropout-like algorithm to decorrelate the model output from the local texture; a sketch of this idea follows this entry.
arXiv Detail & Related papers (2020-08-10T16:52:24Z)
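A rough sketch of dropping activations more aggressively in low-self-information (repetitive texture) regions; the local-variance proxy for self-information is an assumption for illustration, not the paper's estimator:

```python
import torch
import torch.nn.functional as F

def info_drop(x: torch.Tensor, drop_scale: float = 0.5) -> torch.Tensor:
    """x: (B, C, H, W) feature map. Regions with low local variance (a crude
    proxy for low self-information, i.e. repetitive texture) are dropped with
    higher probability; high-variance regions (boundaries) are mostly kept."""
    local_mean = F.avg_pool2d(x, 3, stride=1, padding=1)
    local_var = F.avg_pool2d((x - local_mean) ** 2, 3, stride=1, padding=1)
    info = local_var.mean(dim=1, keepdim=True)               # (B, 1, H, W)
    info = info / (info.amax(dim=(2, 3), keepdim=True) + 1e-8)
    keep_prob = info.clamp(min=1.0 - drop_scale)             # low info -> more drop
    mask = torch.bernoulli(keep_prob)
    return x * mask / keep_prob                              # inverted-dropout scaling
```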
- What Do Neural Networks Learn When Trained With Random Labels? [20.54410239839646]
We study deep neural networks (DNNs) trained on natural image data with entirely random labels.
We show analytically for convolutional and fully connected networks that an alignment between the principal components of network parameters and data takes place when training with random labels.
We show how this alignment produces a positive transfer: networks pre-trained with random labels train faster downstream compared to training from scratch; a sketch of measuring such an alignment follows this entry.
arXiv Detail & Related papers (2020-06-18T12:07:22Z)
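A minimal sketch of checking alignment between the principal components of input data and a first-layer weight matrix; treating filters as flattened vectors and using projection energy is an assumption for illustration:

```python
import numpy as np

def alignment(weights: np.ndarray, data: np.ndarray, k: int = 10) -> float:
    """weights: (n_units, d) first-layer weights; data: (n_samples, d).
    Returns the fraction of weight energy lying in the span of the
    top-k principal components of the data (1.0 = fully aligned)."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_k = vt[:k]                     # (k, d) leading principal directions
    projected = weights @ top_k.T      # (n_units, k) projection coefficients
    return float((projected ** 2).sum() / (weights ** 2).sum())
```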
- Self-supervised Learning on Graphs: Deep Insights and New Direction [66.78374374440467]
Self-supervised learning (SSL) aims to create domain-specific pretext tasks on unlabeled data.
There is increasing interest in generalizing deep learning to the graph domain in the form of graph neural networks (GNNs).
arXiv Detail & Related papers (2020-06-17T20:30:04Z)
- Linguistically Driven Graph Capsule Network for Visual Question Reasoning [153.76012414126643]
We propose a hierarchical compositional reasoning model called the "Linguistically driven Graph Capsule Network".
The compositional process is guided by the linguistic parse tree. Specifically, we bind each capsule in the lowest layer to bridge the linguistic embedding of a single word in the original question with visual evidence.
Experiments on the CLEVR dataset, CLEVR compositional generation test, and FigureQA dataset demonstrate the effectiveness and composition generalization ability of our end-to-end model.
arXiv Detail & Related papers (2020-03-23T03:34:25Z)