Combining Different V1 Brain Model Variants to Improve Robustness to
Image Corruptions in CNNs
- URL: http://arxiv.org/abs/2110.10645v1
- Date: Wed, 20 Oct 2021 16:35:09 GMT
- Title: Combining Different V1 Brain Model Variants to Improve Robustness to
Image Corruptions in CNNs
- Authors: Avinash Baidya, Joel Dapello, James J. DiCarlo, Tiago Marques
- Abstract summary: We show that simulating a primary visual cortex (V1) at the front of convolutional neural networks (CNNs) leads to small improvements in robustness to image perturbations.
We build a new model using an ensembling technique, which combines multiple individual models with different V1 front-end variants.
We show that using distillation, it is possible to partially compress the knowledge in the ensemble model into a single model with a V1 front-end.
- Score: 5.875680381119361
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While some convolutional neural networks (CNNs) have surpassed human visual
abilities in object classification, they often struggle to recognize objects in
images corrupted with different types of common noise patterns, highlighting a
major limitation of this family of models. Recently, it has been shown that
simulating a primary visual cortex (V1) at the front of CNNs leads to small
improvements in robustness to these image perturbations. In this study, we
start with the observation that different variants of the V1 model show gains
for specific corruption types. We then build a new model using an ensembling
technique, which combines multiple individual models with different V1
front-end variants. The model ensemble leverages the strengths of each
individual model, leading to significant improvements in robustness across all
corruption categories and outperforming the base model by 38% on average.
Finally, we show that using distillation, it is possible to partially compress
the knowledge in the ensemble model into a single model with a V1 front-end.
While the ensembling and distillation techniques used here are hardly
biologically plausible, the results demonstrate that, by combining the
specific strengths of different neuronal circuits in V1, it is possible to
improve the robustness of CNNs to a wide range of perturbations.
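As a rough illustration of the two techniques the abstract describes, the sketch below (a hypothetical illustration, not the authors' code) averages the softmax outputs of several model variants to form an ensemble prediction, and computes a standard temperature-scaled distillation loss (cross-entropy between softened teacher and student distributions, as in Hinton-style knowledge distillation):

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble_predict(models_logits):
    """Average the softmax probabilities of several model variants.

    models_logits: one logit vector per model, all for the same image.
    Returns the averaged class-probability vector.
    """
    probs = [softmax(l) for l in models_logits]
    n, k = len(probs), len(probs[0])
    return [sum(p[j] for p in probs) / n for j in range(k)]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between temperature-softened teacher and student
    distributions; minimized during distillation to push the student
    toward the (ensemble) teacher's outputs."""
    t = softmax([x / temperature for x in teacher_logits])
    s = softmax([x / temperature for x in student_logits])
    return -sum(ti * math.log(si) for ti, si in zip(t, s))

# Example: three hypothetical V1-variant models voting on 3 classes.
logits = [[2.0, 0.5, 0.1], [1.5, 1.0, 0.2], [2.2, 0.3, 0.4]]
avg = ensemble_predict(logits)
assert abs(sum(avg) - 1.0) < 1e-9
assert avg.index(max(avg)) == 0  # all variants favor class 0
```

The model names, class counts, and temperature here are arbitrary; in the paper the ensemble members are full CNNs differing only in their V1 front-end.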
Related papers
- Explicitly Modeling Pre-Cortical Vision with a Neuro-Inspired Front-End Improves CNN Robustness [1.8434042562191815]
CNNs struggle to classify images corrupted with common corruptions.
Recent work has shown that incorporating a CNN front-end block that simulates some features of the primate primary visual cortex (V1) can improve overall model robustness.
We introduce two novel biologically-inspired CNN model families that incorporate a new front-end block designed to simulate pre-cortical visual processing.
arXiv Detail & Related papers (2024-09-25T11:43:29Z)
- ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models [55.07988373824348]
We study the visual generalization capabilities of three existing robotic foundation models.
Our study shows that the existing models do not exhibit robustness to visual out-of-domain scenarios.
We propose a gradual backbone reversal approach founded on model merging.
arXiv Detail & Related papers (2024-09-23T17:47:59Z)
- A Comparative Study of CNN, ResNet, and Vision Transformers for Multi-Classification of Chest Diseases [0.0]
Vision Transformers (ViT) are powerful tools due to their scalability and ability to process large amounts of data.
We fine-tuned two variants of ViT models, one pre-trained on ImageNet and another trained from scratch, using the NIH Chest X-ray dataset.
Our study evaluates the performance of these models in the multi-label classification of 14 distinct diseases.
arXiv Detail & Related papers (2024-05-31T23:56:42Z)
- Matching the Neuronal Representations of V1 is Necessary to Improve Robustness in CNNs with V1-like Front-ends [1.8434042562191815]
Recently, it was shown that simulating computations in early visual areas at the front of convolutional neural networks leads to improvements in robustness to image corruptions.
Here, we show that the neuronal representations that emerge from precisely matching the distribution of RF properties found in primate V1 is key for this improvement in robustness.
arXiv Detail & Related papers (2023-10-16T16:52:15Z)
- Heterogeneous Generative Knowledge Distillation with Masked Image Modeling [33.95780732124864]
Masked image modeling (MIM) methods achieve great success in various visual tasks but remain largely unexplored in knowledge distillation for heterogeneous deep models.
We develop the first Heterogeneous Generative Knowledge Distillation (H-GKD) based on MIM, which can efficiently transfer knowledge from large Transformer models to small CNN-based models in a generative self-supervised fashion.
Our method is a simple yet effective learning paradigm to learn the visual representation and distribution of data from heterogeneous teacher models.
arXiv Detail & Related papers (2023-09-18T08:30:55Z)
- Exploring the Robustness of Human Parsers Towards Common Corruptions [99.89886010550836]
We construct three corruption robustness benchmarks, termed LIP-C, ATR-C, and Pascal-Person-Part-C, to assist us in evaluating the risk tolerance of human parsing models.
Inspired by the data augmentation strategy, we propose a novel heterogeneous augmentation-enhanced mechanism to bolster robustness under commonly corrupted conditions.
arXiv Detail & Related papers (2023-09-02T13:32:14Z)
- Composing Ensembles of Pre-trained Models via Iterative Consensus [95.10641301155232]
We propose a unified framework for composing ensembles of different pre-trained models.
We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization.
We demonstrate that consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer.
arXiv Detail & Related papers (2022-10-20T18:46:31Z)
- Empirical Advocacy of Bio-inspired Models for Robust Image Recognition [39.37304194475199]
We provide a detailed analysis of such bio-inspired models and their properties.
We find that bio-inspired models tend to be adversarially robust without requiring any special data augmentation.
We also find that bio-inspired models tend to use both low and mid-frequency information, in contrast to other DCNN models.
arXiv Detail & Related papers (2022-05-18T16:19:26Z)
- Improving robustness against common corruptions with frequency biased models [112.65717928060195]
Unseen image corruptions can cause a surprisingly large drop in performance.
Image corruption types have different characteristics in the frequency spectrum and would benefit from a targeted type of data augmentation.
We propose a new regularization scheme that minimizes the total variation (TV) of convolution feature-maps to increase high-frequency robustness.
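To make the quantity concrete, here is a minimal sketch (my own illustration, not the paper's code) of the anisotropic total variation of a 2-D feature map: the sum of absolute differences between neighboring activations, which grows with high-frequency content and is therefore a natural penalty for encouraging smoother feature maps.

```python
def total_variation(fmap):
    """Anisotropic total variation of a 2-D feature map:
    sum of absolute differences between horizontally and
    vertically adjacent activations."""
    h, w = len(fmap), len(fmap[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:  # horizontal neighbor
                tv += abs(fmap[i][j + 1] - fmap[i][j])
            if i + 1 < h:  # vertical neighbor
                tv += abs(fmap[i + 1][j] - fmap[i][j])
    return tv

smooth = [[1.0, 1.0], [1.0, 1.0]]  # constant map: TV = 0
noisy = [[0.0, 1.0], [1.0, 0.0]]   # high-frequency checkerboard
assert total_variation(smooth) == 0.0
assert total_variation(noisy) == 4.0
```

A regularizer would add a weighted version of this term to the training loss; the exact weighting and which layers it applies to are specific to the paper.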
arXiv Detail & Related papers (2021-03-30T10:44:50Z)
- Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting fire-weapons via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.