Combining Different V1 Brain Model Variants to Improve Robustness to
Image Corruptions in CNNs
- URL: http://arxiv.org/abs/2110.10645v1
- Date: Wed, 20 Oct 2021 16:35:09 GMT
- Title: Combining Different V1 Brain Model Variants to Improve Robustness to
Image Corruptions in CNNs
- Authors: Avinash Baidya, Joel Dapello, James J. DiCarlo, Tiago Marques
- Abstract summary: We show that simulating a primary visual cortex (V1) at the front of convolutional neural networks (CNNs) leads to small improvements in robustness to image perturbations.
We build a new model using an ensembling technique, which combines multiple individual models with different V1 front-end variants.
We show that using distillation, it is possible to partially compress the knowledge in the ensemble model into a single model with a V1 front-end.
- Score: 5.875680381119361
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While some convolutional neural networks (CNNs) have surpassed human visual
abilities in object classification, they often struggle to recognize objects in
images corrupted with different types of common noise patterns, highlighting a
major limitation of this family of models. Recently, it has been shown that
simulating a primary visual cortex (V1) at the front of CNNs leads to small
improvements in robustness to these image perturbations. In this study, we
start with the observation that different variants of the V1 model show gains
for specific corruption types. We then build a new model using an ensembling
technique, which combines multiple individual models with different V1
front-end variants. The model ensemble leverages the strengths of each
individual model, leading to significant improvements in robustness across all
corruption categories and outperforming the base model by 38% on average.
Finally, we show that using distillation, it is possible to partially compress
the knowledge in the ensemble model into a single model with a V1 front-end.
While the ensembling and distillation techniques used here are hardly
biologically plausible, the results demonstrate that, by combining the
specific strengths of different neuronal circuits in V1, it is possible to
improve the robustness of CNNs to a wide range of perturbations.
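As a rough illustration of the two techniques the abstract describes, the sketch below (a hypothetical illustration, not the authors' code) averages the softmax outputs of several model variants to form an ensemble prediction, and computes a standard temperature-scaled distillation loss (cross-entropy between softened teacher and student distributions, as in Hinton-style knowledge distillation):

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble_predict(models_logits):
    """Average the softmax probabilities of several model variants.

    models_logits: one logit vector per model, all for the same image.
    Returns the averaged class-probability vector.
    """
    probs = [softmax(l) for l in models_logits]
    n, k = len(probs), len(probs[0])
    return [sum(p[j] for p in probs) / n for j in range(k)]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between temperature-softened teacher and student
    distributions; minimized during distillation to push the student
    toward the (ensemble) teacher's outputs."""
    t = softmax([x / temperature for x in teacher_logits])
    s = softmax([x / temperature for x in student_logits])
    return -sum(ti * math.log(si) for ti, si in zip(t, s))

# Example: three hypothetical V1-variant models voting on 3 classes.
logits = [[2.0, 0.5, 0.1], [1.5, 1.0, 0.2], [2.2, 0.3, 0.4]]
avg = ensemble_predict(logits)
assert abs(sum(avg) - 1.0) < 1e-9
assert avg.index(max(avg)) == 0  # all variants favor class 0
```

The model names, class counts, and temperature here are arbitrary; in the paper the ensemble members are full CNNs differing only in their V1 front-end.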
Related papers
- Explicitly Modeling Pre-Cortical Vision with a Neuro-Inspired Front-End Improves CNN Robustness [1.8434042562191815]
CNNs struggle to classify images corrupted with common corruptions.
Recent work has shown that incorporating a CNN front-end block that simulates some features of the primate primary visual cortex (V1) can improve overall model robustness.
We introduce two novel biologically-inspired CNN model families that incorporate a new front-end block designed to simulate pre-cortical visual processing.
arXiv Detail & Related papers (2024-09-25T11:43:29Z)
- ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models [55.07988373824348]
We study the visual generalization capabilities of three existing robotic foundation models.
Our study shows that the existing models do not exhibit robustness to visual out-of-domain scenarios.
We propose a gradual backbone reversal approach founded on model merging.
arXiv Detail & Related papers (2024-09-23T17:47:59Z)
- A Comparative Study of CNN, ResNet, and Vision Transformers for Multi-Classification of Chest Diseases [0.0]
Vision Transformers (ViT) are powerful tools due to their scalability and ability to process large amounts of data.
We fine-tuned two variants of ViT models, one pre-trained on ImageNet and another trained from scratch, using the NIH Chest X-ray dataset.
Our study evaluates the performance of these models in the multi-label classification of 14 distinct diseases.
arXiv Detail & Related papers (2024-05-31T23:56:42Z)
- Matching the Neuronal Representations of V1 is Necessary to Improve Robustness in CNNs with V1-like Front-ends [1.8434042562191815]
Recently, it was shown that simulating computations in early visual areas at the front of convolutional neural networks leads to improvements in robustness to image corruptions.
Here, we show that the neuronal representations that emerge from precisely matching the distribution of RF properties found in primate V1 is key for this improvement in robustness.
arXiv Detail & Related papers (2023-10-16T16:52:15Z)
- Heterogeneous Generative Knowledge Distillation with Masked Image Modeling [33.95780732124864]
Masked image modeling (MIM) methods achieve great success in various visual tasks but remain largely unexplored in knowledge distillation for heterogeneous deep models.
We develop the first Heterogeneous Generative Knowledge Distillation (H-GKD) based on MIM, which can efficiently transfer knowledge from large Transformer models to small CNN-based models in a generative self-supervised fashion.
Our method is a simple yet effective learning paradigm to learn the visual representation and distribution of data from heterogeneous teacher models.
arXiv Detail & Related papers (2023-09-18T08:30:55Z)
- Exploring the Robustness of Human Parsers Towards Common Corruptions [99.89886010550836]
We construct three corruption robustness benchmarks, termed LIP-C, ATR-C, and Pascal-Person-Part-C, to assist us in evaluating the risk tolerance of human parsing models.
Inspired by the data augmentation strategy, we propose a novel heterogeneous augmentation-enhanced mechanism to bolster robustness under commonly corrupted conditions.
arXiv Detail & Related papers (2023-09-02T13:32:14Z)
- Composing Ensembles of Pre-trained Models via Iterative Consensus [95.10641301155232]
We propose a unified framework for composing ensembles of different pre-trained models.
We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization.
We demonstrate that consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer.
arXiv Detail & Related papers (2022-10-20T18:46:31Z)
- Empirical Advocacy of Bio-inspired Models for Robust Image Recognition [39.37304194475199]
We provide a detailed analysis of such bio-inspired models and their properties.
We find that bio-inspired models tend to be adversarially robust without requiring any special data augmentation.
We also find that bio-inspired models tend to use both low and mid-frequency information, in contrast to other DCNN models.
arXiv Detail & Related papers (2022-05-18T16:19:26Z)
- Improving robustness against common corruptions with frequency biased models [112.65717928060195]
Unseen image corruptions can cause a surprisingly large drop in performance.
Image corruption types have different characteristics in the frequency spectrum and would benefit from a targeted type of data augmentation.
We propose a new regularization scheme that minimizes the total variation (TV) of convolution feature-maps to increase high-frequency robustness.
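To make the quantity concrete, here is a minimal sketch (my own illustration, not the paper's code) of the anisotropic total variation of a 2-D feature map: the sum of absolute differences between neighboring activations, which grows with high-frequency content and is therefore a natural penalty for encouraging smoother feature maps.

```python
def total_variation(fmap):
    """Anisotropic total variation of a 2-D feature map:
    sum of absolute differences between horizontally and
    vertically adjacent activations."""
    h, w = len(fmap), len(fmap[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:  # horizontal neighbor
                tv += abs(fmap[i][j + 1] - fmap[i][j])
            if i + 1 < h:  # vertical neighbor
                tv += abs(fmap[i + 1][j] - fmap[i][j])
    return tv

smooth = [[1.0, 1.0], [1.0, 1.0]]  # constant map: TV = 0
noisy = [[0.0, 1.0], [1.0, 0.0]]   # high-frequency checkerboard
assert total_variation(smooth) == 0.0
assert total_variation(noisy) == 4.0
```

A regularizer would add a weighted version of this term to the training loss; the exact weighting and which layers it applies to are specific to the paper.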
arXiv Detail & Related papers (2021-03-30T10:44:50Z)
- Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting fire-weapons via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.