L-WISE: Boosting Human Image Category Learning Through Model-Based Image Selection And Enhancement
- URL: http://arxiv.org/abs/2412.09765v1
- Date: Thu, 12 Dec 2024 23:57:01 GMT
- Title: L-WISE: Boosting Human Image Category Learning Through Model-Based Image Selection And Enhancement
- Authors: Morgan B. Talbot, Gabriel Kreiman, James J. DiCarlo, Guy Gaziv
- Abstract summary: We propose to augment visual learning in humans in a way that improves human categorization accuracy at test time.
Our learning augmentation approach consists of (i) selecting images based on their model-estimated recognition difficulty, and (ii) using image perturbations that aid recognition for novice learners.
To the best of our knowledge, this is the first application of ANNs to increase visual learning performance in humans by enhancing category-specific features.
- Score: 12.524893323311108
- License:
- Abstract: The currently leading artificial neural network (ANN) models of the visual ventral stream -- which are derived from a combination of performance optimization and robustification methods -- have demonstrated a remarkable degree of behavioral alignment with humans on visual categorization tasks. Extending upon previous work, we show that not only can these models guide image perturbations that change the induced human category percepts, but they also can enhance human ability to accurately report the original ground truth. Furthermore, we find that the same models can also be used out-of-the-box to predict the proportion of correct human responses to individual images, providing a simple, human-aligned estimator of the relative difficulty of each image. Motivated by these observations, we propose to augment visual learning in humans in a way that improves human categorization accuracy at test time. Our learning augmentation approach consists of (i) selecting images based on their model-estimated recognition difficulty, and (ii) using image perturbations that aid recognition for novice learners. We find that combining these model-based strategies gives rise to test-time categorization accuracy gains of 33-72% relative to control subjects without these interventions, despite using the same number of training feedback trials. Surprisingly, beyond the accuracy gain, the training time for the augmented learning group was also shorter by 20-23%. We demonstrate the efficacy of our approach in a fine-grained categorization task with natural images, as well as tasks in two clinically relevant image domains -- histology and dermoscopy -- where visual learning is notoriously challenging. To the best of our knowledge, this is the first application of ANNs to increase visual learning performance in humans by enhancing category-specific features.
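As a rough illustration of component (i), the sketch below uses a pretrained image classifier's confidence in the ground-truth label as a stand-in for model-estimated recognition difficulty and then filters a training set to the images the model judges easiest. The model choice, the 1 - p(true class) proxy, the threshold, and the helper names (`estimate_difficulty`, `select_training_images`) are illustrative assumptions; the paper instead relies on robustified ventral-stream models and predicted human accuracy.

```python
import torch
import torch.nn.functional as F
from torchvision import models
from PIL import Image

# Stand-in difficulty estimator: a stock ImageNet ResNet-50 rather than the
# robustified ventral-stream models used in the paper.
weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()

@torch.no_grad()
def estimate_difficulty(image_path: str, true_class: int) -> float:
    """Return 1 - p(true class); higher values mean the model finds the image harder."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    probs = F.softmax(model(x), dim=1)
    return 1.0 - probs[0, true_class].item()

def select_training_images(items, max_difficulty=0.5):
    """Keep (path, class) pairs the model judges relatively easy, easiest first."""
    scored = sorted((estimate_difficulty(p, c), p, c) for p, c in items)
    return [(p, c) for d, p, c in scored if d <= max_difficulty]
```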
Related papers
- Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing, built on generative models, pose serious risks.
In this paper, we investigate how detection performance varies across model backbones, types, and datasets.
We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
- MENTOR: Human Perception-Guided Pretraining for Increased Generalization [5.596752018167751]
We introduce MENTOR (huMan pErceptioN-guided preTraining fOr increased geneRalization).
We train an autoencoder to learn human saliency maps given an input image, without class labels.
We remove the decoder part, add a classification layer on top of the encoder, and fine-tune this new model conventionally.
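A minimal sketch of the two-stage recipe summarized above, assuming a toy convolutional encoder and decoder; the architecture, layer sizes, and class names are placeholders rather than the actual MENTOR implementation.

```python
import torch.nn as nn

class SaliencyAutoencoder(nn.Module):
    """Stage 1: trained (e.g. with an MSE loss) to map an image to a human saliency map."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))  # no class labels involved

class SaliencyPretrainedClassifier(nn.Module):
    """Stage 2: discard the decoder, add a classification head, fine-tune conventionally."""
    def __init__(self, pretrained: SaliencyAutoencoder, num_classes: int):
        super().__init__()
        self.encoder = pretrained.encoder      # saliency-pretrained weights
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(64, num_classes))

    def forward(self, x):
        return self.head(self.encoder(x))
```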
arXiv Detail & Related papers (2023-10-30T13:50:44Z)
- Exploring the Robustness of Human Parsers Towards Common Corruptions [99.89886010550836]
We construct three corruption robustness benchmarks, termed LIP-C, ATR-C, and Pascal-Person-Part-C, to assist us in evaluating the risk tolerance of human parsing models.
Inspired by the data augmentation strategy, we propose a novel heterogeneous augmentation-enhanced mechanism to bolster robustness under commonly corrupted conditions.
arXiv Detail & Related papers (2023-09-02T13:32:14Z)
- Learning Transferable Pedestrian Representation from Multimodal Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on the LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z)
- DASH: Visual Analytics for Debiasing Image Classification via User-Driven Synthetic Data Augmentation [27.780618650580923]
Image classification models often learn to predict a class based on irrelevant co-occurrences between input features and an output class in training data.
We call the unwanted correlations "data biases," and the visual features causing data biases "bias factors."
It is challenging to identify and mitigate biases automatically without human intervention.
arXiv Detail & Related papers (2022-09-14T00:44:41Z)
- Conformer and Blind Noisy Students for Improved Image Quality Assessment [80.57006406834466]
Learning-based approaches for perceptual image quality assessment (IQA) usually require both the distorted and reference image for measuring the perceptual quality accurately.
In this work, we explore the performance of transformer-based full-reference IQA models.
We also propose a method for IQA based on semi-supervised knowledge distillation from full-reference teacher models into blind student models.
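A minimal sketch of the distillation idea, assuming a full-reference teacher model that already outputs a quality score for a (distorted, reference) pair; the toy blind student and the `distillation_step` helper are illustrative assumptions, not the paper's networks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlindStudent(nn.Module):
    """No-reference student: predicts a quality score from the distorted image alone."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.score = nn.Linear(32, 1)

    def forward(self, distorted):
        return self.score(self.features(distorted))

def distillation_step(teacher, student, optimizer, distorted, reference):
    """One semi-supervised step: the FR teacher's score becomes the blind student's target."""
    with torch.no_grad():
        target = teacher(distorted, reference)   # teacher sees both images; shape (B, 1)
    pred = student(distorted)                    # student sees only the distorted image
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```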
arXiv Detail & Related papers (2022-04-27T10:21:08Z)
- Focus on the Positives: Self-Supervised Learning for Biodiversity Monitoring [9.086207853136054]
We address the problem of learning self-supervised representations from unlabeled image collections.
We exploit readily available context data that encodes information such as the spatial and temporal relationships between the input images.
For the critical task of global biodiversity monitoring, this results in image features that can be adapted to challenging visual species classification tasks with limited human supervision.
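One plausible reading of this setup, sketched below, treats images captured close together in space and time as positive pairs for a contrastive objective; the metadata fields, thresholds, and InfoNCE-style loss are assumptions for illustration rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def is_context_match(a, b, max_km=1.0, max_hours=24.0):
    """Hypothetical rule: two images form a positive pair if captured nearby and close in time."""
    dist_km = ((a["x_km"] - b["x_km"]) ** 2 + (a["y_km"] - b["y_km"]) ** 2) ** 0.5
    return dist_km <= max_km and abs(a["t_hours"] - b["t_hours"]) <= max_hours

def context_positive_pairs(metadata):
    """Enumerate index pairs (i, j) of context-matched images."""
    n = len(metadata)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if is_context_match(metadata[i], metadata[j])]

def context_contrastive_loss(embeddings, pairs, temperature=0.1):
    """InfoNCE-style loss in which context-matched images serve as the positives."""
    z = F.normalize(embeddings, dim=1)
    logits = (z @ z.t()) / temperature
    mask = torch.eye(z.size(0), dtype=torch.bool)
    logits = logits.masked_fill(mask, -1e9)      # an image is never its own positive
    loss = sum(F.cross_entropy(logits[i].unsqueeze(0), torch.tensor([j])) for i, j in pairs)
    return loss / max(len(pairs), 1)
```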
arXiv Detail & Related papers (2021-08-14T01:12:41Z)
- Passive attention in artificial neural networks predicts human visual selectivity [8.50463394182796]
We show that passive attention techniques reveal a significant overlap with human visual selectivity estimates.
We validate these correlational results with causal manipulations using recognition experiments.
This work contributes a new approach to evaluating the biological and psychological validity of leading ANNs as models of human vision.
arXiv Detail & Related papers (2021-07-14T21:21:48Z)
- On the Robustness of Pretraining and Self-Supervision for a Deep Learning-based Analysis of Diabetic Retinopathy [70.71457102672545]
We compare the impact of different training procedures for diabetic retinopathy grading.
We investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions.
Our results indicate that models initialized with ImageNet pretraining show a significant increase in performance, generalization, and robustness to image distortions.
arXiv Detail & Related papers (2021-06-25T08:32:45Z)
- Fooling the primate brain with minimal, targeted image manipulation [67.78919304747498]
We propose an array of methods for creating minimal, targeted image perturbations that lead to changes in both neuronal activity and perception as reflected in behavior.
Our work shares the same goal as adversarial attacks, namely the manipulation of images with minimal, targeted noise that leads ANN models to misclassify them.
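The ANN-side analogue mentioned here can be sketched as a standard targeted, norm-bounded perturbation (projected gradient descent toward a chosen class); the hyperparameters below are illustrative assumptions, and the paper's procedure for steering primate neuronal activity is not reproduced.

```python
import torch
import torch.nn.functional as F

def targeted_perturbation(model, image, target_class, eps=0.03, steps=40, step_size=0.005):
    """Optimize a small L-infinity-bounded perturbation so the model predicts `target_class`.

    `image` is a (1, 3, H, W) tensor; `model` is any differentiable classifier.
    """
    delta = torch.zeros_like(image, requires_grad=True)
    target = torch.tensor([target_class])
    for _ in range(steps):
        loss = F.cross_entropy(model(image + delta), target)
        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()   # descend toward the target class
            delta.clamp_(-eps, eps)                  # keep the perturbation minimal
            delta.grad.zero_()
    return (image + delta).detach()
```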
arXiv Detail & Related papers (2020-11-11T08:30:54Z)
- Seeing eye-to-eye? A comparison of object recognition performance in humans and deep convolutional neural networks under image manipulation [0.0]
This study presents a behavioral comparison of visual core object recognition performance between humans and feedforward deep convolutional neural networks (DCNNs).
Analyses of accuracy revealed that humans not only outperform DCNNs in all conditions, but also display significantly greater robustness to shape and, most notably, color alterations.
arXiv Detail & Related papers (2020-07-13T10:26:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.