Computing a human-like reaction time metric from stable recurrent vision models
- URL: http://arxiv.org/abs/2306.11582v2
- Date: Mon, 6 Nov 2023 16:39:38 GMT
- Title: Computing a human-like reaction time metric from stable recurrent vision models
- Authors: Lore Goetschalckx, Lakshmi Narasimhan Govindarajan, Alekh Karkada
Ashok, Aarit Ahuja, David L. Sheinberg, Thomas Serre
- Abstract summary: We sketch a general-purpose methodology to construct computational accounts of reaction times from a stimulus-computable, task-optimized model.
We demonstrate that our metric aligns with patterns of human reaction times for stimulus manipulations across four disparate visual decision-making tasks.
This work paves the way for exploring the temporal alignment of model and human visual strategies in the context of various other cognitive tasks.
- Score: 11.87006916768365
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The meteoric rise in the adoption of deep neural networks as computational
models of vision has inspired efforts to "align" these models with humans. One
dimension of interest for alignment includes behavioral choices, but moving
beyond characterizing choice patterns to capturing temporal aspects of visual
decision-making has been challenging. Here, we sketch a general-purpose
methodology to construct computational accounts of reaction times from a
stimulus-computable, task-optimized model. Specifically, we introduce a novel
metric leveraging insights from subjective logic theory summarizing evidence
accumulation in recurrent vision models. We demonstrate that our metric aligns
with patterns of human reaction times for stimulus manipulations across four
disparate visual decision-making tasks spanning perceptual grouping, mental
simulation, and scene categorization. This work paves the way for exploring the
temporal alignment of model and human visual strategies in the context of
various other cognitive tasks toward generating testable hypotheses for
neuroscience. Links to the code and data can be found on the project page:
https://serre-lab.github.io/rnn_rts_site.
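For a concrete sense of how such a metric could be computed, the sketch below shows one way to turn a recurrent classifier's per-timestep evidence into a subjective-logic uncertainty trajectory and summarize it into an RT-like quantity. This is a minimal illustration under stated assumptions (non-negative evidence outputs per timestep; an area-under-the-uncertainty-curve summary); the function names and the toy evidence trace are hypothetical, not the authors' released implementation, which is linked from the project page.

```python
# Illustrative sketch only, not the paper's released code.
# Assumption: the recurrent model emits non-negative per-class "evidence" at
# every timestep; an RT-like proxy is taken as the area under the
# subjective-logic uncertainty curve across recurrent timesteps.

import torch


def uncertainty_from_evidence(evidence: torch.Tensor) -> torch.Tensor:
    """evidence: (T, K) non-negative evidence per timestep and class.
    Returns (T,) uncertainty mass u_t = K / sum_k(evidence_k + 1)."""
    K = evidence.shape[-1]
    alpha = evidence + 1.0      # Dirichlet concentration parameters
    strength = alpha.sum(dim=-1)
    return K / strength         # uncertainty mass in (0, 1]


def rt_metric(evidence: torch.Tensor) -> torch.Tensor:
    """Hypothetical RT proxy: area under the uncertainty curve.
    Stimuli whose evidence accumulates slowly keep uncertainty high
    for longer, yielding a larger value (a "slower" model response)."""
    return uncertainty_from_evidence(evidence).sum()


# Toy usage: evidence for the correct class grows over 10 recurrent steps.
T, K = 10, 2
evidence = torch.zeros(T, K)
evidence[:, 0] = torch.linspace(0.0, 8.0, T)  # accumulating evidence for class 0
print(rt_metric(evidence))  # smaller when evidence accumulates faster
```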
Related papers
- Object segmentation from common fate: Motion energy processing enables human-like zero-shot generalization to random dot stimuli [10.978614683038758]
We evaluate a broad range of optical flow models and a neuroscience inspired motion energy model for zero-shot figure-ground segmentation.
We find that a cross section of 40 deep optical flow models trained on different datasets struggles to estimate motion patterns in random dot videos.
This neuroscience-inspired model successfully addresses the lack of human-like zero-shot generalization to random dot stimuli in current computer vision models.
arXiv Detail & Related papers (2024-11-03T09:59:45Z)
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape.
We collect 35K trials of behavioral data from over 500 participants.
We then evaluate the performance of common vision models.
arXiv Detail & Related papers (2024-09-09T17:59:13Z)
- Aligning Neuronal Coding of Dynamic Visual Scenes with Foundation Vision Models [2.790870674964473]
We propose Vi-ST, a spatiotemporal convolutional neural network fed with a self-supervised Vision Transformer (ViT).
Our proposed Vi-ST demonstrates a novel modeling framework for neuronal coding of dynamic visual scenes in the brain.
arXiv Detail & Related papers (2024-07-15T14:06:13Z)
- Evaluating alignment between humans and neural network representations in image-based learning tasks [5.657101730705275]
We tested how well the representations of $86$ pretrained neural network models mapped to human learning trajectories.
We found that while training dataset size was a core determinant of alignment with human choices, contrastive training with multi-modal data (text and imagery) was a common feature of currently publicly available models that predicted human generalisation.
In conclusion, pretrained neural networks can serve to extract representations for cognitive models, as they appear to capture some fundamental aspects of cognition that are transferable across tasks.
arXiv Detail & Related papers (2023-06-15T08:18:29Z)
- Adapting Brain-Like Neural Networks for Modeling Cortical Visual Prostheses [68.96380145211093]
Cortical prostheses are devices implanted in the visual cortex that attempt to restore lost vision by electrically stimulating neurons.
Currently, the vision provided by these devices is limited, and accurately predicting the visual percepts resulting from stimulation is an open challenge.
We propose to address this challenge by utilizing 'brain-like' convolutional neural networks (CNNs), which have emerged as promising models of the visual system.
arXiv Detail & Related papers (2022-09-27T17:33:19Z)
- LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z)
- Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity [33.06823702945747]
We introduce a novel unsupervised approach for learning disentangled representations of neural activity called Swap-VAE.
Our approach combines a generative modeling framework with an instance-specific alignment loss.
We show that it is possible to build representations that disentangle neural datasets along relevant latent dimensions linked to behavior.
arXiv Detail & Related papers (2021-11-03T16:39:43Z)
- The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
Much as biological neurons do under predictive processing, artificial neurons in our generative model predict what neighboring neurons will do, and adjust their parameters based on how well the predictions match reality.
arXiv Detail & Related papers (2020-12-07T01:20:38Z)
- Fooling the primate brain with minimal, targeted image manipulation [67.78919304747498]
We propose an array of methods for creating minimal, targeted image perturbations that lead to changes in both neuronal activity and perception as reflected in behavior.
Our work shares the same goal as adversarial attacks, namely the manipulation of images with minimal, targeted noise that leads ANN models to misclassify them.
arXiv Detail & Related papers (2020-11-11T08:30:54Z)
- A Meta-Bayesian Model of Intentional Visual Search [0.0]
We propose a computational model of visual search that incorporates Bayesian interpretations of the neural mechanisms that underlie categorical perception and saccade planning.
To enable meaningful comparisons between simulated and human behaviours, we employ a gaze-contingent paradigm that requires participants to classify occluded MNIST digits through a window that follows their gaze.
Our model is able to recapitulate human behavioural metrics such as classification accuracy while retaining a high degree of interpretability, which we demonstrate by recovering subject-specific parameters from observed human behaviour.
arXiv Detail & Related papers (2020-06-05T16:10:35Z)