Traces of Image Memorability in Vision Encoders: Activations, Attention Distributions and Autoencoder Losses
- URL: http://arxiv.org/abs/2509.01453v1
- Date: Mon, 01 Sep 2025 13:11:59 GMT
- Title: Traces of Image Memorability in Vision Encoders: Activations, Attention Distributions and Autoencoder Losses
- Authors: Ece Takmaz, Albert Gatt, Jakub Dotlacil,
- Abstract summary: This paper explores the correlates of image memorability in pretrained vision encoders.<n>We find that these features correlate with memorability to some extent.<n>Results shed light on the relationship between model-internal features and memorability.
- Score: 5.369009163979958
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Images vary in how memorable they are to humans. Inspired by findings from cognitive science and computer vision, this paper explores the correlates of image memorability in pretrained vision encoders, focusing on latent activations, attention distributions, and the uniformity of image patches. We find that these features correlate with memorability to some extent. Additionally, we explore sparse autoencoder loss over the representations of vision transformers as a proxy for memorability, which yields results outperforming past methods using convolutional neural network representations. Our results shed light on the relationship between model-internal features and memorability. They show that some features are informative predictors of what makes images memorable to humans.
Related papers
- From Images to Perception: Emergence of Perceptual Properties by Reconstructing Images [1.77513002450736]
A bio-inspired architecture that can accommodate several known facts in the retina-V1 cortex, the PerceptNet, has been end-to-end optimized for different tasks related to image reconstruction.<n>Our results show that the encoder stage consistently exhibits the highest correlation with human perceptual judgments on image distortion.
arXiv Detail & Related papers (2025-08-14T08:37:30Z) - Sensitive Image Classification by Vision Transformers [1.9598097298813262]
Vision transformer models leverage a self-attention mechanism to capture global interactions among contextual local elements.<n>In our study, we conducted a comparative analysis between various popular vision transformer models and traditional pre-trained ResNet models.<n>The findings demonstrated that vision transformer networks surpassed the benchmark pre-trained models, showcasing their superior classification and detection capabilities.
arXiv Detail & Related papers (2024-12-21T02:34:24Z) - Modeling Visual Memorability Assessment with Autoencoders Reveals Characteristics of Memorable Images [2.4861619769660637]
Image memorability refers to the phenomenon where certain images are more likely to be remembered than others.<n>Despite advances in understanding human visual perception and memory, it is unclear what features contribute to an image's memorability.<n>We employ an autoencoder-based approach built on VGG16 convolutional neural networks (CNNs) to learn latent representations of images.
arXiv Detail & Related papers (2024-10-19T22:58:33Z) - When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z) - Connectivity-Inspired Network for Context-Aware Recognition [1.049712834719005]
We focus on the effect of incorporating circuit motifs found in biological brains to address visual recognition.
Our convolutional architecture is inspired by the connectivity of human cortical and subcortical streams.
We present a new plug-and-play module to model context awareness.
arXiv Detail & Related papers (2024-09-06T15:42:10Z) - Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention [62.671435607043875]
Research indicates that text-to-image diffusion models replicate images from their training data, raising tremendous concerns about potential copyright infringement and privacy risks.<n>We reveal that during memorization, the cross-attention tends to focus disproportionately on the embeddings of specific tokens.<n>We introduce an innovative approach to detect and mitigate memorization in diffusion models.
arXiv Detail & Related papers (2024-03-17T01:27:00Z) - Human-imperceptible, Machine-recognizable Images [76.01951148048603]
A major conflict is exposed relating to software engineers between better developing AI systems and distancing from the sensitive training data.
This paper proposes an efficient privacy-preserving learning paradigm, where images are encrypted to become human-imperceptible, machine-recognizable''
We show that the proposed paradigm can ensure the encrypted images have become human-imperceptible while preserving machine-recognizable information.
arXiv Detail & Related papers (2023-06-06T13:41:37Z) - A domain adaptive deep learning solution for scanpath prediction of
paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z) - Proactive Pseudo-Intervention: Causally Informed Contrastive Learning
For Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called it Proactive Pseudo-Intervention (PPI)
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z) - Unsupervised Deep Metric Learning with Transformed Attention Consistency
and Contrastive Clustering Loss [28.17607283348278]
Existing approaches for unsupervised metric learning focus on exploring self-supervision information within the input image itself.
We observe that, when analyzing images, human eyes often compare images against each other instead of examining images individually.
We develop a new approach to unsupervised deep metric learning where the network is learned based on self-supervision information across images.
arXiv Detail & Related papers (2020-08-10T19:33:47Z) - Generative Hierarchical Features from Synthesizing Images [65.66756821069124]
We show that learning to synthesize images can bring remarkable hierarchical visual features that are generalizable across a wide range of applications.
The visual feature produced by our encoder, termed as Generative Hierarchical Feature (GH-Feat), has strong transferability to both generative and discriminative tasks.
arXiv Detail & Related papers (2020-07-20T18:04:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.