Memorability: An image-computable measure of information utility
- URL: http://arxiv.org/abs/2104.00805v1
- Date: Thu, 1 Apr 2021 23:38:30 GMT
- Title: Memorability: An image-computable measure of information utility
- Authors: Zoya Bylinskii, Lore Goetschalckx, Anelise Newman, Aude Oliva
- Abstract summary: This chapter details the state-of-the-art algorithms that accurately predict image memorability.
We discuss the design of algorithms and visualizations for face, object, and scene memorability.
We show how recent A.I. approaches can be used to create and modify visual memorability.
- Score: 21.920488962633218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The pixels in an image, and the objects, scenes, and actions that they
compose, determine whether an image will be memorable or forgettable. While
memorability varies by image, it is largely independent of an individual
observer. Observer independence is what makes memorability an image-computable
measure of information, and eligible for automatic prediction. In this chapter,
we zoom into memorability with a computational lens, detailing the
state-of-the-art algorithms that accurately predict image memorability relative
to human behavioral data, using image features at different scales from raw
pixels to semantic labels. We discuss the design of algorithms and
visualizations for face, object, and scene memorability, as well as algorithms
that generalize beyond static scenes to actions and videos. We cover the
state-of-the-art deep learning approaches that are the current front runners in
the memorability prediction space. Beyond prediction, we show how recent A.I.
approaches can be used to create and modify visual memorability. Finally, we
preview the computational applications that memorability can power, from
filtering visual streams to enhancing augmented reality interfaces.
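To ground the pipeline the chapter describes, here is a minimal sketch of the general shape of a deep memorability predictor: a pretrained CNN backbone feeding a scalar regression head trained against human memorability scores. The backbone, head, and sizes are illustrative assumptions, not the chapter's specific architecture.

```python
# Minimal sketch of a CNN-based memorability regressor; the ResNet-18 backbone
# and scalar head are illustrative choices, not the chapter's exact model.
import torch
import torch.nn as nn
from torchvision import models

class MemorabilityRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        # Keep everything up to and including global average pooling.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        # Memorability behaves like a hit rate, so squash to [0, 1].
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(512, 1), nn.Sigmoid())

    def forward(self, x):
        return self.head(self.features(x)).squeeze(-1)

model = MemorabilityRegressor()
scores = model(torch.randn(4, 3, 224, 224))  # one score per image
print(scores.shape)  # torch.Size([4])
```

Models of this kind are typically evaluated by rank correlation (e.g., Spearman's rho) between predicted scores and human behavioral data.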
Related papers
- Unforgettable Lessons from Forgettable Images: Intra-Class Memorability Matters in Computer Vision Tasks [8.210681499876216]
We introduce intra-class memorability, where certain images within the same class are more memorable than others despite shared category characteristics.
We propose the Intra-Class Memorability score (ICMscore), a novel metric that incorporates the temporal intervals between repeated image presentations into its calculation (a hypothetical sketch follows this entry).
arXiv Detail & Related papers (2024-12-30T07:09:28Z)
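The ICMscore formula is not reproduced in this summary; as a purely hypothetical illustration of the stated idea, the sketch below weights each repeat-recognition outcome by the temporal interval (lag) between presentations, so that hits at longer lags count for more.

```python
# Hypothetical illustration only: NOT the paper's ICMscore definition.
# Each trial is (lag between the two presentations, repeat recognized?);
# a hit at a longer lag implies a stronger memory and so gets more weight.
def intra_class_memorability(trials):
    if not trials:
        return 0.0
    weighted_hits = sum(lag for lag, hit in trials if hit)
    total_weight = sum(lag for lag, _ in trials)
    return weighted_hits / total_weight

# An image recognized at lags 10 and 50 but forgotten at lag 100:
print(intra_class_memorability([(10, True), (50, True), (100, False)]))  # 0.375
```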
- Modeling Visual Memorability Assessment with Autoencoders Reveals Characteristics of Memorable Images [2.4861619769660637]
Image memorability refers to the phenomenon where certain images are more likely to be remembered than others.
We modeled the subjective experience of visual memorability using an autoencoder based on VGG16 Convolutional Neural Networks (CNNs).
We investigated the relationship between memorability and reconstruction error, assessed the distinctiveness of latent-space representations, and developed a Gated Recurrent Unit (GRU) model to predict memorability likelihood (a simplified reconstruction-error sketch follows this entry).
arXiv Detail & Related papers (2024-10-19T22:58:33Z)
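As a simplified stand-in for the analysis above: the paper builds its autoencoder on VGG16, whereas the tiny convolutional autoencoder below is an assumption kept small for readability. The point is the per-image reconstruction error that the paper relates to memorability.

```python
# Tiny autoencoder standing in for the paper's VGG16-based one; what matters
# here is the per-image reconstruction error used as a memorability correlate.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinyAutoencoder()
images = torch.rand(4, 3, 64, 64)
recon = model(images)
# Mean squared reconstruction error per image.
errors = ((images - recon) ** 2).flatten(1).mean(dim=1)
print(errors.shape)  # torch.Size([4])
```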
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- See or Guess: Counterfactually Regularized Image Captioning [32.82695612178604]
We present a generic image captioning framework that employs causal inference to make existing models more capable of interventional tasks, and counterfactually explainable.
Our method effectively reduces hallucinations and improves the model's faithfulness to images, demonstrating high portability across both small-scale and large-scale image-to-text models.
arXiv Detail & Related papers (2024-08-29T17:59:57Z)
- OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction [0.2796197251957245]
This paper introduces the Object-level Attention Transformer (OAT), which predicts human scanpaths as observers search for a target object within a cluttered scene of distractors.
We evaluate OAT on the Amazon book cover dataset and a new dataset for visual search that we collected.
arXiv Detail & Related papers (2024-07-18T09:33:17Z)
- DisPositioNet: Disentangled Pose and Identity in Semantic Image Manipulation [83.51882381294357]
DisPositioNet is a model that learns a disentangled representation for each object for the task of image manipulation using scene graphs.
Our framework enables the disentanglement of the variational latent embeddings as well as the feature representation in the graph.
arXiv Detail & Related papers (2022-11-10T11:47:37Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which underpins several human cognitive functions.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments (an illustrative zero-shot scoring sketch follows this entry).
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
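The zero-shot scheme can be sketched as follows: embed the image and an antonym prompt pair with CLIP, then read the softmax mass assigned to the positive prompt as the score. The checkpoint, prompts, and file name below are illustrative assumptions, not necessarily the paper's setup.

```python
# Hedged sketch of zero-shot "look" assessment with CLIP and an antonym
# prompt pair; checkpoint and prompts are assumptions for illustration.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # hypothetical input image
prompts = ["Good photo.", "Bad photo."]  # antonym pair probing quality
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
logits = model(**inputs).logits_per_image      # image-text similarity, shape (1, 2)
quality = logits.softmax(dim=-1)[0, 0].item()  # probability mass on "Good photo."
print(f"quality score: {quality:.3f}")
```

Swapping in a different antonym pair (e.g., "Happy photo." / "Sad photo.") probes abstract "feel" attributes the same way.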
- Enhancing Social Relation Inference with Concise Interaction Graph and Discriminative Scene Representation [56.25878966006678]
We propose an approach called PRactical Inference in Social rElation (PRISE).
It concisely learns interactive features of persons and discriminative features of holistic scenes.
PRISE achieves a 6.8% improvement for domain classification on the PIPA dataset.
arXiv Detail & Related papers (2021-07-30T04:20:13Z)
- Efficient data-driven encoding of scene motion using Eccentricity [0.993963191737888]
This paper presents a novel approach to representing dynamic visual scenes with static maps generated from video/image streams.
The maps are 2D matrices computed pixel-wise, based on the concept of Eccentricity data analysis (a hedged sketch follows this entry).
Potential applications include video-based activity recognition, intent recognition, object tracking, and video description.
arXiv Detail & Related papers (2021-03-03T23:11:21Z)
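A hedged sketch of the pixel-wise computation, using a recursive mean/variance form of eccentricity in the TEDA style; exact normalizations vary across the literature, so the constants below are one common variant rather than the paper's definition.

```python
# Pixel-wise eccentricity maps for a stream of grayscale frames (one common
# TEDA-style variant; normalization details differ across papers).
import numpy as np

def eccentricity_maps(frames):
    """frames: iterable of 2D arrays; yields one eccentricity map per frame."""
    mean = var = None
    for k, x in enumerate(frames, start=1):
        x = x.astype(np.float64)
        if k == 1:
            mean, var = x.copy(), np.zeros_like(x)
            yield np.zeros_like(x)  # eccentricity is undefined for one sample
            continue
        mean = ((k - 1) * mean + x) / k                      # recursive mean
        var = (k - 1) / k * var + (x - mean) ** 2 / (k - 1)  # recursive variance
        # Large values flag pixels behaving unusually versus their own history,
        # which is how a static map can encode scene motion.
        yield 1.0 / k + (x - mean) ** 2 / np.maximum(k * var, 1e-12)

frames = [np.random.rand(4, 4) for _ in range(5)]
print(list(eccentricity_maps(frames))[-1].shape)  # (4, 4)
```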
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms MoCo, a visual-only state-of-the-art method.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
- Visual Chirality [51.685596116645776]
We investigate how statistics of visual data are changed by reflection.
Our work has implications for data augmentation, self-supervised learning, and image forensics.
arXiv Detail & Related papers (2020-06-16T20:48:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and accepts no responsibility for any consequences of its use.