Memorability: An image-computable measure of information utility
- URL: http://arxiv.org/abs/2104.00805v1
- Date: Thu, 1 Apr 2021 23:38:30 GMT
- Title: Memorability: An image-computable measure of information utility
- Authors: Zoya Bylinskii, Lore Goetschalckx, Anelise Newman, Aude Oliva
- Abstract summary: This chapter details the state-of-the-art algorithms that accurately predict image memorability.
We discuss the design of algorithms and visualizations for face, object, and scene memorability.
We show how recent A.I. approaches can be used to create and modify visual memorability.
- Score: 21.920488962633218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The pixels in an image, and the objects, scenes, and actions that they
compose, determine whether an image will be memorable or forgettable. While
memorability varies by image, it is largely independent of an individual
observer. Observer independence is what makes memorability an image-computable
measure of information, and eligible for automatic prediction. In this chapter,
we zoom into memorability with a computational lens, detailing the
state-of-the-art algorithms that accurately predict image memorability relative
to human behavioral data, using image features at different scales from raw
pixels to semantic labels. We discuss the design of algorithms and
visualizations for face, object, and scene memorability, as well as algorithms
that generalize beyond static scenes to actions and videos. We cover the
state-of-the-art deep learning approaches that are the current front runners in
the memorability prediction space. Beyond prediction, we show how recent A.I.
approaches can be used to create and modify visual memorability. Finally, we
preview the computational applications that memorability can power, from
filtering visual streams to enhancing augmented reality interfaces.
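A minimal sketch of what "image-computable" prediction means in practice: fit a regressor from image features to human memorability scores and evaluate with Spearman rank correlation, the standard metric in this literature. The features and scores below are synthetic stand-ins (an assumption for illustration); real systems use CNN features or end-to-end deep models.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def spearman(a, b):
    """Spearman rank correlation, the standard memorability metric."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

# Hypothetical data: 200 images, 64-dim features, noisy "hit rate" scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
true_w = rng.normal(size=64)
y = X @ true_w + 0.1 * rng.normal(size=200)

w = ridge_fit(X[:150], y[:150])        # train split
rho = spearman(X[150:] @ w, y[150:])   # held-out rank correlation
```

Rank correlation (rather than MSE) is used because memorability scores are compared against noisy human hit rates, where relative ordering is what matters.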
Related papers
- OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction [0.2796197251957245]
This paper introduces the Object-level Attention Transformer (OAT)
OAT predicts human scanpaths as they search for a target object within a cluttered scene of distractors.
We evaluate OAT on the Amazon book cover dataset and a new dataset for visual search that we collected.
arXiv Detail & Related papers (2024-07-18T09:33:17Z)
- Seeing the Unseen: Visual Common Sense for Semantic Placement [71.76026880991245]
Given an image and an object, a vision system is asked to predict semantically meaningful regions (masks or bounding boxes) in the image where that object could be placed or is likely to be placed by humans.
We call this task Semantic Placement (SP) and believe that such common-sense visual understanding is critical for assistive robots (tidying a house) and AR devices (automatically rendering an object in the user's space).
arXiv Detail & Related papers (2024-01-15T15:28:30Z)
- From seeing to remembering: Images with harder-to-reconstruct representations leave stronger memory traces [4.012995481864761]
We present a sparse coding model for compressing feature embeddings of images, and show that the reconstruction residuals from this model predict how well images are encoded into memory.
In an open memorability dataset of scene images, we show that reconstruction error not only explains memory accuracy but also response latencies during retrieval, subsuming, in the latter case, all of the variance explained by powerful vision-only models.
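The paper's core quantity, the residual left after sparse coding an image embedding, can be sketched with a plain ISTA solver. The random dictionary below is an illustrative assumption; the paper learns its dictionary from image feature embeddings.

```python
import numpy as np

def sparse_code(x, D, lam=0.1, steps=200):
    """ISTA: minimize ||x - D a||^2 / 2 + lam * ||a||_1 over codes a."""
    L = np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(steps):
        g = D.T @ (D @ a - x)          # gradient of the quadratic term
        a = a - g / L
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
    return a

def reconstruction_error(x, D, **kw):
    """Residual norm after sparse coding: the memorability predictor."""
    a = sparse_code(x, D, **kw)
    return float(np.linalg.norm(x - D @ a))

# Hypothetical setup: random unit-norm atoms stand in for a learned dictionary.
rng = np.random.default_rng(1)
D = rng.normal(size=(16, 32))
D /= np.linalg.norm(D, axis=0)
x = D[:, :3] @ np.ones(3)              # an "easy" embedding: 3 atoms suffice
err = reconstruction_error(x, D)
```

Embeddings that the dictionary compresses well leave a small residual; per the paper, larger residuals predict stronger memory traces.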
arXiv Detail & Related papers (2023-02-21T01:40:32Z)
- DisPositioNet: Disentangled Pose and Identity in Semantic Image Manipulation [83.51882381294357]
DisPositioNet is a model that learns a disentangled representation for each object for the task of image manipulation using scene graphs.
Our framework enables the disentanglement of the variational latent embeddings as well as the feature representation in the graph.
arXiv Detail & Related papers (2022-11-10T11:47:37Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, a process that underpins several human cognitive functions.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
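The zero-shot scoring scheme amounts to comparing an image embedding against an antonym prompt pair (e.g. "Good photo." vs "Bad photo.") and softmaxing the cosine similarities. The toy unit vectors and temperature below are illustrative assumptions; in practice the embeddings come from a CLIP text/image encoder.

```python
import numpy as np

def zero_shot_quality(img_emb, good_emb, bad_emb, temperature=0.01):
    """Score in [0, 1]: softmax over cosine similarities to an antonym
    prompt pair. Higher means closer to the "good" prompt."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    s = np.array([cos(img_emb, good_emb), cos(img_emb, bad_emb)]) / temperature
    s = np.exp(s - s.max())            # numerically stable softmax
    return float(s[0] / s.sum())

# Toy stand-ins for real CLIP embeddings.
good_emb = np.array([1.0, 0.0])
bad_emb = np.array([0.0, 1.0])
score = zero_shot_quality(np.array([0.9, 0.1]), good_emb, bad_emb)
```

Pairing antonym prompts, rather than thresholding a single prompt's similarity, cancels out prompt-specific biases in the text embedding.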
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
- Learning an Adaptation Function to Assess Image Visual Similarities [0.0]
We focus here on the specific task of learning visual image similarities when analogy matters.
We compare different supervised, semi-supervised, and self-supervised networks, pre-trained on datasets of different scales and contents.
Our experiments on the Totally Looks Like image dataset highlight the interest of our method, which increases the retrieval score @1 of the best model by 2.25x.
arXiv Detail & Related papers (2022-06-03T07:15:00Z)
- Efficient data-driven encoding of scene motion using Eccentricity [0.993963191737888]
This paper presents a novel approach of representing dynamic visual scenes with static maps generated from video/image streams.
The maps are 2D matrices calculated pixel-wise, based on the concept of Eccentricity data analysis.
The list of potential applications includes video-based activity recognition, intent recognition, object tracking, and video description.
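A pixel-wise eccentricity map can be sketched with recursive mean and variance updates per pixel; pixels whose current intensity is atypical relative to their own running statistics score high. The exact recursive variance update in the eccentricity literature differs slightly; this is a simplified sketch.

```python
import numpy as np

class EccentricityMap:
    """Pixel-wise recursive eccentricity over a frame stream (simplified):
    ecc = 1/k + squared deviation from the running mean, normalized by
    k times the running variance."""
    def __init__(self):
        self.k = 0
        self.mean = None
        self.var = None

    def update(self, frame):
        frame = frame.astype(float)
        self.k += 1
        if self.k == 1:
            self.mean = frame.copy()
            self.var = np.zeros_like(frame)
            return np.zeros_like(frame)
        self.mean += (frame - self.mean) / self.k      # recursive mean
        d2 = (frame - self.mean) ** 2
        self.var += (d2 - self.var) / self.k           # recursive variance
        return 1.0 / self.k + d2 / (self.k * np.maximum(self.var, 1e-12))

# Three static frames, then one frame where a single pixel changes.
ecc = EccentricityMap()
for _ in range(3):
    emap = ecc.update(np.zeros((4, 4)))
moving = np.zeros((4, 4))
moving[2, 2] = 10.0
emap = ecc.update(moving)
```

The result is a static 2D map in which only the changed pixel lights up, which is what makes the encoding cheap to feed into downstream recognition.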
arXiv Detail & Related papers (2021-03-03T23:11:21Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms MoCo, a state-of-the-art visual-only method.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
- Visual Chirality [51.685596116645776]
We investigate how statistics of visual data are changed by reflection.
Our work has implications for data augmentation, self-supervised learning, and image forensics.
arXiv Detail & Related papers (2020-06-16T20:48:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.