How to Take a Memorable Picture? Empowering Users with Actionable Feedback
- URL: http://arxiv.org/abs/2602.21877v1
- Date: Wed, 25 Feb 2026 13:02:35 GMT
- Title: How to Take a Memorable Picture? Empowering Users with Actionable Feedback
- Authors: Francesco Laiti, Davide Talon, Jacopo Staiano, Elisa Ricci
- Abstract summary: We introduce the task of Memorability Feedback (MemFeed), where an automated model should provide actionable, human-interpretable guidance to users. We also present MemCoach, the first approach designed to provide concrete suggestions in natural language for memorability improvement.
- Score: 16.746442650748044
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Image memorability, i.e., how likely an image is to be remembered, has traditionally been studied in computer vision either as a passive prediction task, with models regressing a scalar score, or through generative methods that alter the visual input to boost an image's likelihood of being remembered. Yet, neither paradigm supports users at capture time, when the crucial question is how to improve a photo's memorability. We introduce the task of Memorability Feedback (MemFeed), in which an automated model provides actionable, human-interpretable guidance to users with the goal of enhancing an image's future recall. We also present MemCoach, the first approach designed to provide concrete suggestions in natural language for memorability improvement (e.g., "emphasize facial expression," "bring the subject forward"). Our method, based on Multimodal Large Language Models (MLLMs), is training-free and employs a teacher-student steering strategy that aligns the model's internal activations with more memorable patterns learned from a teacher model progressing along least-to-most memorable samples. To enable systematic evaluation of this novel task, we further introduce MemBench, a new benchmark featuring sequence-aligned photoshoots with annotated memorability scores. Our experiments, spanning multiple MLLMs, demonstrate the effectiveness of MemCoach, which consistently outperforms several zero-shot models. The results indicate that memorability can not only be predicted but also taught, shifting the focus from mere prediction to actionable feedback for human creators.
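The teacher-student steering idea can be pictured with a small activation-steering sketch: compute a direction from low- to high-memorability exemplars and add it to one decoder layer's hidden states at inference time. Everything below (the stand-in text-only model, the layer index, the steering strength, and the toy exemplars) is an illustrative assumption, not the authors' released code:

```python
# Hedged sketch of training-free activation steering (MemCoach-style idea).
# Assumption: steering vector = mean(high-memorability activations)
#           - mean(low-memorability activations), added at one decoder layer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-0.5B-Instruct"  # stand-in for the actual MLLM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

LAYER = 12  # which decoder layer to steer: a tunable assumption


def mean_hidden(prompts, layer):
    """Average hidden state at `layer` over a set of exemplar prompts."""
    states = []
    with torch.no_grad():
        for p in prompts:
            ids = tok(p, return_tensors="pt")
            out = model(**ids, output_hidden_states=True)
            states.append(out.hidden_states[layer][0].mean(dim=0))
    return torch.stack(states).mean(dim=0)


# Placeholder "teacher" exemplars, ordered least-to-most memorable.
low_mem = ["a plain gray wall", "an empty parking lot at noon"]
high_mem = ["a close-up of a laughing face", "a red umbrella in a gray crowd"]

steer = mean_hidden(high_mem, LAYER) - mean_hidden(low_mem, LAYER)
steer = steer / steer.norm()


def steering_hook(module, inputs, output):
    # Nudge every token's hidden state along the memorability direction.
    h = output[0] if isinstance(output, tuple) else output
    h = h + 4.0 * steer  # 4.0 = steering strength, chosen arbitrarily here
    return (h,) + output[1:] if isinstance(output, tuple) else h


handle = model.model.layers[LAYER].register_forward_hook(steering_hook)
ids = tok("Suggest one edit that would make this photo more memorable:",
          return_tensors="pt")
out = model.generate(**ids, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()
```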
Related papers
- Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation [61.31036260686349]
We propose a novel prompt optimization framework, designed to rephrase a simple user prompt into a sophisticated prompt for a text-to-image model.
Specifically, we employ large vision-language models (LVLMs) as the solver to rewrite the user prompt and, concurrently, employ LVLMs as a reward model to score the aesthetics and alignment of the images generated by the optimized prompt.
Instead of laborious human feedback, we exploit the prior knowledge of the LVLM to provide rewards, i.e., AI feedback.
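A toy version of that solver-plus-judge loop might look as follows; `query_lvlm`, `generate_image`, and `score_image` are hypothetical stubs standing in for real LVLM and text-to-image calls:

```python
# Illustrative loop for LVLM-as-solver + LVLM-as-judge prompt optimization.
# All three helpers are placeholders, not the paper's actual API.
import random

def query_lvlm(instruction: str) -> str:
    """Placeholder for an LVLM rewrite call (e.g., via an inference API)."""
    return instruction.split(":")[-1].strip() + ", cinematic lighting, 35mm"

def generate_image(prompt: str) -> str:
    """Placeholder for a text-to-image call; returns an image handle."""
    return f"image_for<{prompt}>"

def score_image(image: str, user_prompt: str) -> float:
    """Placeholder LVLM judge: aesthetics + alignment reward in [0, 1]."""
    return random.random()

def optimize_prompt(user_prompt: str, rounds: int = 4) -> str:
    best_prompt, best_score = user_prompt, float("-inf")
    for _ in range(rounds):
        candidate = query_lvlm(f"Rewrite this prompt more vividly: {best_prompt}")
        score = score_image(generate_image(candidate), user_prompt)
        if score > best_score:  # keep the highest-reward rewrite so far
            best_prompt, best_score = candidate, score
    return best_prompt

print(optimize_prompt("a cat on a windowsill"))
```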
arXiv Detail & Related papers (2025-05-22T15:05:07Z)
- Flux Already Knows -- Activating Subject-Driven Image Generation without Training [25.496237241889048]
We propose a zero-shot framework for subject-driven image generation using a vanilla Flux model.
We activate strong identity-preserving capabilities without any additional data, training, or inference-time fine-tuning.
arXiv Detail & Related papers (2025-04-12T20:41:53Z)
- Evolved Hierarchical Masking for Self-Supervised Learning [49.77271430882176]
Existing Masked Image Modeling methods apply fixed mask patterns to guide self-supervised training.
This paper introduces an evolved hierarchical masking method to pursue general visual cue modeling in self-supervised learning.
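One way to read "evolved hierarchical masking" is a mask whose granularity grows with training progress, from coarse blocks to individual patches; the sketch below is that reading, not the paper's actual construction:

```python
# Hedged sketch: a mask that "evolves" over training, from coarse block masks
# to fine per-patch masks. The grid schedule here is an assumption.
import torch

def evolved_mask(num_patches_side=14, ratio=0.6, progress=0.0):
    """progress in [0, 1]: 0 -> coarse 2x2 blocks, 1 -> per-patch masking."""
    # The coarse decision grid gets finer as training progresses.
    coarse = max(2, int(2 + progress * (num_patches_side - 2)))
    grid = torch.rand(coarse, coarse)
    k = int(ratio * coarse * coarse)
    thresh = grid.flatten().kthvalue(k).values
    mask = (grid <= thresh).float()
    # Upsample the coarse decision grid to the full patch grid.
    mask = torch.nn.functional.interpolate(
        mask[None, None], size=(num_patches_side, num_patches_side),
        mode="nearest")[0, 0]
    return mask.bool()  # True = masked patch

print(evolved_mask(progress=0.1).float().mean())  # roughly the masking ratio
```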
arXiv Detail & Related papers (2025-04-12T09:40:14Z)
- Unforgettable Lessons from Forgettable Images: Intra-Class Memorability Matters in Computer Vision [17.85820426682908]
We introduce intra-class memorability, where certain images within the same class are more memorable than others.
We propose the Intra-Class Memorability score (ICMscore), a novel metric that incorporates the temporal intervals between repeated image presentations into its calculation.
We curate the Intra-Class Memorability dataset (ICMD), comprising over 5,000 images across ten object classes with their ICMscores derived from 2,000 participants' responses.
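The exact ICMscore formula is defined in the paper; the snippet below only mirrors the underlying intuition that recalling an image across a longer presentation interval should count for more (the weighting scheme is my assumption):

```python
# Illustrative interval-aware memorability score, not the actual ICMscore.
def interval_weighted_score(trials):
    """trials: list of (interval_in_items, remembered: bool), one per repeat."""
    total = sum(interval for interval, _ in trials)
    if total == 0:
        return 0.0
    return sum(interval * float(rem) for interval, rem in trials) / total

# Remembered across short gaps but forgotten after the long one -> low score.
print(interval_weighted_score([(5, True), (40, True), (120, False)]))  # ~0.27
```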
arXiv Detail & Related papers (2024-12-30T07:09:28Z)
- Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation [70.95783968368124]
We introduce a novel multi-modal autoregressive model, dubbed InstaManip.
We propose an innovative group self-attention mechanism to break down the in-context learning process into two separate stages.
Our method surpasses previous few-shot image manipulation models by a notable margin.
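A speculative illustration of such a two-stage grouping: exemplar tokens attend only among themselves (learning the edit), while query tokens attend to exemplars and themselves (applying it). The real InstaManip design may differ in detail:

```python
# Hypothetical "group self-attention" mask splitting in-context manipulation
# into a learning stage and an applying stage.
import torch

def group_attention_mask(n_exemplar: int, n_query: int) -> torch.Tensor:
    n = n_exemplar + n_query
    mask = torch.zeros(n, n, dtype=torch.bool)  # True = attention allowed
    mask[:n_exemplar, :n_exemplar] = True       # stage 1: exemplars <-> exemplars
    mask[n_exemplar:, :] = True                 # stage 2: query -> everything
    return mask

print(group_attention_mask(3, 2).int())
```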
arXiv Detail & Related papers (2024-12-02T01:19:21Z)
- From Prototypes to General Distributions: An Efficient Curriculum for Masked Image Modeling [11.634154932876719]
Masked Image Modeling has emerged as a powerful self-supervised learning paradigm for visual representation learning.
We propose a prototype-driven curriculum learning framework that structures the learning process to progress from prototypical examples to more complex variations in the dataset.
Our findings suggest that carefully controlling the order of training examples plays a crucial role in self-supervised visual learning.
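A minimal sketch of such a prototype-driven ordering, assuming precomputed embeddings (the real framework's schedule and distance measure may differ):

```python
# Order training images by distance to their class prototype (mean embedding),
# most prototypical first. Features and labels here are random placeholders.
import torch

def curriculum_order(features: torch.Tensor, labels: torch.Tensor):
    """features: [N, D] embeddings; labels: [N]. Returns indices, near-to-far."""
    dists = torch.empty(len(features))
    for c in labels.unique():
        idx = (labels == c).nonzero(as_tuple=True)[0]
        proto = features[idx].mean(dim=0)            # class prototype
        dists[idx] = (features[idx] - proto).norm(dim=1)
    return torch.argsort(dists)  # prototypical samples come first

feats = torch.randn(8, 4)
labels = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
print(curriculum_order(feats, labels))
```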
arXiv Detail & Related papers (2024-11-16T03:21:06Z)
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension [131.14381425260706]
We introduce Self-Training on Image Comprehension (STIC), which emphasizes a self-training approach specifically for image comprehension.
First, the model self-constructs a preference dataset for image descriptions using unlabeled images.
To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
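As a rough sketch, the self-constructed preference step could pair a description produced under a well-formed prompt with one produced under a corrupted prompt; `describe` is a hypothetical stub for the MLLM call, and the pairing rule is my assumption:

```python
# Hedged sketch of self-constructed preference data for image descriptions.
def describe(image, prompt):
    """Placeholder for an MLLM captioning call."""
    return f"caption({prompt!r}, {image})"

def build_preference_pair(image):
    chosen = describe(image, "Describe this image in detail.")
    rejected = describe(image, "Describe a different image.")  # corrupted prompt
    return {"image": image, "chosen": chosen, "rejected": rejected}

pairs = [build_preference_pair(img) for img in ["img_001.jpg", "img_002.jpg"]]
# `pairs` could then feed a preference-tuning step (e.g., DPO-style).
print(pairs[0])
```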
arXiv Detail & Related papers (2024-05-30T05:53:49Z)
- Déjà Vu Memorization in Vision-Language Models [39.51189095703773]
We propose a new method for measuring memorization in Vision-Language Models (VLMs).
We show that the model indeed retains information about individual objects in the training images beyond what can be inferred from correlations or the image caption.
We evaluate déjà vu memorization at both the sample and population levels, and show that it is significant for OpenCLIP trained on as many as 50M image-caption pairs.
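Hedged pseudocode for the measurement idea as summarized above: compare how many ground-truth objects can be recovered using a model trained with the image versus a reference model trained without it. All helpers are stubs:

```python
# Sketch of the deja vu memorization test; retrieval is stubbed out entirely.
def recovered_objects(model, caption, public_gallery):
    """Placeholder: embed the caption, retrieve nearest public images,
    and return the set of objects found in them."""
    return model(caption, public_gallery)

def deja_vu_gap(target_model, reference_model, caption, true_objects, gallery):
    hits = lambda m: len(recovered_objects(m, caption, gallery) & true_objects)
    # A positive gap suggests memorization beyond caption-object correlation.
    return hits(target_model) - hits(reference_model)

# Toy demo with lambda "models" returning object sets directly.
target = lambda cap, g: {"dog", "frisbee", "red fence"}
reference = lambda cap, g: {"dog"}
print(deja_vu_gap(target, reference, "a dog in a yard",
                  {"dog", "frisbee", "red fence"}, gallery=None))  # -> 2
```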
arXiv Detail & Related papers (2024-02-03T09:55:35Z)
- Exploring Target Representations for Masked Autoencoders [78.57196600585462]
We show that a careful choice of the target representation is unnecessary for learning good representations.
We propose a multi-stage masked distillation pipeline and use a randomly initialized model as the teacher.
The proposed method, masked knowledge distillation with bootstrapped teachers (dBOT), outperforms previous self-supervised methods by nontrivial margins.
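The bootstrapping loop can be sketched in a few lines: distill a fresh student from the current teacher on masked inputs, then promote the student to teacher and repeat. The tiny MLP encoder and random "patch features" below are placeholders for the actual vision backbone and data:

```python
# Minimal sketch of multi-stage masked distillation with bootstrapped teachers.
import torch
import torch.nn as nn

def make_encoder():
    return nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))

def distill_stage(teacher, steps=100, mask_ratio=0.75):
    student = make_encoder()
    opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
    for _ in range(steps):
        x = torch.randn(32, 64)                     # stand-in for patch features
        mask = torch.rand(32, 64) < mask_ratio      # True = masked position
        with torch.no_grad():
            target = teacher(x)                     # teacher sees the full input
        pred = student(x.masked_fill(mask, 0.0))    # student sees masked input
        loss = ((pred - target)[mask] ** 2).mean()  # regress masked positions
        opt.zero_grad(); loss.backward(); opt.step()
    return student

teacher = make_encoder()          # stage 0: randomly initialized teacher
for stage in range(3):            # bootstrap: last student becomes next teacher
    teacher = distill_stage(teacher)
```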
arXiv Detail & Related papers (2022-09-08T16:55:19Z)
- SPeCiaL: Self-Supervised Pretraining for Continual Learning [49.34919926042038]
SPeCiaL is a method for unsupervised pretraining of representations tailored for continual learning.
We evaluate SPeCiaL in the Continual Few-Shot Learning setting, and show that it can match or outperform other supervised pretraining approaches.
arXiv Detail & Related papers (2021-06-16T18:15:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.