Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability
- URL: http://arxiv.org/abs/2009.02568v1
- Date: Sat, 5 Sep 2020 17:24:02 GMT
- Title: Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability
- Authors: Anelise Newman, Camilo Fosco, Vincent Casser, Allen Lee, Barry McNamara, and Aude Oliva
- Abstract summary: We develop a predictive model of human visual event memory and how those memories decay over time.
We introduce Memento10k, a new, dynamic video memorability dataset containing human annotations at different viewing delays.
- Score: 17.00485879591431
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key capability of an intelligent system is deciding when events from past
experience must be remembered and when they can be forgotten. Towards this
goal, we develop a predictive model of human visual event memory and how those
memories decay over time. We introduce Memento10k, a new, dynamic video
memorability dataset containing human annotations at different viewing delays.
Based on our findings we propose a new mathematical formulation of memorability
decay, resulting in a model that is able to produce the first quantitative
estimation of how a video decays in memory over time. In contrast with previous
work, our model can predict the probability that a video will be remembered at
an arbitrary delay. Importantly, our approach combines visual and semantic
information (in the form of textual captions) to fully represent the meaning of
events. Our experiments on two video memorability benchmarks, including
Memento10k, show that our model significantly improves upon the best prior
approach (by 12% on average).
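The abstract does not give the paper's exact decay formulation, but its key claim (predicting the probability a video is remembered at an arbitrary delay) can be illustrated with a simple log-linear decay sketch. Everything below is an assumption for illustration: the function name, the reference delay `t0`, and the constants are not taken from the paper.

```python
import math

# Illustrative sketch (names and constants are assumptions, not from the paper):
# a log-linear decay model in which a video's memorability -- the probability
# that a viewer recognizes it on a repeat viewing -- falls off with the log of
# the viewing delay. A model in this spirit outputs two numbers per video: a
# base memorability m0 (at a reference delay t0) and a decay rate alpha.

def memorability_at_delay(m0: float, alpha: float, delay: float,
                          t0: float = 80.0) -> float:
    """Predicted probability that the video is remembered after `delay` seconds.

    m0    -- memorability at the reference delay t0
    alpha -- decay rate (negative: memory fades as the delay grows)
    """
    m = m0 + alpha * math.log(delay / t0)
    return min(1.0, max(0.0, m))  # clamp to a valid probability

# At the reference delay, the prediction is just the base memorability.
print(memorability_at_delay(0.95, -0.05, delay=80.0))    # -> 0.95
# At a longer delay, the predicted probability is lower.
print(memorability_at_delay(0.95, -0.05, delay=3600.0))
```

Under this assumption, a single forward pass yielding `(m0, alpha)` suffices to evaluate memorability at any delay, which is what distinguishes a decay model from a fixed-delay memorability score.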
Related papers
- Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
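The difference-in-differences idea behind this estimator can be shown with a toy numeric example. All numbers and names below are made up for illustration; they are not from the paper.

```python
# Hypothetical per-instance losses (lower = better predicted), invented for
# illustration. "Treated" instances enter training at some step; "control"
# instances never do. Memorisation is estimated as the extra improvement the
# treated group shows after that step, beyond the trend in the control group.

treated_before, treated_after = 3.2, 1.1   # loss drops sharply once trained on
control_before, control_after = 3.1, 2.8   # loss drifts down slightly anyway

# Difference-in-differences: (treated change) - (control change)
memorisation_effect = (treated_after - treated_before) - (control_after - control_before)
print(round(memorisation_effect, 2))  # -1.8
```

The control group's drift subtracts out general training progress, so the remaining `-1.8` is attributed to having seen the treated instances, which is the causal reading of memorisation described above.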
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- Glance and Focus: Memory Prompting for Multi-Event Video Question Answering [36.00733800536469]
VideoQA has emerged as a vital tool to evaluate agents' ability to understand human daily behaviors.
Humans can easily tackle it by using a series of episode memories as anchors to quickly locate question-related key moments for reasoning.
We propose the Glance-Focus model to mimic this effective reasoning strategy.
arXiv Detail & Related papers (2024-01-03T03:51:16Z)
- STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video Prediction [20.701792842768747]
We propose a novel video prediction model, which has infinite-dimensional latent variables over the temporal domain.
Our model achieves temporally continuous prediction, i.e., it can predict at an arbitrarily high frame rate in an unsupervised way.
arXiv Detail & Related papers (2023-12-11T16:12:43Z)
- Eye vs. AI: Human Gaze and Model Attention in Video Memorability [22.718191366938278]
We propose a Transformer-based model with naturalistic-temporal attention that matches SoTA performance on video memorability prediction.
We compare model attention against human gaze fixation density maps collected through a small-scale eye-tracking experiment.
We observe that the model assigns greater importance to the initial frames, mimicking temporal attention patterns found in humans.
arXiv Detail & Related papers (2023-11-26T05:14:06Z)
- Memory-and-Anticipation Transformer for Online Action Understanding [52.24561192781971]
We propose a novel memory-anticipation-based paradigm to model an entire temporal structure, including the past, present, and future.
We present Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach, to address the online action detection and anticipation tasks.
arXiv Detail & Related papers (2023-08-15T17:34:54Z)
- Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to regulate the intermediate representation consecutively, producing a representation that emphasizes the novel information in the frame at the current time stamp.
SRL sharply outperforms the existing state of the art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z)
- Associative Memories via Predictive Coding [37.59398215921529]
Associative memories in the brain receive and store patterns of activity registered by the sensory neurons.
We present a novel neural model for realizing associative memories based on a hierarchical generative network that receives external stimuli via sensory neurons.
arXiv Detail & Related papers (2021-09-16T15:46:26Z)
- FitVid: Overfitting in Pixel-Level Video Prediction [117.59339756506142]
We introduce a new architecture, named FitVid, which is capable of severe overfitting on the common benchmarks.
FitVid outperforms the current state-of-the-art models across four different video prediction benchmarks on four different metrics.
arXiv Detail & Related papers (2021-06-24T17:20:21Z)
- MERLOT: Multimodal Neural Script Knowledge Models [74.05631672657452]
We introduce MERLOT, a model that learns multimodal script knowledge by watching millions of YouTube videos with transcribed speech.
MERLOT exhibits strong out-of-the-box representations of temporal commonsense, and achieves state-of-the-art performance on 12 different video QA datasets.
On Visual Commonsense Reasoning, MERLOT answers questions correctly with 80.6% accuracy, outperforming state-of-the-art models of similar size by over 3%.
arXiv Detail & Related papers (2021-06-04T17:57:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information it contains and is not responsible for any consequences of its use.