Déjà Vu? Decoding Repeated Reading from Eye Movements
- URL: http://arxiv.org/abs/2502.11061v1
- Date: Sun, 16 Feb 2025 09:59:29 GMT
- Title: Déjà Vu? Decoding Repeated Reading from Eye Movements
- Authors: Yoav Meiri, Omer Shubi, Cfir Avraham Hadar, Ariel Kreisberg Nitzav, Yevgeni Berzak
- Abstract summary: We ask whether it is possible to automatically determine whether the reader has previously encountered a text based on their eye movement patterns.
We introduce two variants of this task and address them with considerable success using both feature-based and neural models.
We present an analysis of model performance which on the one hand yields insights on the information used by the models, and on the other hand leverages predictive modeling as an analytic tool for better characterization of the role of memory in repeated reading.
- Score: 1.1652979442763178
- License:
- Abstract: Be it your favorite novel, a newswire article, a cooking recipe or an academic paper -- in many daily situations we read the same text more than once. In this work, we ask whether it is possible to automatically determine whether the reader has previously encountered a text based on their eye movement patterns. We introduce two variants of this task and address them with considerable success using both feature-based and neural models. We further introduce a general strategy for enhancing these models with machine generated simulations of eye movements from a cognitive model. Finally, we present an analysis of model performance which on the one hand yields insights on the information used by the models, and on the other hand leverages predictive modeling as an analytic tool for better characterization of the role of memory in repeated reading. Our work advances the understanding of the extent and manner in which eye movements in reading capture memory effects from prior text exposure, and paves the way for future applications that involve predictive modeling of repeated reading.
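For illustration only, below is a minimal sketch of what a feature-based variant of this binary decoding task (first vs. repeated reading) could look like. It assumes generic aggregate eye-movement features and scikit-learn; it is not the authors' model, feature set, or data.

```python
# Illustrative sketch (not the paper's implementation): a feature-based
# classifier for first vs. repeated reading. The feature choices and the
# use of logistic regression are assumptions made for this example.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Each row: aggregate eye-movement features for one participant-text pair,
# e.g. mean fixation duration (ms), fixation count, skip rate, regression rate.
# Placeholder random data stands in for real eye-tracking recordings.
X = rng.normal(size=(200, 4))
# Label: 0 = first reading, 1 = repeated reading.
y = rng.integers(0, 2, size=200)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"Cross-validated accuracy: {scores.mean():.2f}")
```

With real data, chance performance is 50%, so any reliable gain above that would indicate that eye movements carry a detectable memory signal from prior exposure to the text.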
Related papers
- Decoding Reading Goals from Eye Movements [1.3176926720381554]
We examine whether it is possible to distinguish between two types of common reading goals: information seeking and ordinary reading for comprehension.
Using large-scale eye tracking data, we address this task with a wide range of models that cover different architectural and data representation strategies.
We find that accurate predictions can be made in real time, long before the participant has finished reading the text.
arXiv Detail & Related papers (2024-10-28T06:40:03Z)
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- ScanDL: A Diffusion Model for Generating Synthetic Scanpaths on Texts [0.5520145204626482]
Eye movements in reading play a crucial role in psycholinguistic research.
The scarcity of eye movement data and its unavailability at application time pose a major challenge for this line of research.
We propose ScanDL, a novel discrete sequence-to-sequence diffusion model that generates synthetic scanpaths on texts.
arXiv Detail & Related papers (2023-10-24T07:52:19Z)
- Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models [57.08925810659545]
We conduct a comparative analysis of the visual representations in existing vision-and-language models and vision-only models.
Our empirical observations suggest that vision-and-language models are better at label prediction tasks.
We hope our study sheds light on the role of language in visual learning, and serves as an empirical guide for various pretrained models.
arXiv Detail & Related papers (2022-12-01T05:00:18Z)
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
- Dynamic Modeling of Hand-Object Interactions via Tactile Sensing [133.52375730875696]
In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects.
We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model.
This work takes a step toward dynamics modeling of hand-object interactions from dense tactile sensing.
arXiv Detail & Related papers (2021-09-09T16:04:14Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)
- Reading Isn't Believing: Adversarial Attacks On Multi-Modal Neurons [0.0]
We show that contradictory text and image signals can confuse the model into choosing false (visual) options.
We show by example that the CLIP model tends to "read first, look later," a phenomenon we describe as "reading isn't believing."
arXiv Detail & Related papers (2021-03-18T18:56:51Z)
- Using Human Psychophysics to Evaluate Generalization in Scene Text Recognition Models [7.294729862905325]
We characterize two important scene text recognition models by measuring their domains.
These domains specify the ability of the readers to generalize to different word lengths, fonts, and amounts of occlusion.
arXiv Detail & Related papers (2020-06-30T19:51:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.