Eyettention: An Attention-based Dual-Sequence Model for Predicting Human
Scanpaths during Reading
- URL: http://arxiv.org/abs/2304.10784v2
- Date: Thu, 18 May 2023 08:24:03 GMT
- Title: Eyettention: An Attention-based Dual-Sequence Model for Predicting Human
Scanpaths during Reading
- Authors: Shuwen Deng, David R. Reich, Paul Prasse, Patrick Haller, Tobias
Scheffer and Lena A. Jäger
- Abstract summary: We develop Eyettention, the first dual-sequence model that simultaneously processes the sequence of words and the chronological sequence of fixations.
We show that Eyettention outperforms state-of-the-art models in predicting scanpaths.
- Score: 3.9766585251585282
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Eye movements during reading offer insights into both the reader's cognitive
processes and the characteristics of the text that is being read. Hence, the
analysis of scanpaths in reading has attracted increasing attention across
fields, ranging from cognitive science through linguistics to computer science.
In particular, eye-tracking-while-reading data has been argued to have the
potential to make machine-learning-based language models exhibit more
human-like linguistic behavior. However, one of the main challenges in modeling
human scanpaths in reading is their dual-sequence nature: the words are ordered
following the grammatical rules of the language, whereas the fixations are
chronologically ordered. As humans do not strictly read from left to right, but
rather skip or refixate words and regress to previous words, the alignment of
the linguistic and the temporal sequence is non-trivial. In this paper, we
develop Eyettention, the first dual-sequence model that simultaneously
processes the sequence of words and the chronological sequence of fixations.
The alignment of the two sequences is achieved by a cross-sequence attention
mechanism. We show that Eyettention outperforms state-of-the-art models in
predicting scanpaths. We provide an extensive within- and across-dataset
evaluation on different languages. An ablation study and qualitative analysis
support an in-depth understanding of the model's behavior.
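To make the cross-sequence alignment concrete, below is a minimal sketch of how a chronologically ordered fixation sequence can attend over a grammatically ordered word sequence. This illustrates the general idea only and is not the Eyettention architecture itself; the class name, layer sizes, and the use of PyTorch's built-in multi-head attention are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class CrossSequenceAttention(nn.Module):
    """Toy cross-sequence attention: each fixation state queries the
    word-sequence states. Sizes and names are illustrative only."""

    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, fixation_states: torch.Tensor, word_states: torch.Tensor):
        # fixation_states: (batch, n_fixations, d_model), chronological order
        # word_states:     (batch, n_words, d_model), sentence order
        # Queries come from the fixation sequence, keys/values from the word
        # sequence, so the attention weights align the two orderings.
        aligned, weights = self.attn(fixation_states, word_states, word_states)
        return aligned, weights  # weights: (batch, n_fixations, n_words)

# Toy usage with random embeddings standing in for real encoder outputs.
fixations = torch.randn(2, 10, 128)   # 10 fixations per item
words = torch.randn(2, 15, 128)       # 15 words per item
aligned, weights = CrossSequenceAttention()(fixations, words)
print(aligned.shape, weights.shape)   # (2, 10, 128) and (2, 10, 15)
```

In such a setup, the attention weight matrix gives, for every fixation, a distribution over words, which is one way to realize the alignment between the linguistic and the temporal sequence described in the abstract.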
Related papers
- Look Hear: Gaze Prediction for Speech-directed Human Attention [49.81718760025951]
Our study focuses on the incremental prediction of attention as a person views an image and hears a referring expression.
We developed the Attention in Referral Transformer model or ART, which predicts the human fixations spurred by each word in a referring expression.
In our quantitative and qualitative analyses, ART not only outperforms existing methods in scanpath prediction, but also appears to capture several human attention patterns.
arXiv Detail & Related papers (2024-07-28T22:35:08Z)
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually grounded text perturbations such as typos and word-order shuffling, which resonate with human cognitive patterns and allow the perturbations to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training [56.74440457571821]
We analyze tasks covering syntax, semantics and reasoning, across 2M pre-training steps and five seeds.
We identify critical learning phases across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize.
Our findings have implications for model interpretability, multi-task learning, and learning from limited data.
arXiv Detail & Related papers (2023-10-25T09:09:55Z)
- ScanDL: A Diffusion Model for Generating Synthetic Scanpaths on Texts [0.5520145204626482]
Eye movements in reading play a crucial role in psycholinguistic research.
The scarcity of eye movement data and its unavailability at application time pose a major challenge for this line of research.
We propose ScanDL, a novel discrete sequence-to-sequence diffusion model that generates synthetic scanpaths on texts.
arXiv Detail & Related papers (2023-10-24T07:52:19Z)
- Visual Storytelling with Question-Answer Plans [70.89011289754863]
We present a novel framework which integrates visual representations with pretrained language models and planning.
Our model translates the image sequence into a visual prefix, a sequence of continuous embeddings which language models can interpret.
It also leverages a sequence of question-answer pairs as a blueprint plan for selecting salient visual concepts and determining how they should be assembled into a narrative.
arXiv Detail & Related papers (2023-10-08T21:45:34Z)
- A Linguistic Investigation of Machine Learning based Contradiction Detection Models: An Empirical Analysis and Future Perspectives [0.34998703934432673]
We analyze two Natural Language Inference data sets with respect to their linguistic features.
The goal is to identify those syntactic and semantic properties that are particularly hard to comprehend for a machine learning model.
arXiv Detail & Related papers (2022-10-19T10:06:03Z)
- Eye-tracking based classification of Mandarin Chinese readers with and without dyslexia using neural sequence models [7.639036130018945]
We propose two simple sequence models that process eye movements on the entire stimulus without the need to aggregate features across the sentence.
We incorporate the linguistic stimulus into the model in two ways: contextualized word embeddings and manually extracted linguistic features.
Our results show that even for a logographic script such as Chinese, sequence models are able to classify dyslexia from eye-gaze sequences, reaching state-of-the-art performance.
arXiv Detail & Related papers (2022-10-18T12:57:30Z)
- Using Human Psychophysics to Evaluate Generalization in Scene Text Recognition Models [7.294729862905325]
We characterize two important scene text recognition models by measuring their domains.
The domains specify the ability of readers to generalize to different word lengths, fonts, and amounts of occlusion.
arXiv Detail & Related papers (2020-06-30T19:51:26Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
- Pay Attention to What You Read: Non-recurrent Handwritten Text-Line Recognition [4.301658883577544]
We introduce a non-recurrent approach to recognizing handwritten text using transformer models.
We are able to tackle character recognition as well as to learn language-related dependencies of the character sequences to be decoded.
arXiv Detail & Related papers (2020-05-26T21:15:20Z)
- Temporal Embeddings and Transformer Models for Narrative Text Understanding [72.88083067388155]
We present two approaches to narrative text understanding for character relationship modelling.
The temporal evolution of these relations is described by dynamic word embeddings, which are designed to learn semantic changes over time.
In contrast, a supervised learning approach based on the state-of-the-art transformer model BERT is used to detect static relations between characters.
arXiv Detail & Related papers (2020-03-19T14:23:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.