ScanDL: A Diffusion Model for Generating Synthetic Scanpaths on Texts
- URL: http://arxiv.org/abs/2310.15587v1
- Date: Tue, 24 Oct 2023 07:52:19 GMT
- Title: ScanDL: A Diffusion Model for Generating Synthetic Scanpaths on Texts
- Authors: Lena S. Bolliger, David R. Reich, Patrick Haller, Deborah N. Jakobi, Paul Prasse, Lena A. Jäger
- Abstract summary: Eye movements in reading play a crucial role in psycholinguistic research.
The scarcity of eye movement data and its unavailability at application time pose a major challenge for this line of research.
We propose ScanDL, a novel discrete sequence-to-sequence diffusion model that generates synthetic scanpaths on texts.
- Score: 0.5520145204626482
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Eye movements in reading play a crucial role in psycholinguistic research
studying the cognitive mechanisms underlying human language processing. More
recently, the tight coupling between eye movements and cognition has also been
leveraged for language-related machine learning tasks such as the
interpretability, enhancement, and pre-training of language models, as well as
the inference of reader- and text-specific properties. However, the scarcity of
eye movement data and its unavailability at application time pose a major
challenge for this line of research. Initially, this problem was tackled by
resorting to cognitive models for synthesizing eye movement data. However, for
the sole purpose of generating human-like scanpaths, purely data-driven
machine-learning-based methods have proven to be more suitable. Following
recent advances in adapting diffusion processes to discrete data, we propose
ScanDL, a novel discrete sequence-to-sequence diffusion model that generates
synthetic scanpaths on texts. By leveraging pre-trained word representations
and jointly embedding both the stimulus text and the fixation sequence, our
model captures multi-modal interactions between the two inputs. We evaluate
ScanDL within- and across-dataset and demonstrate that it significantly
outperforms state-of-the-art scanpath generation methods. Finally, we provide
an extensive psycholinguistic analysis that underlines the model's ability to
exhibit human-like reading behavior. Our implementation is made available at
https://github.com/DiLi-Lab/ScanDL.
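The abstract's idea of jointly embedding the stimulus text and the fixation sequence before applying a diffusion process can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the random embedding table (a stand-in for pre-trained word representations), the linear noise schedule, the choice to noise only the scanpath part of the joint sequence, and all function and variable names are assumptions made for illustration.

```python
import numpy as np

def joint_embed(sentence_ids, fixation_ids, table):
    """Concatenate sentence word ids and fixated-word ids, then look up embeddings."""
    joint_ids = np.concatenate([sentence_ids, fixation_ids])
    return table[joint_ids]

def forward_noise(z0, t, alpha_bar, n_condition, rng):
    """Sample z_t from a Gaussian forward process, leaving the sentence part clean.

    Assumption: only the scanpath positions are noised, so the sentence
    acts as a fixed condition. The source does not specify this detail.
    """
    eps = rng.standard_normal(z0.shape)
    zt = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    zt[:n_condition] = z0[:n_condition]  # restore the un-noised sentence embeddings
    return zt

rng = np.random.default_rng(0)
vocab_size, dim, steps = 50, 8, 100           # hypothetical sizes
table = rng.standard_normal((vocab_size, dim))  # stand-in for pre-trained embeddings
betas = np.linspace(1e-4, 0.02, steps)        # assumed linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)

sentence_ids = np.array([3, 17, 8, 42, 5])       # hypothetical word ids of the text
fixation_ids = np.array([3, 17, 17, 42, 8, 5])   # hypothetical ids of words in fixation order

z0 = joint_embed(sentence_ids, fixation_ids, table)
zt = forward_noise(z0, steps - 1, alpha_bar, len(sentence_ids), rng)
```

In a full model, a denoising network would be trained to recover the scanpath embeddings from `zt` given the clean sentence part, and the recovered embeddings would be mapped back to discrete fixation positions; that part is omitted here.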
Related papers
- EMTeC: A Corpus of Eye Movements on Machine-Generated Texts [2.17025619726098]
The Eye Movements on Machine-Generated Texts Corpus (EMTeC) is a naturalistic eye-movements-while-reading corpus of 107 native English speakers reading machine-generated texts.
EMTeC entails the eye movement data at all stages of pre-processing, i.e., the raw coordinate data sampled at 2000 Hz, the fixation sequences, and the reading measures.
arXiv Detail & Related papers (2024-08-08T08:00:45Z)
- Text2Data: Low-Resource Data Generation with Textual Control [104.38011760992637]
Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines.
We propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model.
It undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z)
- Neural Sign Actors: A diffusion model for 3D sign language production from text [51.81647203840081]
Sign Languages (SL) serve as the primary mode of communication for the Deaf and Hard of Hearing communities.
This work makes an important step towards realistic neural sign avatars, bridging the communication gap between Deaf and hearing communities.
arXiv Detail & Related papers (2023-12-05T12:04:34Z)
- Pre-Trained Language Models Augmented with Synthetic Scanpaths for Natural Language Understanding [3.6498648388765513]
We develop a model that integrates synthetic scanpath generation with a scanpath-augmented language model.
We find that the proposed model not only outperforms the underlying language model, but achieves a performance that is comparable to a language model augmented with real human gaze data.
arXiv Detail & Related papers (2023-10-23T08:15:38Z)
- Eyettention: An Attention-based Dual-Sequence Model for Predicting Human Scanpaths during Reading [3.9766585251585282]
We develop Eyettention, the first dual-sequence model that simultaneously processes the sequence of words and the chronological sequence of fixations.
We show that Eyettention outperforms state-of-the-art models in predicting scanpaths.
arXiv Detail & Related papers (2023-04-21T07:26:49Z)
- Synthesizing Human Gaze Feedback for Improved NLP Performance [20.837790838762036]
ScanTextGAN is a novel model for generating human scanpaths over text.
We show that ScanTextGAN-generated scanpaths can approximate meaningful cognitive signals in human gaze patterns.
arXiv Detail & Related papers (2023-02-11T15:34:23Z)
- Pretraining on Interactions for Learning Grounded Affordance Representations [22.290431852705662]
We train a neural network to predict objects' trajectories in a simulated interaction.
We show that our network's latent representations differentiate between both observed and unobserved affordances.
Our results suggest a way in which modern deep learning approaches to grounded language learning can be integrated with traditional formal semantic notions of lexical representations.
arXiv Detail & Related papers (2022-07-05T19:19:53Z)
- Vision-Language Pre-Training for Boosting Scene Text Detectors [57.08046351495244]
We specifically adapt vision-language joint learning for scene text detection.
We propose to learn contextualized, joint representations through vision-language pre-training.
The pre-trained model is able to produce more informative representations with richer semantics.
arXiv Detail & Related papers (2022-04-29T03:53:54Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- Neural Deepfake Detection with Factual Structure of Text [78.30080218908849]
We propose a graph-based model for deepfake detection of text.
Our approach represents the factual structure of a given document as an entity graph.
Our model can distinguish the difference in the factual structure between machine-generated text and human-written text.
arXiv Detail & Related papers (2020-10-15T02:35:31Z)
- Temporal Embeddings and Transformer Models for Narrative Text Understanding [72.88083067388155]
We present two approaches to narrative text understanding for character relationship modelling.
The temporal evolution of these relations is described by dynamic word embeddings that are designed to learn semantic changes over time.
A supervised learning approach based on the state-of-the-art transformer model BERT is used instead to detect static relations between characters.
arXiv Detail & Related papers (2020-03-19T14:23:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.