Measuring the Impact of (Psycho-)Linguistic and Readability Features and
Their Spill Over Effects on the Prediction of Eye Movement Patterns
- URL: http://arxiv.org/abs/2203.08085v1
- Date: Tue, 15 Mar 2022 17:13:45 GMT
- Title: Measuring the Impact of (Psycho-)Linguistic and Readability Features and
Their Spill Over Effects on the Prediction of Eye Movement Patterns
- Authors: Daniel Wiechmann, Yu Qiao, Elma Kerz, Justus Mattern
- Abstract summary: We report on experiments with two eye-tracking corpora of naturalistic reading and two language models (BERT and GPT-2).
In all experiments, we test effects of a broad spectrum of features for predicting human reading behavior that fall into five categories (syntactic complexity, lexical richness, register-based multiword combinations, readability and psycholinguistic word properties).
Our experiments show that both the features included and the architecture of the transformer-based language models play a role in predicting multiple eye-tracking measures during naturalistic reading.
- Score: 27.799032561722893
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a growing interest in the combined use of NLP and machine learning
methods to predict gaze patterns during naturalistic reading. While promising
results have been obtained through the use of transformer-based language
models, little work has been undertaken to relate the performance of such
models to general text characteristics. In this paper we report on experiments
with two eye-tracking corpora of naturalistic reading and two language models
(BERT and GPT-2). In all experiments, we test effects of a broad spectrum of
features for predicting human reading behavior that fall into five categories
(syntactic complexity, lexical richness, register-based multiword combinations,
readability and psycholinguistic word properties). Our experiments show that
both the features included and the architecture of the transformer-based
language models play a role in predicting multiple eye-tracking measures during
naturalistic reading. We also report the results of experiments aimed at
determining the relative importance of features from different groups using
SP-LIME.
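The abstract refers to SP-LIME for gauging the relative importance of feature groups. As a rough, hedged illustration of how submodular-pick LIME can be applied to a tabular regression from text features to an eye-tracking measure, the sketch below uses the open-source `lime` package; the feature names, synthetic data, and gradient-boosting regressor are illustrative placeholders, not the authors' actual pipeline.

```python
# Minimal sketch of SP-LIME over a feature-to-gaze regression.
# Assumptions: scikit-learn and the `lime` package are installed; the feature
# names and synthetic data below are illustrative stand-ins, not the paper's
# actual features, corpora, or models.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from lime import lime_tabular, submodular_pick

rng = np.random.default_rng(0)
feature_names = ["syntactic_complexity", "lexical_richness",
                 "multiword_register", "readability", "word_frequency"]
X = rng.normal(size=(500, len(feature_names)))                   # toy feature values
y = X @ rng.normal(size=len(feature_names)) + rng.normal(scale=0.1, size=500)  # toy gaze measure

# Any regressor mapping text features to an eye-tracking measure (e.g. fixation duration).
model = GradientBoostingRegressor().fit(X, y)

# SP-LIME: choose a small, diverse set of local explanations whose union
# covers the globally most important features.
explainer = lime_tabular.LimeTabularExplainer(
    X, feature_names=feature_names, mode="regression")
picked = submodular_pick.SubmodularPick(
    explainer, X, model.predict,
    sample_size=100, num_features=5, num_exps_desired=3)

for exp in picked.sp_explanations:
    print(exp.as_list())  # feature weights for each selected local explanation
```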
Related papers
- From Text to Treatment Effects: A Meta-Learning Approach to Handling Text-Based Confounding [7.5348062792]
This paper examines the performance of meta-learners when confounding variables are expressed in text.
We show that learners using pre-trained text representations of confounders achieve improved CATE estimates.
Due to the entangled nature of the text embeddings, these models do not fully match the performance of meta-learners with perfect confounder knowledge.
arXiv Detail & Related papers (2024-09-23T19:46:19Z) - Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate the limited interpretability of such models by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z) - When to generate hedges in peer-tutoring interactions [1.0466434989449724]
The study uses a naturalistic face-to-face dataset annotated for natural language turns, conversational strategies, tutoring strategies, and nonverbal behaviours.
Results show that embedding layers, which capture the semantic information of the previous turns, significantly improve the model's performance.
We discover that the eye gaze of both the tutor and the tutee has a significant impact on hedge prediction.
arXiv Detail & Related papers (2023-07-28T14:29:19Z) - An Empirical Investigation of Commonsense Self-Supervision with
Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z) - Naturalistic Causal Probing for Morpho-Syntax [76.83735391276547]
We suggest a naturalistic strategy for input-level intervention on real-world data in Spanish.
Using our approach, we isolate morpho-syntactic features from confounders in sentences.
We apply this methodology to analyze causal effects of gender and number on contextualized representations extracted from pre-trained models.
arXiv Detail & Related papers (2022-05-14T11:47:58Z) - Pushing on Personality Detection from Verbal Behavior: A Transformer
Meets Text Contours of Psycholinguistic Features [27.799032561722893]
We report two major improvements in predicting personality traits from text data.
We integrate a pre-trained Transformer language model (BERT) with Bidirectional Long Short-Term Memory networks trained on within-text distributions of psycholinguistic features.
We evaluate the performance of the models we built on two benchmark datasets.
arXiv Detail & Related papers (2022-04-10T08:08:46Z) - Leveraging recent advances in Pre-Trained Language Models
for Eye-Tracking Prediction [0.0]
Natural Language Processing uses human-derived behavioral data like eye-tracking data to augment neural nets to solve a range of tasks spanning syntax and semantics.
In this paper, we use the ZuCo 1.0 and ZuCo 2.0 datasets to explore different linguistic models to directly predict these gaze features for each word with respect to its sentence (see the sketch after this list).
arXiv Detail & Related papers (2021-10-09T06:46:48Z) - Predicting the Reproducibility of Social and Behavioral Science Papers
Using Supervised Learning Models [21.69933721765681]
We propose a framework that extracts five types of features from scholarly work that can be used to support assessments of published research claims.
We analyze pairwise correlations between individual features and their importance for predicting a set of human-assessed ground truth labels.
arXiv Detail & Related papers (2021-04-08T00:45:20Z) - Composed Variational Natural Language Generation for Few-shot Intents [118.37774762596123]
We generate training examples for few-shot intents in the realistic imbalanced scenario.
To evaluate the quality of the generated utterances, experiments are conducted on the generalized few-shot intent detection task.
Our proposed model achieves state-of-the-art performance on two real-world intent detection datasets.
arXiv Detail & Related papers (2020-09-21T17:48:43Z) - Explaining Black Box Predictions and Unveiling Data Artifacts through
Influence Functions [55.660255727031725]
Influence functions explain the decisions of a model by identifying influential training examples.
We conduct a comparison between influence functions and common word-saliency methods on representative tasks.
We develop a new measure based on influence functions that can reveal artifacts in training data.
arXiv Detail & Related papers (2020-05-14T00:45:23Z) - Linguistic Typology Features from Text: Inferring the Sparse Features of
World Atlas of Language Structures [73.06435180872293]
We construct a recurrent neural network predictor based on byte embeddings and convolutional layers.
We show that some features from various linguistic types can be predicted reliably.
arXiv Detail & Related papers (2020-04-30T21:00:53Z)
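As referenced in the ZuCo entry above, the following is a minimal, hypothetical sketch of regressing a per-word gaze feature on contextualized BERT embeddings. It assumes the Hugging Face `transformers`, `torch`, and `scikit-learn` packages and uses toy data; it is not the model, data split, or evaluation used in that paper (or in this one).

```python
# Hypothetical sketch: predict a per-word gaze feature (e.g. total reading time)
# from mean-pooled BERT sub-token embeddings. Toy sentence and values only.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import Ridge

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = [["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "."]]
gaze = [[0.12, 0.30, 0.25, 0.40, 0.22, 0.10, 0.08, 0.35, 0.33, 0.05]]  # toy per-word targets

X, y = [], []
for words, targets in zip(sentences, gaze):
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]        # (num_subtokens, hidden_size)
    word_ids = enc.word_ids()                             # sub-token -> word index (None for specials)
    for w_idx, target in enumerate(targets):
        sub = [i for i, w in enumerate(word_ids) if w == w_idx]
        X.append(hidden[sub].mean(dim=0).numpy())         # mean-pool the word's sub-tokens
        y.append(target)

reg = Ridge().fit(X, y)                                   # per-word gaze-feature regression
print(reg.score(X, y))                                    # in-sample fit on the toy data
```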
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.