Improving Natural Language Processing Tasks with Human Gaze-Guided
Neural Attention
- URL: http://arxiv.org/abs/2010.07891v2
- Date: Tue, 27 Oct 2020 16:16:18 GMT
- Title: Improving Natural Language Processing Tasks with Human Gaze-Guided
Neural Attention
- Authors: Ekta Sood, Simon Tannert, Philipp Mueller, Andreas Bulling
- Abstract summary: A lack of corpora has so far limited advances in integrating human gaze data as a supervisory signal in neural attention mechanisms.
We propose a novel hybrid text saliency model that combines a cognitive model of reading with explicit human gaze supervision.
Our work introduces a practical approach for bridging between data-driven and cognitive models and demonstrates a new way to integrate human gaze-guided neural attention into NLP tasks.
- Score: 14.940723222424749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A lack of corpora has so far limited advances in integrating human gaze data
as a supervisory signal in neural attention mechanisms for natural language
processing (NLP). We propose a novel hybrid text saliency model (TSM) that, for
the first time, combines a cognitive model of reading with explicit human gaze
supervision in a single machine learning framework. On four different corpora
we demonstrate that our hybrid TSM duration predictions are highly correlated
with human gaze ground truth. We further propose a novel joint modeling
approach to integrate TSM predictions into the attention layer of a network
designed for a specific upstream NLP task without the need for any
task-specific human gaze data. We demonstrate that our joint model outperforms
the state of the art in paraphrase generation on the Quora Question Pairs
corpus by more than 10% in BLEU-4 and achieves state of the art performance for
sentence compression on the challenging Google Sentence Compression corpus. As
such, our work introduces a practical approach for bridging between data-driven
and cognitive models and demonstrates a new way to integrate human gaze-guided
neural attention into NLP tasks.
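The joint-modeling idea described above can be illustrated with a minimal sketch. This is not the authors' actual implementation: the function name, the fixed mixing weight `gamma`, and the log-linear bias form are all assumptions made for illustration — the paper instead integrates TSM predictions into the attention layer of the task network and trains the combination jointly. The sketch only shows the core intuition: predicted fixation durations shift attention weight toward tokens that human readers fixate on.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gaze_guided_attention(attn_logits, fixation_durations, gamma=0.5):
    """Bias a task network's attention logits with predicted gaze saliency.

    attn_logits: raw attention scores from the task network, one per token.
    fixation_durations: predicted per-token fixation durations (e.g. from a
        text saliency model), normalised here into a distribution.
    gamma: hypothetical fixed mixing weight; the paper learns the
        integration jointly rather than using a fixed scalar.
    """
    total = sum(fixation_durations)
    saliency = [d / total for d in fixation_durations]
    # Add a log-saliency bias so tokens predicted to draw human gaze
    # receive proportionally more attention mass after the softmax.
    biased = [a + gamma * math.log(s) for a, s in zip(attn_logits, saliency)]
    return softmax(biased)

# Token 1 has both the highest task logit and the longest predicted fixation,
# so it ends up with the largest attention weight.
weights = gaze_guided_attention([0.2, 1.0, 0.1], [120.0, 300.0, 80.0])
```

The resulting `weights` form a valid attention distribution (non-negative, summing to one); with `gamma=0` the bias vanishes and the sketch reduces to ordinary softmax attention.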
Related papers
- Generating Human-Centric Visual Cues for Human-Object Interaction
Detection via Large Vision-Language Models [59.611697856666304]
Human-object interaction (HOI) detection aims at detecting human-object pairs and predicting their interactions.
We propose three prompts with VLM to generate human-centric visual cues within an image from multiple perspectives of humans.
We develop a transformer-based multimodal fusion module with multitower architecture to integrate visual cue features into the instance and interaction decoders.
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
$\text{EVL}_\text{Gen}$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z)
- Multimodality and Attention Increase Alignment in Natural Language Prediction Between Humans and Computational Models [0.8139163264824348]
Humans are known to use salient multimodal features, such as visual cues, to facilitate the processing of upcoming words.
Multimodal computational models can integrate visual and linguistic data using a visual attention mechanism to assign next-word probabilities.
We show that predictability estimates from humans aligned more closely with scores generated from multimodal models vs. their unimodal counterparts.
arXiv Detail & Related papers (2023-08-11T09:30:07Z)
- Weakly-Supervised HOI Detection from Interaction Labels Only and Language/Vision-Language Priors [36.75629570208193]
Human-object interaction (HOI) detection aims to extract interacting human-object pairs and their interaction categories from a given natural image.
In this paper, we tackle HOI detection with the weakest supervision setting in the literature, using only image-level interaction labels.
We first propose an approach to prune non-interacting human and object proposals to increase the quality of positive pairs within the bag, exploiting the grounding capability of the vision-language model.
Second, we use a large language model to query which interactions are possible between a human and a given object category, in order to force the model not to put emphasis
arXiv Detail & Related papers (2023-03-09T19:08:02Z)
- Synthesizing Human Gaze Feedback for Improved NLP Performance [20.837790838762036]
ScanTextGAN is a novel model for generating human scanpaths over text.
We show that ScanTextGAN-generated scanpaths can approximate meaningful cognitive signals in human gaze patterns.
arXiv Detail & Related papers (2023-02-11T15:34:23Z)
- Simulating Human Gaze with Neural Visual Attention [44.65733084492857]
We propose the Neural Visual Attention (NeVA) algorithm to integrate guidance of any downstream visual task into attention modeling.
We observe that biologically constrained neural networks generate human-like scanpaths without being trained for this objective.
arXiv Detail & Related papers (2022-11-22T09:02:09Z)
- TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions.
We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data.
We show that the TEMOS framework can produce both skeleton-based animations as in prior work, as well as more expressive SMPL body motions.
arXiv Detail & Related papers (2022-04-25T14:53:06Z)
- Leveraging recent advances in Pre-Trained Language Models for Eye-Tracking Prediction [0.0]
Natural Language Processing uses human-derived behavioral data, such as eye-tracking data, to augment neural nets to solve a range of tasks spanning syntax and semantics.
In this paper, we use the ZuCo 1.0 and ZuCo 2.0 datasets to explore different linguistic models to directly predict these gaze features for each word with respect to its sentence.
arXiv Detail & Related papers (2021-10-09T06:46:48Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.