Improving Natural Language Processing Tasks with Human Gaze-Guided
Neural Attention
- URL: http://arxiv.org/abs/2010.07891v2
- Date: Tue, 27 Oct 2020 16:16:18 GMT
- Title: Improving Natural Language Processing Tasks with Human Gaze-Guided
Neural Attention
- Authors: Ekta Sood, Simon Tannert, Philipp Mueller, Andreas Bulling
- Abstract summary: A lack of corpora has so far limited advances in integrating human gaze data as a supervisory signal in neural attention mechanisms.
We propose a novel hybrid text saliency model that combines a cognitive model of reading with explicit human gaze supervision.
Our work introduces a practical approach for bridging between data-driven and cognitive models and demonstrates a new way to integrate human gaze-guided neural attention into NLP tasks.
- Score: 14.940723222424749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A lack of corpora has so far limited advances in integrating human gaze data
as a supervisory signal in neural attention mechanisms for natural language
processing (NLP). We propose a novel hybrid text saliency model (TSM) that, for
the first time, combines a cognitive model of reading with explicit human gaze
supervision in a single machine learning framework. On four different corpora
we demonstrate that our hybrid TSM duration predictions are highly correlated
with human gaze ground truth. We further propose a novel joint modeling
approach to integrate TSM predictions into the attention layer of a network
designed for a specific upstream NLP task without the need for any
task-specific human gaze data. We demonstrate that our joint model outperforms
the state of the art in paraphrase generation on the Quora Question Pairs
corpus by more than 10% in BLEU-4 and achieves state-of-the-art performance for
sentence compression on the challenging Google Sentence Compression corpus. As
such, our work introduces a practical approach for bridging between data-driven
and cognitive models and demonstrates a new way to integrate human gaze-guided
neural attention into NLP tasks.
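To make the joint modeling idea concrete, below is a minimal sketch of how token-level saliency predictions from a TSM could bias the attention layer of a downstream task network. The layer name, the convex mixing scheme, and all hyperparameters here are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeGuidedAttention(nn.Module):
    """Illustrative sketch (not the paper's exact architecture): an attention
    layer whose learned weights are mixed with token-level saliency scores,
    e.g. fixation durations predicted by a text saliency model (TSM)."""

    def __init__(self, hidden_dim: int, gaze_weight: float = 0.5):
        super().__init__()
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)
        self.gaze_weight = gaze_weight  # mixing coefficient (assumed; could be learned)

    def forward(self, states: torch.Tensor, saliency: torch.Tensor) -> torch.Tensor:
        # states: (batch, seq_len, hidden); saliency: (batch, seq_len)
        q, k = self.query(states), self.key(states)
        scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5   # (B, T, T)
        attn = F.softmax(scores, dim=-1)                       # learned attention
        gaze = F.softmax(saliency, dim=-1).unsqueeze(1)        # (B, 1, T), broadcast over queries
        mixed = (1 - self.gaze_weight) * attn + self.gaze_weight * gaze
        return mixed @ states                                  # gaze-biased context vectors

# Usage: saliency scores come from a pretrained TSM run on the same tokens,
# so no task-specific human gaze data is needed at training time.
layer = GazeGuidedAttention(hidden_dim=256)
states = torch.randn(2, 10, 256)   # encoder hidden states
saliency = torch.rand(2, 10)       # TSM-predicted fixation durations per token
context = layer(states, saliency)  # shape (2, 10, 256)
```

Because both terms in the mixture are row-stochastic, the mixed weights remain a valid attention distribution over the input tokens.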
Related papers
- Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models [46.09562860220433]
We introduce GazeReward, a novel framework that integrates implicit feedback -- and specifically eye-tracking (ET) data -- into the Reward Model (RM)
Our approach significantly improves the accuracy of the RM on established human preference datasets.
arXiv Detail & Related papers (2024-10-02T13:24:56Z)
- Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation [70.52558242336988]
We focus on predicting engagement in dyadic interactions by scrutinizing verbal and non-verbal cues, aiming to detect signs of disinterest or confusion.
In this work, we collect a dataset featuring 34 participants engaged in casual dyadic conversations, each providing self-reported engagement ratings at the end of each conversation.
We introduce a novel fusion strategy using Large Language Models (LLMs) to integrate multiple behavior modalities into a "multimodal transcript".
arXiv Detail & Related papers (2024-09-13T18:28:12Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
$\text{EVL}_\text{Gen}$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z)
- Multimodality and Attention Increase Alignment in Natural Language Prediction Between Humans and Computational Models [0.8139163264824348]
Humans are known to use salient multimodal features, such as visual cues, to facilitate the processing of upcoming words.
Multimodal computational models can integrate visual and linguistic data using a visual attention mechanism to assign next-word probabilities.
We show that predictability estimates from humans aligned more closely with scores generated from multimodal models vs. their unimodal counterparts.
arXiv Detail & Related papers (2023-08-11T09:30:07Z)
- Weakly-Supervised HOI Detection from Interaction Labels Only and Language/Vision-Language Priors [36.75629570208193]
Human-object interaction (HOI) detection aims to extract interacting human-object pairs and their interaction categories from a given natural image.
In this paper, we tackle HOI detection with the weakest supervision setting in the literature, using only image-level interaction labels.
We first propose an approach to prune non-interacting human and object proposals to increase the quality of positive pairs within the bag, exploiting the grounding capability of the vision-language model.
Second, we use a large language model to query which interactions are possible between a human and a given object category, in order to force the model not to put emphasis on implausible interactions.
arXiv Detail & Related papers (2023-03-09T19:08:02Z)
- Synthesizing Human Gaze Feedback for Improved NLP Performance [20.837790838762036]
ScanTextGAN is a novel model for generating human scanpaths over text.
We show that ScanTextGAN-generated scanpaths can approximate meaningful cognitive signals in human gaze patterns.
arXiv Detail & Related papers (2023-02-11T15:34:23Z)
- Simulating Human Gaze with Neural Visual Attention [44.65733084492857]
We propose the Neural Visual Attention (NeVA) algorithm to integrate guidance of any downstream visual task into attention modeling.
We observe that biologically constrained neural networks generate human-like scanpaths without being trained for this objective.
arXiv Detail & Related papers (2022-11-22T09:02:09Z)
- TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions.
We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data.
We show that the TEMOS framework can produce both skeleton-based animations, as in prior work, as well as more expressive SMPL body motions.
arXiv Detail & Related papers (2022-04-25T14:53:06Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)