Improving Natural Language Processing Tasks with Human Gaze-Guided
Neural Attention
- URL: http://arxiv.org/abs/2010.07891v2
- Date: Tue, 27 Oct 2020 16:16:18 GMT
- Title: Improving Natural Language Processing Tasks with Human Gaze-Guided
Neural Attention
- Authors: Ekta Sood, Simon Tannert, Philipp Mueller, Andreas Bulling
- Abstract summary: A lack of corpora has so far limited advances in integrating human gaze data as a supervisory signal in neural attention mechanisms.
We propose a novel hybrid text saliency model that combines a cognitive model of reading with explicit human gaze supervision.
Our work introduces a practical approach for bridging between data-driven and cognitive models and demonstrates a new way to integrate human gaze-guided neural attention into NLP tasks.
- Score: 14.940723222424749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A lack of corpora has so far limited advances in integrating human gaze data
as a supervisory signal in neural attention mechanisms for natural language
processing (NLP). We propose a novel hybrid text saliency model (TSM) that, for
the first time, combines a cognitive model of reading with explicit human gaze
supervision in a single machine learning framework. On four different corpora
we demonstrate that our hybrid TSM duration predictions are highly correlated
with human gaze ground truth. We further propose a novel joint modeling
approach to integrate TSM predictions into the attention layer of a network
designed for a specific upstream NLP task without the need for any
task-specific human gaze data. We demonstrate that our joint model outperforms
the state of the art in paraphrase generation on the Quora Question Pairs
corpus by more than 10% in BLEU-4 and achieves state of the art performance for
sentence compression on the challenging Google Sentence Compression corpus. As
such, our work introduces a practical approach for bridging between data-driven
and cognitive models and demonstrates a new way to integrate human gaze-guided
neural attention into NLP tasks.
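The joint-modeling idea described above can be illustrated with a minimal sketch. This is not the authors' actual implementation: the function name, the fixed mixing weight `gamma`, and the log-linear bias form are all assumptions made for illustration — the paper instead integrates TSM predictions into the attention layer of the task network and trains the combination jointly. The sketch only shows the core intuition: predicted fixation durations shift attention weight toward tokens that human readers fixate on.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gaze_guided_attention(attn_logits, fixation_durations, gamma=0.5):
    """Bias a task network's attention logits with predicted gaze saliency.

    attn_logits: raw attention scores from the task network, one per token.
    fixation_durations: predicted per-token fixation durations (e.g. from a
        text saliency model), normalised here into a distribution.
    gamma: hypothetical fixed mixing weight; the paper learns the
        integration jointly rather than using a fixed scalar.
    """
    total = sum(fixation_durations)
    saliency = [d / total for d in fixation_durations]
    # Add a log-saliency bias so tokens predicted to draw human gaze
    # receive proportionally more attention mass after the softmax.
    biased = [a + gamma * math.log(s) for a, s in zip(attn_logits, saliency)]
    return softmax(biased)

# Token 1 has both the highest task logit and the longest predicted fixation,
# so it ends up with the largest attention weight.
weights = gaze_guided_attention([0.2, 1.0, 0.1], [120.0, 300.0, 80.0])
```

The resulting `weights` form a valid attention distribution (non-negative, summing to one); with `gamma=0` the bias vanishes and the sketch reduces to ordinary softmax attention.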
Related papers
- Generating Human-Centric Visual Cues for Human-Object Interaction
Detection via Large Vision-Language Models [59.611697856666304]
Human-object interaction (HOI) detection aims at detecting human-object pairs and predicting their interactions.
We propose three prompts with VLM to generate human-centric visual cues within an image from multiple perspectives of humans.
We develop a transformer-based multimodal fusion module with multitower architecture to integrate visual cue features into the instance and interaction decoders.
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
$\text{EVL}_\text{Gen}$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z)
- Multimodality and Attention Increase Alignment in Natural Language Prediction Between Humans and Computational Models [0.8139163264824348]
Humans are known to use salient multimodal features, such as visual cues, to facilitate the processing of upcoming words.
Multimodal computational models can integrate visual and linguistic data using a visual attention mechanism to assign next-word probabilities.
We show that predictability estimates from humans aligned more closely with scores generated from multimodal models vs. their unimodal counterparts.
arXiv Detail & Related papers (2023-08-11T09:30:07Z)
- Weakly-Supervised HOI Detection from Interaction Labels Only and Language/Vision-Language Priors [36.75629570208193]
Human-object interaction (HOI) detection aims to extract interacting human-object pairs and their interaction categories from a given natural image.
In this paper, we tackle HOI detection with the weakest supervision setting in the literature, using only image-level interaction labels.
We first propose an approach to prune non-interacting human and object proposals to increase the quality of positive pairs within the bag, exploiting the grounding capability of the vision-language model.
Second, we use a large language model to query which interactions are possible between a human and a given object category, in order to force the model not to put emphasis
arXiv Detail & Related papers (2023-03-09T19:08:02Z)
- Synthesizing Human Gaze Feedback for Improved NLP Performance [20.837790838762036]
ScanTextGAN is a novel model for generating human scanpaths over text.
We show that ScanTextGAN-generated scanpaths can approximate meaningful cognitive signals in human gaze patterns.
arXiv Detail & Related papers (2023-02-11T15:34:23Z)
- Simulating Human Gaze with Neural Visual Attention [44.65733084492857]
We propose the Neural Visual Attention (NeVA) algorithm to integrate guidance of any downstream visual task into attention modeling.
We observe that biologically constrained neural networks generate human-like scanpaths without being trained for this objective.
arXiv Detail & Related papers (2022-11-22T09:02:09Z)
- TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions.
We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data.
We show that the TEMOS framework can produce both skeleton-based animations as in prior work, as well as more expressive SMPL body motions.
arXiv Detail & Related papers (2022-04-25T14:53:06Z)
- Leveraging recent advances in Pre-Trained Language Models for Eye-Tracking Prediction [0.0]
Natural Language Processing uses human-derived behavioral data, such as eye-tracking data, to augment neural nets to solve a range of tasks spanning syntax and semantics.
In this paper, we use the ZuCo 1.0 and ZuCo 2.0 datasets to explore different linguistic models to directly predict these gaze features for each word with respect to its sentence.
arXiv Detail & Related papers (2021-10-09T06:46:48Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.