Addressing the Blind Spots in Spoken Language Processing
- URL: http://arxiv.org/abs/2309.06572v1
- Date: Wed, 6 Sep 2023 10:29:25 GMT
- Title: Addressing the Blind Spots in Spoken Language Processing
- Authors: Amit Moryossef
- Abstract summary: We argue that understanding human communication requires a more holistic approach that goes beyond textual or spoken words to include non-verbal elements.
We propose the development of universal automatic gesture segmentation and transcription models to transcribe these non-verbal cues into textual form.
- Score: 4.626189039960495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper explores the critical but often overlooked role of non-verbal
cues, including co-speech gestures and facial expressions, in human
communication and their implications for Natural Language Processing (NLP). We
argue that understanding human communication requires a more holistic approach
that goes beyond textual or spoken words to include non-verbal elements.
Borrowing from advances in sign language processing, we propose the development
of universal automatic gesture segmentation and transcription models to
transcribe these non-verbal cues into textual form. Such a methodology aims to
bridge the blind spots in spoken language understanding, enhancing the scope
and applicability of NLP models. Through motivating examples, we demonstrate
the limitations of relying solely on text-based models. We propose a
computationally efficient and flexible approach for incorporating non-verbal
cues, which can seamlessly integrate with existing NLP pipelines. We conclude
by calling upon the research community to contribute to the development of
universal transcription methods and to validate their effectiveness in
capturing the complexities of real-world, multi-modal interactions.
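The paper itself does not include code. As a minimal sketch of the idea in the abstract, assuming non-verbal cues have already been segmented and transcribed into bracketed textual labels (the data structure, field names, and cue labels below are illustrative assumptions, not the authors' method), the following shows how such cues could be interleaved with a spoken transcript so that an existing text-based NLP pipeline can consume them unchanged:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical representation: the paper does not specify a format, so this
# assumes time-stamped spoken words and separately transcribed non-verbal cues.
@dataclass
class Token:
    text: str      # spoken word or transcribed cue label, e.g. "[shrug]"
    start: float   # start time in seconds
    is_cue: bool   # True for a non-verbal cue, False for a spoken word

def merge_transcripts(words: List[Token], cues: List[Token]) -> str:
    """Interleave spoken words and transcribed non-verbal cues by start time
    into a single textual sequence for a downstream text-based NLP model."""
    merged = sorted(words + cues, key=lambda t: t.start)
    return " ".join(t.text for t in merged)

if __name__ == "__main__":
    words = [Token("I", 0.0, False), Token("don't", 0.2, False), Token("know", 0.4, False)]
    cues = [Token("[shrug]", 0.1, True), Token("[raised-eyebrows]", 0.35, True)]
    print(merge_transcripts(words, cues))
    # -> I [shrug] don't [raised-eyebrows] know
```

Because the output is plain text, this kind of interleaving would integrate with existing NLP pipelines without architectural changes, which is the flexibility the abstract argues for.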
Related papers
- Real-Time Multilingual Sign Language Processing [4.626189039960495]
Sign Language Processing (SLP) is an interdisciplinary field that combines Natural Language Processing (NLP) and Computer Vision.
Traditional approaches have often been constrained by the use of gloss-based systems that are both language-specific and inadequate for capturing the multidimensional nature of sign language.
We propose the use of SignWiring, a universal sign language transcription notation system, to serve as an intermediary link between the visual-gestural modality of signed languages and text-based linguistic representations.
arXiv Detail & Related papers (2024-12-02T21:51:41Z) - OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation [53.7173034249361]
OmniFlatten is an end-to-end GPT-based model capable of effectively modeling the complex behaviors inherent in natural conversations with low latency.
Our approach offers a simple modeling technique and a promising research direction for developing efficient and natural end-to-end full-duplex spoken dialogue systems.
arXiv Detail & Related papers (2024-10-23T11:58:58Z) - Can LLMs Understand the Implication of Emphasized Sentences in Dialogue? [64.72966061510375]
Emphasis is a crucial component of human communication, indicating the speaker's intention and implication beyond the literal text of a dialogue.
This paper introduces Emphasized-Talk, a benchmark with emphasis-annotated dialogue samples capturing the implications of emphasis.
We evaluate various Large Language Models (LLMs), both open-source and commercial, to measure their performance in understanding emphasis.
arXiv Detail & Related papers (2024-06-16T20:41:44Z) - Interpretability of Language Models via Task Spaces [14.543168558734001]
We present an alternative approach to interpreting language models (LMs), focusing on the quality of LM processing and, in particular, on their language abilities.
We construct 'linguistic task spaces' that shed light on the connections LMs draw between language phenomena.
arXiv Detail & Related papers (2024-06-10T16:34:30Z) - Interactive Natural Language Processing [67.87925315773924]
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP.
This paper offers a comprehensive survey of iNLP, starting by proposing a unified definition and framework of the concept.
arXiv Detail & Related papers (2023-05-22T17:18:29Z) - Channel-aware Decoupling Network for Multi-turn Dialogue Comprehension [81.47133615169203]
We propose compositional learning for holistic interaction across utterances beyond the sequential contextualization from pre-trained language models (PrLMs).
We employ domain-adaptive training strategies to help the model adapt to the dialogue domains.
Experimental results show that our method substantially boosts the strong PrLM baselines on four public benchmark datasets.
arXiv Detail & Related papers (2023-01-10T13:18:25Z) - An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z) - Color Overmodification Emerges from Data-Driven Learning and Pragmatic
Reasoning [53.088796874029974]
We show that speakers' referential expressions depart from communicative ideals in ways that help illuminate the nature of pragmatic language use.
By adopting neural networks as learning agents, we show that overmodification is more likely with environmental features that are infrequent or salient.
arXiv Detail & Related papers (2022-05-18T18:42:43Z) - Bridging between Cognitive Processing Signals and Linguistic Features
via a Unified Attentional Network [25.235060468310696]
We propose a data-driven method to investigate the relationship between cognitive processing signals and linguistic features.
We present a unified attentional framework that is composed of embedding, attention, encoding and predicting layers.
The proposed framework can be used to detect a wide range of linguistic features with a single cognitive dataset.
arXiv Detail & Related papers (2021-12-16T12:25:11Z) - Towards Transparent Interactive Semantic Parsing via Step-by-Step
Correction [17.000283696243564]
We investigate an interactive semantic parsing framework that explains the predicted logical form step by step in natural language.
We focus on question answering over knowledge bases (KBQA) as an instantiation of our framework.
Our experiments show that the interactive framework with human feedback has the potential to greatly improve overall parse accuracy.
arXiv Detail & Related papers (2021-10-15T20:11:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.