Addressing the Blind Spots in Spoken Language Processing
- URL: http://arxiv.org/abs/2309.06572v1
- Date: Wed, 6 Sep 2023 10:29:25 GMT
- Title: Addressing the Blind Spots in Spoken Language Processing
- Authors: Amit Moryossef
- Abstract summary: We argue that understanding human communication requires a more holistic approach that goes beyond textual or spoken words to include non-verbal elements.
We propose the development of universal automatic gesture segmentation and transcription models to transcribe these non-verbal cues into textual form.
- Score: 4.626189039960495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper explores the critical but often overlooked role of non-verbal
cues, including co-speech gestures and facial expressions, in human
communication and their implications for Natural Language Processing (NLP). We
argue that understanding human communication requires a more holistic approach
that goes beyond textual or spoken words to include non-verbal elements.
Borrowing from advances in sign language processing, we propose the development
of universal automatic gesture segmentation and transcription models to
transcribe these non-verbal cues into textual form. Such a methodology aims to
bridge the blind spots in spoken language understanding, enhancing the scope
and applicability of NLP models. Through motivating examples, we demonstrate
the limitations of relying solely on text-based models. We propose a
computationally efficient and flexible approach for incorporating non-verbal
cues, which can seamlessly integrate with existing NLP pipelines. We conclude
by calling upon the research community to contribute to the development of
universal transcription methods and to validate their effectiveness in
capturing the complexities of real-world, multi-modal interactions.
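The paper itself does not include code. As a minimal sketch of the idea in the abstract, assuming non-verbal cues have already been segmented and transcribed into bracketed textual labels (the data structure, field names, and cue labels below are illustrative assumptions, not the authors' method), the following shows how such cues could be interleaved with a spoken transcript so that an existing text-based NLP pipeline can consume them unchanged:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical representation: the paper does not specify a format, so this
# assumes time-stamped spoken words and separately transcribed non-verbal cues.
@dataclass
class Token:
    text: str      # spoken word or transcribed cue label, e.g. "[shrug]"
    start: float   # start time in seconds
    is_cue: bool   # True for a non-verbal cue, False for a spoken word

def merge_transcripts(words: List[Token], cues: List[Token]) -> str:
    """Interleave spoken words and transcribed non-verbal cues by start time
    into a single textual sequence for a downstream text-based NLP model."""
    merged = sorted(words + cues, key=lambda t: t.start)
    return " ".join(t.text for t in merged)

if __name__ == "__main__":
    words = [Token("I", 0.0, False), Token("don't", 0.2, False), Token("know", 0.4, False)]
    cues = [Token("[shrug]", 0.1, True), Token("[raised-eyebrows]", 0.35, True)]
    print(merge_transcripts(words, cues))
    # -> I [shrug] don't [raised-eyebrows] know
```

Because the output is plain text, this kind of interleaving would integrate with existing NLP pipelines without architectural changes, which is the flexibility the abstract argues for.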
Related papers
- Real-Time Multilingual Sign Language Processing [4.626189039960495]
Sign Language Processing (SLP) is an interdisciplinary field that combines Natural Language Processing (NLP) and Computer Vision.
Traditional approaches have often been constrained by the use of gloss-based systems that are both language-specific and inadequate for capturing the multidimensional nature of sign language.
We propose the use of SignWiring, a universal sign language transcription notation system, to serve as an intermediary link between the visual-gestural modality of signed languages and text-based linguistic representations.
arXiv Detail & Related papers (2024-12-02T21:51:41Z) - OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation [53.7173034249361]
OmniFlatten is an end-to-end GPT-based model capable of effectively modeling the complex behaviors inherent in natural conversations with low latency.
Our approach offers a simple modeling technique and a promising research direction for developing efficient and natural end-to-end full-duplex spoken dialogue systems.
arXiv Detail & Related papers (2024-10-23T11:58:58Z) - Can LLMs Understand the Implication of Emphasized Sentences in Dialogue? [64.72966061510375]
Emphasis is a crucial component of human communication, indicating the speaker's intention and implication beyond the literal text of a dialogue.
This paper introduces Emphasized-Talk, a benchmark with emphasis-annotated dialogue samples capturing the implications of emphasis.
We evaluate various Large Language Models (LLMs), both open-source and commercial, to measure their performance in understanding emphasis.
arXiv Detail & Related papers (2024-06-16T20:41:44Z) - Interpretability of Language Models via Task Spaces [14.543168558734001]
We present an alternative approach to interpreting language models (LMs), focusing on the quality of LM processing and, in particular, on their language abilities.
We construct 'linguistic task spaces' that shed light on the connections LMs draw between language phenomena.
arXiv Detail & Related papers (2024-06-10T16:34:30Z) - Interactive Natural Language Processing [67.87925315773924]
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP.
This paper offers a comprehensive survey of iNLP, starting by proposing a unified definition and framework of the concept.
arXiv Detail & Related papers (2023-05-22T17:18:29Z) - Channel-aware Decoupling Network for Multi-turn Dialogue Comprehension [81.47133615169203]
We propose compositional learning for holistic interaction across utterances beyond the sequential contextualization from pre-trained language models (PrLMs).
We employ domain-adaptive training strategies to help the model adapt to the dialogue domains.
Experimental results show that our method substantially boosts the strong PrLM baselines on four public benchmark datasets.
arXiv Detail & Related papers (2023-01-10T13:18:25Z) - An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z) - Color Overmodification Emerges from Data-Driven Learning and Pragmatic
Reasoning [53.088796874029974]
We show that speakers' referential expressions depart from communicative ideals in ways that help illuminate the nature of pragmatic language use.
By adopting neural networks as learning agents, we show that overmodification is more likely with environmental features that are infrequent or salient.
arXiv Detail & Related papers (2022-05-18T18:42:43Z) - Bridging between Cognitive Processing Signals and Linguistic Features
via a Unified Attentional Network [25.235060468310696]
We propose a data-driven method to investigate the relationship between cognitive processing signals and linguistic features.
We present a unified attentional framework that is composed of embedding, attention, encoding and predicting layers.
The proposed framework can be used to detect a wide range of linguistic features with a single cognitive dataset.
arXiv Detail & Related papers (2021-12-16T12:25:11Z) - Towards Transparent Interactive Semantic Parsing via Step-by-Step
Correction [17.000283696243564]
We investigate an interactive semantic parsing framework that explains the predicted logical form step by step in natural language.
We focus on question answering over knowledge bases (KBQA) as an instantiation of our framework.
Our experiments show that the interactive framework with human feedback has the potential to greatly improve overall parse accuracy.
arXiv Detail & Related papers (2021-10-15T20:11:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.