Speaker effects in spoken language comprehension
- URL: http://arxiv.org/abs/2412.07238v1
- Date: Tue, 10 Dec 2024 07:03:06 GMT
- Title: Speaker effects in spoken language comprehension
- Authors: Hanlin Wu, Zhenguang G. Cai,
- Abstract summary: The identity of a speaker significantly influences spoken language comprehension by affecting both perception and expectation.
We propose an integrative model featuring the interplay between bottom-up perception-based processes driven by acoustic details and top-down expectation-based processes driven by a speaker model.
- Score: 0.9514940899499753
- License:
- Abstract: The identity of a speaker significantly influences spoken language comprehension by affecting both perception and expectation. This review explores speaker effects, focusing on how speaker information impacts language processing. We propose an integrative model featuring the interplay between bottom-up perception-based processes driven by acoustic details and top-down expectation-based processes driven by a speaker model. The acoustic details influence lower-level perception, while the speaker model modulates both lower-level and higher-level processes such as meaning interpretation and pragmatic inferences. We define speaker-idiosyncrasy and speaker-demographics effects and demonstrate how bottom-up and top-down processes interact at various levels in different scenarios. This framework contributes to psycholinguistic theory by offering a comprehensive account of how speaker information interacts with linguistic content to shape message construction. We suggest that speaker effects can serve as indices of a language learner's proficiency and an individual's characteristics of social cognition. We encourage future research to extend these findings to AI speakers, probing the universality of speaker effects across humans and artificial agents.
Related papers
- SIFToM: Robust Spoken Instruction Following through Theory of Mind [51.326266354164716]
We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions.
Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks.
arXiv Detail & Related papers (2024-09-17T02:36:10Z) - Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization [25.213694510527436]
Most existing speaker diarization systems rely exclusively on unimodal acoustic information.
We propose a novel multimodal approach that jointly utilizes audio, visual, and semantic cues to enhance speaker diarization.
Our approach consistently outperforms state-of-the-art speaker diarization methods.
arXiv Detail & Related papers (2024-08-22T03:34:03Z) - Speaker Verification in Agent-Generated Conversations [47.6291644653831]
The recent success of large language models (LLMs) has attracted widespread interest to develop role-playing conversational agents personalized to the characteristics and styles of different speakers to enhance their abilities to perform both general and special purpose dialogue tasks.
This study introduces a novel evaluation challenge: speaker verification in agent-generated conversations, which aimed to verify whether two sets of utterances originate from the same speaker.
arXiv Detail & Related papers (2024-05-16T14:46:18Z) - Improving Speaker Diarization using Semantic Information: Joint Pairwise
Constraints Propagation [53.01238689626378]
We propose a novel approach to leverage semantic information in speaker diarization systems.
We introduce spoken language understanding modules to extract speaker-related semantic information.
We present a novel framework to integrate these constraints into the speaker diarization pipeline.
arXiv Detail & Related papers (2023-09-19T09:13:30Z) - An analysis on the effects of speaker embedding choice in non
auto-regressive TTS [4.619541348328938]
We introduce a first attempt on understanding how a non-autoregressive factorised multi-speaker speech synthesis architecture exploits the information present in different speaker embedding sets.
We show that, regardless of the used set of embeddings and learning strategy, the network can handle various speaker identities equally well.
arXiv Detail & Related papers (2023-07-19T10:57:54Z) - Speaking the Language of Your Listener: Audience-Aware Adaptation via
Plug-and-Play Theory of Mind [4.052000839878213]
We model a visually grounded referential game between a knowledgeable speaker and a listener with more limited visual and linguistic experience.
We endow our speaker with the ability to adapt its referring expressions via a simulation module that monitors the effectiveness of planned utterances from the listener's perspective.
arXiv Detail & Related papers (2023-05-31T15:17:28Z) - Exploring Speaker-Related Information in Spoken Language Understanding
for Better Speaker Diarization [7.673971221635779]
We propose methods to extract speaker-related information from semantic content in multi-party meetings.
Experiments on both AISHELL-4 and AliMeeting datasets show that our method achieves consistent improvements over acoustic-only speaker diarization systems.
arXiv Detail & Related papers (2023-05-22T11:14:19Z) - A Review of Speaker Diarization: Recent Advances with Deep Learning [78.20151731627958]
Speaker diarization is a task to label audio or video recordings with classes corresponding to speaker identity.
With the rise of deep learning technology, more rapid advancements have been made for speaker diarization.
We discuss how speaker diarization systems have been integrated with speech recognition applications.
arXiv Detail & Related papers (2021-01-24T01:28:05Z) - Filling the Gap of Utterance-aware and Speaker-aware Representation for
Multi-turn Dialogue [76.88174667929665]
A multi-turn dialogue is composed of multiple utterances from two or more different speaker roles.
In the existing retrieval-based multi-turn dialogue modeling, the pre-trained language models (PrLMs) as encoder represent the dialogues coarsely.
We propose a novel model to fill such a gap by modeling the effective utterance-aware and speaker-aware representations entailed in a dialogue history.
arXiv Detail & Related papers (2020-09-14T15:07:19Z) - Speaker Diarization with Lexical Information [59.983797884955]
This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.
We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy.
arXiv Detail & Related papers (2020-04-13T17:16:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.