Learning to Mediate Disparities Towards Pragmatic Communication
- URL: http://arxiv.org/abs/2203.13685v1
- Date: Fri, 25 Mar 2022 14:46:43 GMT
- Title: Learning to Mediate Disparities Towards Pragmatic Communication
- Authors: Yuwei Bao, Sayan Ghosh, Joyce Chai
- Abstract summary: We propose the Pragmatic Rational Speaker (PRS), a framework for building AI agents with human-like pragmatic abilities in language communication.
The PRS attempts to learn the speaker-listener disparity and adjust its speech accordingly by adding a lightweight disparity adjustment layer into working memory.
By fixing the long-term memory, the PRS only needs to update its working memory to learn and adapt to different types of listeners.
- Score: 9.321336642983875
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Human communication is a collaborative process. Speakers, on top of conveying their own intent, adjust the content and language expressions by taking the listeners into account, including their knowledge background, personalities, and physical capabilities. Towards building AI agents with similar abilities in language communication, we propose the Pragmatic Rational Speaker (PRS), a framework extending the Rational Speech Act (RSA). The PRS attempts to learn the speaker-listener disparity and adjust its speech accordingly by adding a lightweight disparity adjustment layer into working memory on top of the speaker's long-term memory system. By fixing the long-term memory, the PRS only needs to update its working memory to learn and adapt to different types of listeners. To validate our framework, we create a dataset that simulates different types of speaker-listener disparities in the context of referential games. Our empirical results demonstrate that the PRS is able to shift its output towards language that the listener is able to understand, significantly improving the collaborative task outcome.
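The abstract sketches an architecture in which a frozen long-term-memory speaker is paired with a small trainable "working memory" adjustment layer, which is the only component updated when adapting to a new listener type. Below is a minimal, hypothetical sketch of that idea in PyTorch; the class names, layer sizes, and the referential-game training signal are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class BaseSpeaker(nn.Module):
    """Stand-in for the speaker's long-term memory: maps a referent
    encoding to utterance logits. Frozen after pre-training."""
    def __init__(self, feat_dim: int = 256, vocab_size: int = 1000):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.out = nn.Linear(feat_dim, vocab_size)

    def forward(self, referent_feats: torch.Tensor):
        h = self.encoder(referent_feats)   # "working memory" features
        return h, self.out(h)              # features + literal utterance logits

class DisparityAdjustment(nn.Module):
    """Hypothetical lightweight adapter standing in for the disparity
    adjustment layer; the only module that receives gradient updates."""
    def __init__(self, feat_dim: int = 256, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(feat_dim, bottleneck)
        self.up = nn.Linear(bottleneck, feat_dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(torch.relu(self.down(h)))  # residual adjustment

class PragmaticRationalSpeaker(nn.Module):
    """Frozen base speaker plus trainable disparity adjustment layer."""
    def __init__(self, base: BaseSpeaker, adapter: DisparityAdjustment):
        super().__init__()
        self.base, self.adapter = base, adapter
        for p in self.base.parameters():   # fix the long-term memory
            p.requires_grad = False

    def forward(self, referent_feats: torch.Tensor) -> torch.Tensor:
        h, _ = self.base(referent_feats)
        h = self.adapter(h)                # adjust for the listener's disparity
        return self.base.out(h)            # utterance logits after adjustment

# Adapting to one listener type: only the adapter's parameters are optimized,
# e.g. against a reward from the referential game or a supervised proxy loss.
prs = PragmaticRationalSpeaker(BaseSpeaker(), DisparityAdjustment())
optimizer = torch.optim.Adam(prs.adapter.parameters(), lr=1e-4)
logits = prs(torch.randn(8, 256))          # a batch of 8 referent encodings
```

Because gradients flow only into the adapter, switching to a new listener type amounts to retraining that small layer while the base speaker's long-term knowledge stays untouched, mirroring the working-memory/long-term-memory split described above.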
Related papers
- Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation [53.01238689626378]
We propose a novel approach to leverage semantic information in speaker diarization systems.
We introduce spoken language understanding modules to extract speaker-related semantic information.
We present a novel framework to integrate these constraints into the speaker diarization pipeline.
arXiv Detail & Related papers (2023-09-19T09:13:30Z) - Effect of Attention and Self-Supervised Speech Embeddings on
Non-Semantic Speech Tasks [3.570593982494095]
We look at speech emotion understanding as a perception task, which is a more realistic setting.
We leverage the rich ComParE dataset of multilingual speakers and a multi-label regression target of 'emotion share', i.e., the perception of that emotion.
Our results show that HuBERT-Large with a self-attention-based lightweight sequence model provides a 4.6% improvement over the reported baseline.
arXiv Detail & Related papers (2023-08-28T07:11:27Z) - Visual-Aware Text-to-Speech [101.89332968344102]
We present a new visual-aware text-to-speech (VA-TTS) task to synthesize speech conditioned on both textual inputs and visual feedback of the listener in face-to-face communication.
We devise a baseline model to fuse phoneme linguistic information and listener visual signals for speech synthesis.
arXiv Detail & Related papers (2023-06-21T05:11:39Z) - Speaking the Language of Your Listener: Audience-Aware Adaptation via
Plug-and-Play Theory of Mind [4.052000839878213]
We model a visually grounded referential game between a knowledgeable speaker and a listener with more limited visual and linguistic experience.
We endow our speaker with the ability to adapt its referring expressions via a simulation module that monitors the effectiveness of planned utterances from the listener's perspective.
arXiv Detail & Related papers (2023-05-31T15:17:28Z) - ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly
Disentangled Self-supervised Speech Representations [12.20522794248598]
We propose a zero-shot voice conversion method using speech representations trained with self-supervised learning.
We develop a multi-task model to decompose a speech utterance into features such as linguistic content, speaker characteristics, and speaking style.
Next, we develop a synthesis model with pitch and duration predictors that can effectively reconstruct the speech signal from its representation.
arXiv Detail & Related papers (2023-02-16T08:10:41Z) - Channel-aware Decoupling Network for Multi-turn Dialogue Comprehension [81.47133615169203]
We propose compositional learning for holistic interaction across utterances beyond the sequential contextualization from PrLMs.
We employ domain-adaptive training strategies to help the model adapt to the dialogue domains.
Experimental results show that our method substantially boosts the strong PrLM baselines in four public benchmark datasets.
arXiv Detail & Related papers (2023-01-10T13:18:25Z) - Know your audience: specializing grounded language models with listener
subtraction [20.857795779760917]
We take inspiration from Dixit to formulate a multi-agent image reference game.
We show that finetuning an attention-based adapter between a CLIP vision encoder and a large language model in this contrastive, multi-agent setting gives rise to context-dependent natural language specialization.
arXiv Detail & Related papers (2022-06-16T17:52:08Z) - Curriculum Learning for Goal-Oriented Semantic Communications with a
Common Language [60.85719227557608]
A holistic goal-oriented semantic communication framework is proposed to enable a speaker and a listener to cooperatively execute a set of sequential tasks.
A common language based on a hierarchical belief set is proposed to enable semantic communications between speaker and listener.
An optimization problem is defined to determine the perfect and abstract description of the events.
arXiv Detail & Related papers (2022-04-21T22:36:06Z) - VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised
Speech Representation Disentanglement for One-shot Voice Conversion [54.29557210925752]
One-shot voice conversion can be effectively achieved by speech representation disentanglement.
We employ vector quantization (VQ) for content encoding and introduce mutual information (MI) as the correlation metric during training.
Experimental results reflect the superiority of the proposed method in learning effective disentangled speech representations.
arXiv Detail & Related papers (2021-06-18T13:50:38Z) - Self-play for Data Efficient Language Acquisition [20.86261546611472]
We exploit the symmetric nature of communication in order to improve the efficiency and quality of language acquisition in learning agents.
We show that using self-play as a substitute for direct supervision enables the agent to transfer its knowledge across roles.
arXiv Detail & Related papers (2020-10-10T02:09:19Z) - SPLAT: Speech-Language Joint Pre-Training for Spoken Language
Understanding [61.02342238771685]
Spoken language understanding requires a model to analyze input acoustic signal to understand its linguistic content and make predictions.
Various pre-training methods have been proposed to learn rich representations from large-scale unannotated speech and text.
We propose a novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules.
arXiv Detail & Related papers (2020-10-05T19:29:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.