SIG: Speaker Identification in Literature via Prompt-Based Generation
- URL: http://arxiv.org/abs/2312.14590v2
- Date: Mon, 19 Feb 2024 09:25:44 GMT
- Title: SIG: Speaker Identification in Literature via Prompt-Based Generation
- Authors: Zhenlin Su, Liyan Xu, Jin Xu, Jiangnan Li, Mingdu Huangfu
- Abstract summary: We propose a generation-based method that verbalizes the task and quotation input based on designed prompt templates.
The prediction can either come from direct generation by the model, or be determined by the speaker candidate with the highest generation probability.
We perform both cross-domain evaluation and in-domain evaluation on PDNC, the largest dataset of this task.
- Score: 13.042070464592374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Identifying the speakers of quotations in narratives is an important task in literary analysis, with challenging scenarios including out-of-domain inference for unseen speakers, and non-explicit cases where there is no speaker mention in the surrounding context. In this work, we propose SIG, a simple and effective generation-based method that verbalizes the task and quotation input with designed prompt templates, and also enables easy integration of auxiliary tasks that further bolster speaker identification performance. The prediction can either come from direct generation by the model, or be determined by the speaker candidate with the highest generation probability. By design, SIG supports out-of-domain evaluation and achieves an open-world classification paradigm that accepts any form of candidate input. We perform both cross-domain and in-domain evaluation on PDNC, the largest dataset for this task, where empirical results suggest that SIG outperforms previous baselines with more complicated designs, as well as zero-shot ChatGPT, especially excelling at hard non-explicit scenarios with up to 17% improvement. Additional experiments on another dataset, WP, further corroborate the efficacy of SIG.
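The candidate-scoring mode described in the abstract can be illustrated with a short sketch: the quotation and its context are verbalized by a prompt template, and each speaker candidate is ranked by its generation probability under a sequence-to-sequence language model. The template wording, the choice of t5-base, and the helper names below are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of prompt-based candidate scoring for speaker identification.
# Assumptions: a generic seq2seq model (t5-base) and an illustrative prompt
# template; the actual SIG templates and model are defined in the paper.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
model.eval()

def score_candidate(context: str, quote: str, candidate: str) -> float:
    """Length-normalized log-probability of generating `candidate` from the prompt."""
    prompt = f"context: {context} quotation: {quote} Who says the quotation?"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    labels = tokenizer(candidate, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(**inputs, labels=labels)
    # out.loss is the mean token-level cross-entropy; negate it to use as a score.
    return -out.loss.item()

context = '"Come in," said Alice. The Rabbit hesitated at the door.'
quote = "Come in"
candidates = ["Alice", "The Rabbit", "The Queen"]
best = max(candidates, key=lambda c: score_candidate(context, quote, c))
print(best)  # the candidate with the highest generation probability
```

In the direct-generation mode mentioned in the abstract, the same prompt would instead be decoded freely (e.g., with model.generate), so any surface form can be produced as the predicted speaker.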
Related papers
- Investigation of Speaker Representation for Target-Speaker Speech Processing [49.110228525976794]
This paper aims to address a fundamental question: what is the preferred speaker embedding for target-speaker speech processing tasks?
For the TS-ASR, TSE, and p-VAD tasks, we compare pre-trained speaker encoders that compute speaker embeddings from pre-recorded enrollment speech of the target speaker with ideal speaker embeddings derived directly from the target speaker's identity in the form of a one-hot vector.
Our analysis reveals that speaker verification performance is somewhat unrelated to TS task performance, that the one-hot vector outperforms enrollment-based embeddings, and that the optimal embedding depends on the input mixture.
arXiv Detail & Related papers (2024-10-15T03:58:13Z)
- Automated Speaking Assessment of Conversation Tests with Novel Graph-based Modeling on Spoken Response Coherence [11.217656140423207]
ASAC aims to evaluate the overall speaking proficiency of an L2 speaker in a setting where an interlocutor interacts with one or more candidates.
We propose a hierarchical graph model that aptly incorporates both broad inter-response interactions and nuanced semantic information.
Extensive experimental results on the NICT-JLE benchmark dataset suggest that our proposed modeling approach can yield considerable improvements in prediction accuracy.
arXiv Detail & Related papers (2024-09-11T07:24:07Z)
- A Large-Scale Evaluation of Speech Foundation Models [110.95827399522204]
We establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the foundation model paradigm for speech.
We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads.
arXiv Detail & Related papers (2024-04-15T00:03:16Z)
- Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction [37.27069171640074]
Humans can easily isolate a single speaker from a complex acoustic environment, a capability referred to as the "Cocktail Party Effect".
Traditional target speaker extraction approaches rely on voiceprints, which raise privacy concerns and face issues related to the quality and availability of enrollment samples.
This work introduces a novel text-guided TSE paradigm named LLM-TSE.
arXiv Detail & Related papers (2023-10-11T08:17:54Z)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
- Conversational Semantic Role Labeling with Predicate-Oriented Latent Graph [40.43625257213158]
We propose to automatically induce a predicate-oriented latent graph (POLar) with a predicate-centered Gaussian mechanism.
The POLar structure is then dynamically pruned and refined so as to best fit the task need.
We additionally introduce an effective dialogue-level pre-trained language model, CoDiaBERT, for better supporting multiple utterance sentences.
arXiv Detail & Related papers (2022-10-06T16:42:00Z)
- Referring Expressions with Rational Speech Act Framework: A Probabilistic Approach [2.1425861443122383]
This paper focuses on a referring expression generation (REG) task in which the aim is to pick out an object in a complex visual scene.
Several recent REG systems have used deep learning approaches to represent the speaker/listener agents.
This paper applies a combination of the probabilistic RSA framework and deep learning approaches to larger datasets involving complex visual scenes.
arXiv Detail & Related papers (2022-05-16T16:37:50Z)
- SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech [44.68649535280397]
We propose a suite of benchmark tasks for Spoken Language Understanding Evaluation (SLUE).
SLUE consists of limited-size labeled training sets and corresponding evaluation sets.
We present the first phase of the SLUE benchmark suite, consisting of named entity recognition, sentiment analysis, and ASR on the corresponding datasets.
We provide new transcriptions and annotations on subsets of the VoxCeleb and VoxPopuli datasets, evaluation metrics and results for baseline models, and an open-source toolkit to reproduce the baselines and evaluate new models.
arXiv Detail & Related papers (2021-11-19T18:59:23Z)
- Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning [58.14807331265752]
We show that better speaker embeddings can be learned by momentum contrastive learning.
We generalize the self-supervised framework to a semi-supervised scenario where only a small portion of the data is labeled.
arXiv Detail & Related papers (2020-12-13T23:23:39Z)
- Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues [88.73739515457116]
We introduce four self-supervised tasks including next session prediction, utterance restoration, incoherence detection and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experiment results indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection.
arXiv Detail & Related papers (2020-09-14T08:44:46Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.