SIG: Speaker Identification in Literature via Prompt-Based Generation
- URL: http://arxiv.org/abs/2312.14590v2
- Date: Mon, 19 Feb 2024 09:25:44 GMT
- Title: SIG: Speaker Identification in Literature via Prompt-Based Generation
- Authors: Zhenlin Su, Liyan Xu, Jin Xu, Jiangnan Li, Mingdu Huangfu
- Abstract summary: We propose a generation-based method that verbalizes the task and quotation input based on designed prompt templates.
The prediction can either come from direct generation by the model, or be determined by the highest generation probability of each speaker candidate.
We perform both cross-domain evaluation and in-domain evaluation on PDNC, the largest dataset of this task.
- Score: 13.042070464592374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Identifying speakers of quotations in narratives is an important task in
literary analysis, with challenging scenarios including out-of-domain
inference for unseen speakers, and non-explicit cases where there are no
speaker mentions in the surrounding context. In this work, we propose SIG, a
simple and effective generation-based method that verbalizes the task and
quotation input based on designed prompt templates, which also enables easy
integration of auxiliary tasks that further bolster speaker
identification performance. The prediction can either come from direct
generation by the model, or be determined by the highest generation probability
among the speaker candidates. By design, SIG supports
out-of-domain evaluation and achieves an open-world classification paradigm that
can accept any form of candidate input. We perform both cross-domain
evaluation and in-domain evaluation on PDNC, the largest dataset for this task,
where empirical results suggest that SIG outperforms previous baselines of
complicated designs, as well as zero-shot ChatGPT, excelling especially in
hard non-explicit scenarios with up to 17% improvement. Additional
experiments on another dataset, WP, further corroborate the efficacy of SIG.
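The candidate-scoring mode described in the abstract (pick the speaker whose name has the highest generation probability given the verbalized prompt) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the template wording, `build_prompt`, `pick_speaker`, and especially `toy_log_prob`, which is a toy stand-in for a real model's log-probability of generating a candidate and is not the scoring used in the paper.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical prompt template; the actual wording used by SIG is not given in the abstract.
TEMPLATE = (
    "Context: {context}\n"
    'Quotation: "{quote}"\n'
    "Question: Who said the quotation?\n"
    "Answer:"
)


def build_prompt(context: str, quote: str) -> str:
    """Verbalize the task and the quotation input into a single text prompt."""
    return TEMPLATE.format(context=context, quote=quote)


def pick_speaker(
    prompt: str,
    candidates: Sequence[str],
    log_prob_fn: Callable[[str, str], float],
) -> Tuple[str, Dict[str, float]]:
    """Return the candidate with the highest score under log_prob_fn.

    log_prob_fn(prompt, candidate) is assumed to return log P(candidate | prompt)
    from a generative model; any callable with that shape can be plugged in.
    """
    if not candidates:
        raise ValueError("at least one speaker candidate is required")
    scores = {name: log_prob_fn(prompt, name) for name in candidates}
    best = max(scores, key=scores.get)
    return best, scores


def toy_log_prob(prompt: str, candidate: str) -> float:
    """Toy stand-in for a model score (assumption): favors names mentioned more often."""
    mentions = prompt.count(candidate)
    return math.log(mentions + 1) - math.log(len(prompt.split()) + 1)


if __name__ == "__main__":
    prompt = build_prompt(
        context="Elizabeth turned to Darcy. Elizabeth smiled.",
        quote="I am quite well.",
    )
    best, scores = pick_speaker(prompt, ["Elizabeth", "Darcy", "Jane"], toy_log_prob)
    print(best, scores)
```

Because the candidates are passed in as plain strings at inference time, the same function accepts any candidate list, which is one way to read the "open-world classification" claim. The direct-generation mode would instead let the model produce the speaker name freely after `Answer:`.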
Related papers
- A Large-Scale Evaluation of Speech Foundation Models [110.95827399522204]
We establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the foundation model paradigm for speech.
We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads.
arXiv Detail & Related papers (2024-04-15T00:03:16Z)
- Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction [39.985710814952625]
This study investigates the integration of natural language description to enhance the feasibility, controllability, and performance of existing target speaker extraction models.
We propose a model named LLM-TSE, wherein a large language model (LLM) extracts useful semantic cues from the user's typed text input.
Our experimental results demonstrate competitive performance when only text-based cues are presented, the effectiveness of using input text as a task selector, and a new state-of-the-art when combining text-based cues with pre-registered cues.
arXiv Detail & Related papers (2023-10-11T08:17:54Z)
- GRASS: Unified Generation Model for Speech-to-Semantic Tasks [7.044414457214718]
We introduce a unified end-to-end (E2E) framework that generates target text conditioned on a task-related prompt for audio data.
Our proposed model achieves state-of-the-art (SOTA) results on many benchmarks covering speech named entity recognition, speech sentiment analysis, speech question answering, and more.
To facilitate future work on instruction fine-tuning for speech-to-semantic tasks, we release our instruction dataset and code.
arXiv Detail & Related papers (2023-09-06T06:44:26Z)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
- Conversational Semantic Role Labeling with Predicate-Oriented Latent Graph [40.43625257213158]
We propose to automatically induce a predicate-oriented latent graph (POLar) with a predicate-centered Gaussian mechanism.
The POLar structure is then dynamically pruned and refined so as to best fit the task need.
We additionally introduce an effective dialogue-level pre-trained language model, CoDiaBERT, for better supporting multiple utterance sentences.
arXiv Detail & Related papers (2022-10-06T16:42:00Z)
- Referring Expressions with Rational Speech Act Framework: A Probabilistic Approach [2.1425861443122383]
This paper focuses on a referring expression generation (REG) task in which the aim is to pick out an object in a complex visual scene.
Several recent REG systems have used deep learning approaches to represent the speaker/listener agents.
This paper applies a combination of the probabilistic RSA framework and deep learning approaches to larger datasets involving complex visual scenes.
arXiv Detail & Related papers (2022-05-16T16:37:50Z)
- SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech [44.68649535280397]
We propose a suite of benchmark tasks for Spoken Language Understanding Evaluation (SLUE).
SLUE consists of limited-size labeled training sets and corresponding evaluation sets.
We present the first phase of the SLUE benchmark suite, consisting of named entity recognition, sentiment analysis, and ASR on the corresponding datasets.
We provide new transcriptions and annotations on subsets of the VoxCeleb and VoxPopuli datasets, evaluation metrics and results for baseline models, and an open-source toolkit to reproduce the baselines and evaluate new models.
arXiv Detail & Related papers (2021-11-19T18:59:23Z)
- Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning [58.14807331265752]
We show that better speaker embeddings can be learned by momentum contrastive learning.
We generalize the self-supervised framework to a semi-supervised scenario where only a small portion of the data is labeled.
arXiv Detail & Related papers (2020-12-13T23:23:39Z)
- Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues [88.73739515457116]
We introduce four self-supervised tasks including next session prediction, utterance restoration, incoherence detection and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experiment results indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection.
arXiv Detail & Related papers (2020-09-14T08:44:46Z)
- Cross-domain Adaptation with Discrepancy Minimization for Text-independent Forensic Speaker Verification [61.54074498090374]
This study introduces a CRSS-Forensics audio dataset collected in multiple acoustic environments.
We pre-train a CNN-based network using the VoxCeleb data, followed by an approach which fine-tunes part of the high-level network layers with clean speech from CRSS-Forensics.
arXiv Detail & Related papers (2020-09-05T02:54:33Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.