DUAL: Textless Spoken Question Answering with Speech Discrete Unit
Adaptive Learning
- URL: http://arxiv.org/abs/2203.04911v1
- Date: Wed, 9 Mar 2022 17:46:22 GMT
- Title: DUAL: Textless Spoken Question Answering with Speech Discrete Unit
Adaptive Learning
- Authors: Guan-Ting Lin, Yung-Sung Chuang, Ho-Lam Chung, Shu-wen Yang, Hsuan-Jui
Chen, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Lin-shan Lee
- Abstract summary: Spoken Question Answering (SQA) has gained research attention and made remarkable progress in recent years.
Existing SQA methods rely on Automatic Speech Recognition (ASR) transcripts, which are time and cost-prohibitive to collect.
This work proposes an ASR transcript-free SQA framework named Discrete Unit Adaptive Learning (DUAL), which leverages unlabeled data for pre-training and is fine-tuned by the SQA downstream task.
- Score: 66.71308154398176
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spoken Question Answering (SQA) has gained research attention and made
remarkable progress in recent years. However, existing SQA methods rely on
Automatic Speech Recognition (ASR) transcripts, which are time and
cost-prohibitive to collect. This work proposes an ASR transcript-free SQA
framework named Discrete Unit Adaptive Learning (DUAL), which leverages
unlabeled data for pre-training and is fine-tuned by the SQA downstream task.
DUAL can directly predict the time interval of the spoken answer from the
spoken document. We also release a new SQA benchmark corpus Natural
Multi-speaker Spoken Question Answering (NMSQA) for testing SQA in realistic
scenarios. The experimental results show that DUAL performs competitively with
the cascade approach (ASR + text QA), and DUAL is robust to real-world speech.
We will open-source our code and model to inspire more SQA innovations from the
community.
Related papers
- SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering [76.4510005602893]
Spoken Question Answering (SQA) is essential for machines to reply to a user's question by finding the answer span within a given spoken passage.
This paper proposes the first known end-to-end framework, Speech Passage Retriever (SpeechDPR).
SpeechDPR learns a sentence-level semantic representation by distilling knowledge from the cascading model of unsupervised ASR (UASR) and dense text retriever (TDR).
arXiv Detail & Related papers (2024-01-24T14:08:38Z)
- GSQA: An End-to-End Model for Generative Spoken Question Answering [54.418723701886115]
We introduce the first end-to-end Generative Spoken Question Answering (GSQA) model that empowers the system to engage in abstractive reasoning.
Our model surpasses the previous extractive model by 3% on extractive QA datasets.
Our GSQA model shows the potential to generalize to a broad spectrum of questions, thus further expanding the spoken question answering capabilities of abstractive QA.
arXiv Detail & Related papers (2023-12-15T13:33:18Z)
- Toward Unsupervised Realistic Visual Question Answering [70.67698100148414]
We study the problem of realistic VQA (RVQA), where a model has to reject unanswerable questions (UQs) and answer answerable ones (AQs).
We first point out 2 drawbacks in current RVQA research, where (1) datasets contain too many unchallenging UQs and (2) a large number of annotated UQs are required for training.
We propose a new testing dataset, RGQA, which combines AQs from an existing VQA dataset with around 29K human-annotated UQs.
This combines pseudo UQs obtained by randomly pairing images and questions, with an
arXiv Detail & Related papers (2023-03-09T06:58:29Z)
- An Initial Investigation of Non-Native Spoken Question-Answering [36.89541375786233]
We show that a simple text-based ELECTRA MC model trained on SQuAD2.0 transfers well for spoken question answering tests.
One significant challenge is the lack of appropriately annotated speech corpora to train systems for this task.
Mismatches must be considered between text documents and spoken responses; non-native spoken grammar and written grammar.
arXiv Detail & Related papers (2021-07-09T21:59:16Z)
- ASQ: Automatically Generating Question-Answer Pairs using AMRs [1.0878040851638]
We introduce ASQ, a tool to automatically mine questions and answers from a sentence, using its Abstract Meaning Representation (AMR).
A qualitative evaluation of the output generated by ASQ from the AMR 2.0 data shows that the question-answer pairs are natural and valid.
We intend to make this tool and the results publicly available for others to use and build upon.
arXiv Detail & Related papers (2021-05-20T20:38:05Z)
- Contextualized Attention-based Knowledge Transfer for Spoken
Conversational Question Answering [63.72278693825945]
Spoken conversational question answering (SCQA) requires machines to model complex dialogue flow.
We propose CADNet, a novel contextualized attention-based distillation approach.
We conduct extensive experiments on the Spoken-CoQA dataset and demonstrate that our approach achieves remarkable performance.
arXiv Detail & Related papers (2020-10-21T15:17:18Z)
- Towards Data Distillation for End-to-end Spoken Conversational Question
Answering [65.124088336738]
We propose a new Spoken Conversational Question Answering task (SCQA).
SCQA aims at enabling QA systems to model complex dialogue flows given the speech utterances and text corpora.
Our main objective is to build a QA system to deal with conversational questions both in spoken and text forms.
arXiv Detail & Related papers (2020-10-18T05:53:39Z)
- Fluent Response Generation for Conversational Question Answering [15.826109118064716]
We propose a method for situating responses within a SEQ2SEQ NLG approach to generate fluent grammatical answer responses.
We use data augmentation to generate training data for an end-to-end system.
arXiv Detail & Related papers (2020-05-21T04:57:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.