Follow-on Question Suggestion via Voice Hints for Voice Assistants
- URL: http://arxiv.org/abs/2310.17034v1
- Date: Wed, 25 Oct 2023 22:22:18 GMT
- Title: Follow-on Question Suggestion via Voice Hints for Voice Assistants
- Authors: Besnik Fetahu, Pedro Faustini, Giuseppe Castellucci, Anjie Fang, Oleg
Rokhlenko, Shervin Malmasi
- Abstract summary: We tackle the novel task of suggesting questions with compact and natural voice hints to allow users to ask follow-up questions.
We propose baselines and an approach using sequence-to-sequence Transformers to generate spoken hints from a list of questions.
Results show that a naive approach of concatenating suggested questions creates poor voice hints.
- Score: 29.531005346608215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The adoption of voice assistants like Alexa or Siri has grown rapidly,
allowing users to instantly access information via voice search. Query
suggestion is a standard feature of screen-based search experiences, allowing
users to explore additional topics. However, this is not trivial to implement
in voice-based settings. To enable this, we tackle the novel task of suggesting
questions with compact and natural voice hints to allow users to ask follow-up
questions.
We define the task, ground it in syntactic theory and outline linguistic
desiderata for spoken hints. We propose baselines and an approach using
sequence-to-sequence Transformers to generate spoken hints from a list of
questions. Using a new dataset of 6681 input questions and human written hints,
we evaluated the models with automatic metrics and human evaluation. Results
show that a naive approach of concatenating suggested questions creates poor
voice hints. Our approach, which applies a linguistically-motivated pretraining
task, was strongly preferred by humans for producing the most natural hints.
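
As a concrete illustration of the two strategies the abstract contrasts, below is a minimal sketch that turns a list of suggested questions into a single spoken hint, assuming a generic pretrained sequence-to-sequence model (t5-base via Hugging Face Transformers). The checkpoint, input format, and helper names are illustrative assumptions rather than the authors' released code, and the model would first need fine-tuning on question/hint pairs such as the paper's 6681 examples.

```python
# Sketch of the two hint strategies the abstract contrasts. The checkpoint,
# separators, and prompts are assumptions, not the paper's released setup.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

questions = [
    "How tall is the Eiffel Tower?",
    "When was the Eiffel Tower built?",
]

def naive_hint(questions):
    # Baseline: concatenate the suggested questions verbatim.
    # The paper reports this produces unnatural-sounding voice hints.
    return "You could also ask: " + " ".join(questions)

def seq2seq_hint(questions, max_new_tokens=40):
    # Learned approach: a seq2seq Transformer rewrites the question list
    # into one compact, natural spoken hint (after fine-tuning on
    # question/hint pairs, which this sketch does not perform).
    source = " ; ".join(questions)
    inputs = tokenizer(source, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(naive_hint(questions))
print(seq2seq_hint(questions))
```

Per the abstract, the learned rewriter, combined with a linguistically motivated pretraining task, is what human judges preferred over the concatenation baseline.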
Related papers
- Distilling an End-to-End Voice Assistant Without Instruction Training Data [53.524071162124464]
Distilled Voice Assistant (DiVA) generalizes to Question Answering, Classification, and Translation.
We show that DiVA better meets user preferences, achieving a 72% win rate compared with state-of-the-art models like Qwen 2 Audio.
arXiv Detail & Related papers (2024-10-03T17:04:48Z)
- Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System [73.34663391495616]
We propose a pioneering approach to tackle joint multi-talker and target-talker speech recognition tasks.
Specifically, we freeze Whisper and plug a Sidecar separator into its encoder to separate the mixed embedding for multiple talkers.
We deliver acceptable zero-shot performance on multi-talker ASR on the AishellMix Mandarin dataset.
arXiv Detail & Related papers (2024-07-13T09:28:24Z)
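
A minimal PyTorch sketch of the freeze-and-plug-in pattern described in the entry above: a pretrained encoder is frozen and a small trainable "sidecar" module splits its mixed embedding into per-talker streams. The real Sidecar's architecture and its insertion point inside Whisper's encoder are not reproduced here; `frozen_encoder` is a stand-in for Whisper.

```python
# Freeze a pretrained encoder; train only a small separator on top.
import torch
import torch.nn as nn

class SidecarSeparator(nn.Module):
    """Trainable separator producing one embedding stream per talker."""
    def __init__(self, d_model: int, num_talkers: int = 2):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                          nn.Linear(d_model, d_model))
            for _ in range(num_talkers)
        )

    def forward(self, mixed: torch.Tensor) -> list[torch.Tensor]:
        # mixed: (batch, time, d_model) embedding of overlapped speech
        return [head(mixed) for head in self.heads]

d_model = 512
frozen_encoder = nn.TransformerEncoder(  # stand-in for Whisper's encoder
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
    num_layers=2,
)
for p in frozen_encoder.parameters():    # frozen, as in the paper
    p.requires_grad = False

sidecar = SidecarSeparator(d_model)      # only these weights would train
mixed_audio_features = torch.randn(1, 100, d_model)
per_talker = sidecar(frozen_encoder(mixed_audio_features))
print([t.shape for t in per_talker])
```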
- Can Language Models Learn to Listen? [96.01685069483025]
We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words.
Our approach autoregressively predicts a listener's response: a sequence of facial gestures quantized using a VQ-VAE.
We show that our generated listener motion is fluent and reflective of language semantics through quantitative metrics and a qualitative user study.
arXiv Detail & Related papers (2023-08-21T17:59:02Z)
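
A sketch of the quantize-then-predict idea in the entry above: continuous facial-motion features are snapped to a learned VQ-VAE codebook, and an autoregressive decoder then predicts the discrete code sequence. Dimensions, codebook size, and the speaker-text conditioning are invented for illustration; the paper's values are not reproduced.

```python
# Nearest-neighbor quantization against a VQ-VAE codebook (illustrative sizes).
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes: int = 256, dim: int = 64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor):
        # z: (batch, time, dim) continuous motion features
        codebook = self.codebook.weight.expand(z.size(0), -1, -1)
        dists = torch.cdist(z, codebook)      # (batch, time, num_codes)
        codes = dists.argmin(dim=-1)          # discrete gesture tokens
        quantized = self.codebook(codes)      # nearest codebook vectors
        return quantized, codes

vq = VectorQuantizer()
motion = torch.randn(1, 32, 64)               # fake listener-motion clip
_, gesture_tokens = vq(motion)
# An autoregressive Transformer (not shown) would then be trained to predict
# `gesture_tokens` step by step, conditioned on the speaker's words.
print(gesture_tokens.shape)                   # torch.Size([1, 32])
```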
- Rewriting the Script: Adapting Text Instructions for Voice Interaction [39.54213483588498]
We study the limitations of the dominant approach voice assistants take to complex task guidance.
We propose eight ways in which voice assistants can transform written sources into forms that are readily communicated through spoken conversation.
arXiv Detail & Related papers (2023-06-16T17:43:00Z)
- Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization [61.60501633397704]
We investigate the emergent abilities of the recently proposed web-scale speech model Whisper, by adapting it to unseen tasks with prompt engineering.
We design task-specific prompts, by either leveraging another large-scale model, or simply manipulating the special tokens in the default prompts.
Experiments show that our proposed prompts improve performance by 10% to 45% on the three zero-shot tasks, and even outperform SotA supervised models on some datasets.
arXiv Detail & Related papers (2023-05-18T16:32:58Z)
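
In the spirit of the prompting paper above, here is a minimal sketch of steering Whisper zero-shot through its prompt mechanism, using the Hugging Face Whisper API (transformers >= 4.28). The prompt text and the public test clip are illustrative assumptions; the paper's task-specific token manipulations are not reproduced.

```python
# Bias Whisper's decoding with a prompt, without any fine-tuning.
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Any 16 kHz waveform works; this public dummy sample keeps the sketch runnable.
sample = load_dataset("hf-internal-testing/librispeech_asr_dummy",
                      "clean", split="validation")[0]["audio"]
features = processor(sample["array"], sampling_rate=16_000,
                     return_tensors="pt").input_features

# The prompt is prepended to the decoder context via special tokens
# (hypothetical prompt text, for illustration only).
prompt_ids = processor.get_prompt_ids("domain: medical dictation",
                                      return_tensors="pt")
generated = model.generate(features, prompt_ids=prompt_ids)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```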
- End-to-end Spoken Conversational Question Answering: Task, Dataset and Model [92.18621726802726]
In spoken question answering, systems are designed to answer questions from contiguous text spans within the related speech transcripts.
We propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows.
Our main objective is to build a system that handles conversational questions over audio recordings, and to explore whether additional cues from different modalities can aid information gathering.
arXiv Detail & Related papers (2022-04-29T17:56:59Z)
- Evaluating Mixed-initiative Conversational Search Systems via User Simulation [9.066817876491053]
We propose a conversational User Simulator, called USi, for automatic evaluation of such search systems.
We show that responses generated by USi are both in line with the underlying information need and comparable to human-generated answers.
arXiv Detail & Related papers (2022-04-17T16:27:33Z)
- Soliciting User Preferences in Conversational Recommender Systems via Usage-related Questions [21.184555512370093]
We propose a novel approach to preference elicitation by asking implicit questions based on item usage.
First, we identify the sentences from a large review corpus that contain information about item usage.
Then, we generate implicit preference elicitation questions from those sentences using a neural text-to-text model.
arXiv Detail & Related papers (2021-11-26T12:23:14Z)
- Using Voice and Biofeedback to Predict User Engagement during Requirements Interviews [11.277063517143565]
We propose to utilize biometric data, in the form of physiological and voice features, to complement interviews with information about user engagement.
We evaluate our approach by interviewing users while gathering their physiological data using an Empatica E4 wristband.
Our results show that we can predict users' engagement by training supervised machine learning algorithms on biometric data.
arXiv Detail & Related papers (2021-04-06T10:34:36Z)
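
A minimal sketch of the supervised setup in the entry above: predicting engagement from biometric features gathered during an interview. The feature columns, labels, and data here are synthetic placeholders; the paper's Empatica E4 features and labeling protocol are richer than this.

```python
# Train and cross-validate a classifier on (synthetic) biometric features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Hypothetical columns: heart rate, electrodermal activity,
# skin temperature, voice pitch
X = rng.normal(size=(200, 4))
y = rng.integers(0, 2, size=200)   # 1 = engaged, 0 = not engaged

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```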
- Towards Data Distillation for End-to-end Spoken Conversational Question Answering [65.124088336738]
We propose a new Spoken Conversational Question Answering task (SCQA).
SCQA aims at enabling QA systems to model complex dialogue flows given speech utterances and text corpora.
Our main objective is to build a QA system that deals with conversational questions in both spoken and text forms.
arXiv Detail & Related papers (2020-10-18T05:53:39Z)
- Learning to Rank Intents in Voice Assistants [2.102846336724103]
We propose a novel Energy-based model for the intent ranking task.
We show our approach outperforms existing state-of-the-art methods, reducing the error rate by 3.8%.
We also evaluate the robustness of our algorithm on the intent ranking task and show that it improves robustness by 33.3%.
arXiv Detail & Related papers (2020-04-30T21:51:26Z)
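
A sketch of the energy-based ranking idea in the entry above: a small network assigns an energy to each (utterance, intent) pair, and candidates are ranked by ascending energy (lower means a better match). The encoders, dimensions, and training loss are assumptions for illustration; the paper's model is not reproduced.

```python
# Rank candidate intents by the energy of each (utterance, intent) pair.
import torch
import torch.nn as nn

class IntentEnergy(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, utterance: torch.Tensor, intent: torch.Tensor) -> torch.Tensor:
        # Concatenate utterance and candidate embeddings; emit a scalar energy.
        return self.scorer(torch.cat([utterance, intent], dim=-1)).squeeze(-1)

model = IntentEnergy()
utterance = torch.randn(64)               # pretend encoded user request
intents = torch.randn(5, 64)              # five candidate intent embeddings
energies = model(utterance.expand(5, -1), intents)
ranking = energies.argsort()              # lowest-energy (best) intent first
print(ranking)

# Training would typically push the gold intent's energy below the others,
# e.g. with a margin loss: relu(margin + E(gold) - E(negative)).
```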