Follow-on Question Suggestion via Voice Hints for Voice Assistants
- URL: http://arxiv.org/abs/2310.17034v1
- Date: Wed, 25 Oct 2023 22:22:18 GMT
- Title: Follow-on Question Suggestion via Voice Hints for Voice Assistants
- Authors: Besnik Fetahu, Pedro Faustini, Giuseppe Castellucci, Anjie Fang, Oleg
Rokhlenko, Shervin Malmasi
- Abstract summary: We tackle the novel task of suggesting questions with compact and natural voice hints to allow users to ask follow-up questions.
We propose baselines and an approach using sequence-to-sequence Transformers to generate spoken hints from a list of questions.
Results show that a naive approach of concatenating suggested questions creates poor voice hints.
- Score: 29.531005346608215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The adoption of voice assistants like Alexa or Siri has grown rapidly,
allowing users to instantly access information via voice search. Query
suggestion is a standard feature of screen-based search experiences, allowing
users to explore additional topics. However, this is not trivial to implement
in voice-based settings. To enable this, we tackle the novel task of suggesting
questions with compact and natural voice hints to allow users to ask follow-up
questions.
We define the task, ground it in syntactic theory and outline linguistic
desiderata for spoken hints. We propose baselines and an approach using
sequence-to-sequence Transformers to generate spoken hints from a list of
questions. Using a new dataset of 6681 input questions and human written hints,
we evaluated the models with automatic metrics and human evaluation. Results
show that a naive approach of concatenating suggested questions creates poor
voice hints. Our approach, which applies a linguistically-motivated pretraining
task, was strongly preferred by humans for producing the most natural hints.
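
As a concrete illustration of the two strategies the abstract contrasts, below is a minimal sketch that turns a list of suggested questions into a single spoken hint, assuming a generic pretrained sequence-to-sequence model (t5-base via Hugging Face Transformers). The checkpoint, input format, and helper names are illustrative assumptions rather than the authors' released code, and the model would first need fine-tuning on question/hint pairs such as the paper's 6681 examples.

```python
# Sketch of the two hint strategies the abstract contrasts. The checkpoint,
# separators, and prompts are assumptions, not the paper's released setup.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

questions = [
    "How tall is the Eiffel Tower?",
    "When was the Eiffel Tower built?",
]

def naive_hint(questions):
    # Baseline: concatenate the suggested questions verbatim.
    # The paper reports this produces unnatural-sounding voice hints.
    return "You could also ask: " + " ".join(questions)

def seq2seq_hint(questions, max_new_tokens=40):
    # Learned approach: a seq2seq Transformer rewrites the question list
    # into one compact, natural spoken hint (after fine-tuning on
    # question/hint pairs, which this sketch does not perform).
    source = " ; ".join(questions)
    inputs = tokenizer(source, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(naive_hint(questions))
print(seq2seq_hint(questions))
```

Per the abstract, the learned rewriter, combined with a linguistically motivated pretraining task, is what human judges preferred over the concatenation baseline.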
Related papers
- Distilling an End-to-End Voice Assistant Without Instruction Training Data [53.524071162124464]
Distilled Voice Assistant (DiVA) generalizes to Question Answering, Classification, and Translation.
We show that DiVA better meets user preferences, achieving a 72% win rate compared with state-of-the-art models like Qwen 2 Audio.
arXiv Detail & Related papers (2024-10-03T17:04:48Z)
- Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System [73.34663391495616]
We propose a pioneering approach to tackle joint multi-talker and target-talker speech recognition tasks.
Specifically, we freeze Whisper and plug a Sidecar separator into its encoder to separate the mixed embedding for multiple talkers.
We deliver acceptable zero-shot performance on multi-talker ASR on the AishellMix Mandarin dataset.
arXiv Detail & Related papers (2024-07-13T09:28:24Z)
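
A minimal PyTorch sketch of the freeze-and-plug-in pattern described in the entry above: a pretrained encoder is frozen and a small trainable "sidecar" module splits its mixed embedding into per-talker streams. The real Sidecar's architecture and its insertion point inside Whisper's encoder are not reproduced here; `frozen_encoder` is a stand-in for Whisper.

```python
# Freeze a pretrained encoder; train only a small separator on top.
import torch
import torch.nn as nn

class SidecarSeparator(nn.Module):
    """Trainable separator producing one embedding stream per talker."""
    def __init__(self, d_model: int, num_talkers: int = 2):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                          nn.Linear(d_model, d_model))
            for _ in range(num_talkers)
        )

    def forward(self, mixed: torch.Tensor) -> list[torch.Tensor]:
        # mixed: (batch, time, d_model) embedding of overlapped speech
        return [head(mixed) for head in self.heads]

d_model = 512
frozen_encoder = nn.TransformerEncoder(  # stand-in for Whisper's encoder
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
    num_layers=2,
)
for p in frozen_encoder.parameters():    # frozen, as in the paper
    p.requires_grad = False

sidecar = SidecarSeparator(d_model)      # only these weights would train
mixed_audio_features = torch.randn(1, 100, d_model)
per_talker = sidecar(frozen_encoder(mixed_audio_features))
print([t.shape for t in per_talker])
```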
- Can Language Models Learn to Listen? [96.01685069483025]
We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words.
Our approach autoregressively predicts a listener's response: a sequence of facial gestures quantized using a VQ-VAE.
We show that our generated listener motion is fluent and reflective of language semantics through quantitative metrics and a qualitative user study.
arXiv Detail & Related papers (2023-08-21T17:59:02Z)
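
A sketch of the quantize-then-predict idea in the entry above: continuous facial-motion features are snapped to a learned VQ-VAE codebook, and an autoregressive decoder then predicts the discrete code sequence. Dimensions, codebook size, and the speaker-text conditioning are invented for illustration; the paper's values are not reproduced.

```python
# Nearest-neighbor quantization against a VQ-VAE codebook (illustrative sizes).
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes: int = 256, dim: int = 64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor):
        # z: (batch, time, dim) continuous motion features
        codebook = self.codebook.weight.expand(z.size(0), -1, -1)
        dists = torch.cdist(z, codebook)      # (batch, time, num_codes)
        codes = dists.argmin(dim=-1)          # discrete gesture tokens
        quantized = self.codebook(codes)      # nearest codebook vectors
        return quantized, codes

vq = VectorQuantizer()
motion = torch.randn(1, 32, 64)               # fake listener-motion clip
_, gesture_tokens = vq(motion)
# An autoregressive Transformer (not shown) would then be trained to predict
# `gesture_tokens` step by step, conditioned on the speaker's words.
print(gesture_tokens.shape)                   # torch.Size([1, 32])
```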
- Rewriting the Script: Adapting Text Instructions for Voice Interaction [39.54213483588498]
We study the limitations of the dominant approach voice assistants take to complex task guidance.
We propose eight ways in which voice assistants can transform written sources into forms that are readily communicated through spoken conversation.
arXiv Detail & Related papers (2023-06-16T17:43:00Z)
- Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization [61.60501633397704]
We investigate the emergent abilities of the recently proposed web-scale speech model Whisper, by adapting it to unseen tasks with prompt engineering.
We design task-specific prompts, by either leveraging another large-scale model, or simply manipulating the special tokens in the default prompts.
Experiments show that our proposed prompts improve performance by 10% to 45% on the three zero-shot tasks, and even outperform SotA supervised models on some datasets.
arXiv Detail & Related papers (2023-05-18T16:32:58Z)
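
In the spirit of the prompting paper above, here is a minimal sketch of steering Whisper zero-shot through its prompt mechanism, using the Hugging Face Whisper API (transformers >= 4.28). The prompt text and the public test clip are illustrative assumptions; the paper's task-specific token manipulations are not reproduced.

```python
# Bias Whisper's decoding with a prompt, without any fine-tuning.
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Any 16 kHz waveform works; this public dummy sample keeps the sketch runnable.
sample = load_dataset("hf-internal-testing/librispeech_asr_dummy",
                      "clean", split="validation")[0]["audio"]
features = processor(sample["array"], sampling_rate=16_000,
                     return_tensors="pt").input_features

# The prompt is prepended to the decoder context via special tokens
# (hypothetical prompt text, for illustration only).
prompt_ids = processor.get_prompt_ids("domain: medical dictation",
                                      return_tensors="pt")
generated = model.generate(features, prompt_ids=prompt_ids)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```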
- End-to-end Spoken Conversational Question Answering: Task, Dataset and Model [92.18621726802726]
In spoken question answering, systems are designed to answer questions from contiguous text spans within the related speech transcripts.
We propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows.
Our main objective is to build a system that handles conversational questions over audio recordings, and to explore whether additional cues from different modalities can aid information gathering.
arXiv Detail & Related papers (2022-04-29T17:56:59Z)
- Evaluating Mixed-initiative Conversational Search Systems via User Simulation [9.066817876491053]
We propose a conversational User Simulator, called USi, for automatic evaluation of such search systems.
We show that responses generated by USi are both in line with the underlying information need and comparable to human-generated answers.
arXiv Detail & Related papers (2022-04-17T16:27:33Z)
- Soliciting User Preferences in Conversational Recommender Systems via Usage-related Questions [21.184555512370093]
We propose a novel approach to preference elicitation by asking implicit questions based on item usage.
First, we identify the sentences from a large review corpus that contain information about item usage.
Then, we generate implicit preference elicitation questions from those sentences using a neural text-to-text model.
arXiv Detail & Related papers (2021-11-26T12:23:14Z)
- Using Voice and Biofeedback to Predict User Engagement during Requirements Interviews [11.277063517143565]
We propose to utilize biometric data, in the form of physiological and voice features, to complement interviews with information about user engagement.
We evaluate our approach by interviewing users while gathering their physiological data using an Empatica E4 wristband.
Our results show that we can predict users' engagement by training supervised machine learning algorithms on biometric data.
arXiv Detail & Related papers (2021-04-06T10:34:36Z)
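
A minimal sketch of the supervised setup in the entry above: predicting engagement from biometric features gathered during an interview. The feature columns, labels, and data here are synthetic placeholders; the paper's Empatica E4 features and labeling protocol are richer than this.

```python
# Train and cross-validate a classifier on (synthetic) biometric features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Hypothetical columns: heart rate, electrodermal activity,
# skin temperature, voice pitch
X = rng.normal(size=(200, 4))
y = rng.integers(0, 2, size=200)   # 1 = engaged, 0 = not engaged

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```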
- Towards Data Distillation for End-to-end Spoken Conversational Question Answering [65.124088336738]
We propose a new Spoken Conversational Question Answering task (SCQA).
SCQA aims at enabling QA systems to model complex dialogue flows given speech utterances and text corpora.
Our main objective is to build a QA system that deals with conversational questions in both spoken and text forms.
arXiv Detail & Related papers (2020-10-18T05:53:39Z)
- Learning to Rank Intents in Voice Assistants [2.102846336724103]
We propose a novel Energy-based model for the intent ranking task.
We show our approach outperforms existing state-of-the-art methods, reducing the error rate by 3.8%.
We also evaluate the robustness of our algorithm on the intent ranking task and show that it improves robustness by 33.3%.
arXiv Detail & Related papers (2020-04-30T21:51:26Z)
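
A sketch of the energy-based ranking idea in the entry above: a small network assigns an energy to each (utterance, intent) pair, and candidates are ranked by ascending energy (lower means a better match). The encoders, dimensions, and training loss are assumptions for illustration; the paper's model is not reproduced.

```python
# Rank candidate intents by the energy of each (utterance, intent) pair.
import torch
import torch.nn as nn

class IntentEnergy(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, utterance: torch.Tensor, intent: torch.Tensor) -> torch.Tensor:
        # Concatenate utterance and candidate embeddings; emit a scalar energy.
        return self.scorer(torch.cat([utterance, intent], dim=-1)).squeeze(-1)

model = IntentEnergy()
utterance = torch.randn(64)               # pretend encoded user request
intents = torch.randn(5, 64)              # five candidate intent embeddings
energies = model(utterance.expand(5, -1), intents)
ranking = energies.argsort()              # lowest-energy (best) intent first
print(ranking)

# Training would typically push the gold intent's energy below the others,
# e.g. with a margin loss: relu(margin + E(gold) - E(negative)).
```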