TurnGPT: a Transformer-based Language Model for Predicting Turn-taking
in Spoken Dialog
- URL: http://arxiv.org/abs/2010.10874v1
- Date: Wed, 21 Oct 2020 09:58:39 GMT
- Title: TurnGPT: a Transformer-based Language Model for Predicting Turn-taking
in Spoken Dialog
- Authors: Erik Ekstedt and Gabriel Skantze
- Abstract summary: We introduce TurnGPT, a transformer-based language model for predicting turn-shifts in spoken dialog.
The model has been trained and evaluated on a variety of written and spoken dialog datasets.
- Score: 2.2716975311837357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Syntactic and pragmatic completeness is known to be important for turn-taking
prediction, but so far machine learning models of turn-taking have used such
linguistic information in a limited way. In this paper, we introduce TurnGPT, a
transformer-based language model for predicting turn-shifts in spoken dialog.
The model has been trained and evaluated on a variety of written and spoken
dialog datasets. We show that the model outperforms two baselines used in prior
work. We also report on an ablation study, as well as attention and gradient
analyses, which show that the model is able to utilize the dialog context and
pragmatic completeness for turn-taking prediction. Finally, we explore the
model's potential in not only detecting, but also projecting, turn-completions.
Related papers
- How Language Models Prioritize Contextual Grammatical Cues? [3.9790222241649587]
We investigate how language models handle gender agreement when multiple gender cue words are present.
Our findings reveal striking differences in how encoder-based and decoder-based models prioritize and use contextual information for their predictions.
arXiv Detail & Related papers (2024-10-04T14:09:05Z)
- Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner [51.77263363285369]
We present an approach called Dialogue Action Tokens that adapts language model agents to plan goal-directed dialogues.
The core idea is to treat each utterance as an action, thereby converting dialogues into games where existing approaches such as reinforcement learning can be applied.
arXiv Detail & Related papers (2024-06-17T18:01:32Z)
- Multilingual Turn-taking Prediction Using Voice Activity Projection [25.094622033971643]
This paper investigates the application of voice activity projection (VAP), a predictive turn-taking model for spoken dialogue, on multilingual data.
The results show that a monolingual VAP model trained on one language does not make good predictions when applied to other languages.
A multilingual model, trained on all three languages, demonstrates predictive performance on par with monolingual models across all languages.
arXiv Detail & Related papers (2024-03-11T07:50:29Z)
- Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences.
In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
arXiv Detail & Related papers (2022-05-28T01:03:30Z)
- TunBERT: Pretrained Contextualized Text Representation for Tunisian Dialect [0.0]
We investigate the feasibility of training monolingual Transformer-based language models for underrepresented languages.
We show that the use of noisy web-crawled data instead of structured data is more convenient for such a non-standardized language.
Our best performing TunBERT model reaches or improves the state-of-the-art in all three downstream tasks.
arXiv Detail & Related papers (2021-11-25T15:49:50Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models trained with varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Exemplar-Controllable Paraphrasing and Translation using Bitext [57.92051459102902]
We adapt models from prior work to be able to learn solely from bilingual text (bitext).
Our single proposed model can perform four tasks: controlled paraphrase generation in both languages and controlled machine translation in both language directions.
arXiv Detail & Related papers (2020-10-12T17:02:50Z)
- The Adapter-Bot: All-In-One Controllable Conversational Model [66.48164003532484]
We propose a dialogue model that uses a fixed backbone model such as DialGPT and triggers on-demand dialogue skills via different adapters.
Depending on the skills, the model is able to process multiple knowledge types, such as text, tables, and emphatic responses.
We evaluate our model using automatic evaluation by comparing it with existing state-of-the-art conversational models.
arXiv Detail & Related papers (2020-08-28T10:59:31Z)
- An Empirical Investigation of Pre-Trained Transformer Language Models for Open-Domain Dialogue Generation [23.343006562849126]
We present an empirical investigation of pre-trained Transformer-based auto-regressive language models for the task of open-domain dialogue generation.
The models are trained using the pre-training and fine-tuning paradigm.
Experiments are conducted on the typical single-turn and multi-turn dialogue corpora such as Weibo, Douban, Reddit, DailyDialog, and Persona-Chat.
arXiv Detail & Related papers (2020-03-09T15:20:21Z)
- Emergent Communication with World Models [80.55287578801008]
We introduce Language World Models, a class of language-conditional generative models that interpret natural language messages.
We incorporate this "observation" into a persistent memory state, and allow the listening agent's policy to condition on it.
We show this improves effective communication and task success in 2D gridworld speaker-listener navigation tasks.
arXiv Detail & Related papers (2020-02-22T02:34:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.