Jurassic is (almost) All You Need: Few-Shot Meaning-to-Text Generation
for Open-Domain Dialogue
- URL: http://arxiv.org/abs/2110.08094v1
- Date: Fri, 15 Oct 2021 13:42:25 GMT
- Authors: Lena Reed, Cecilia Li, Angela Ramirez, Liren Wu, and Marilyn Walker
(Natural Language and Dialogue Systems Lab, University of California, Santa
Cruz)
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One challenge with open-domain dialogue systems is the need to produce
high-quality responses on any topic. We aim to improve the quality and coverage
of Athena, an Alexa Prize dialogue system. We utilize Athena's response
generators (RGs) to create training data for two new neural Meaning-to-Text
RGs, Athena-GPT-Neo and Athena-Jurassic, for the movies, music, TV, sports, and
video game domains. We conduct few-shot experiments, both within and
cross-domain, with different tuning set sizes (2, 3, 10), prompt formats, and
meaning representations (MRs) for sets of WikiData KG triples, and dialogue
acts with 14 possible attribute combinations. Our evaluation uses BLEURT and
human evaluation metrics, and shows that with 10-shot tuning, Athena-Jurassic's
performance is significantly better for coherence and semantic accuracy.
Experiments with 2-shot tuning on completely novel MRs result in a huge
performance drop for Athena-GPT-Neo, whose semantic accuracy falls to 0.41, and
whose untrue hallucination rate increases to 12%. Experiments with dialogue
acts for video games show that with 10-shot tuning, both models learn to
control dialogue acts, but Athena-Jurassic has significantly higher coherence,
and only 4% untrue hallucinations. Our results suggest that Athena-Jurassic can
reliably produce high-quality outputs for live systems with real users. To
our knowledge, these are the first results demonstrating that few-shot tuning
on a massive language model can create NLGs that generalize to new domains, and
produce high-quality, semantically-controlled, conversational responses
directly from MRs and KG triples.
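The few-shot setup described above pairs each meaning representation with a reference response inside the prompt, then appends the query MR for the model to complete. A minimal sketch of how such a prompt might be assembled from WikiData-style KG triples (the triples, reference texts, and prompt format below are illustrative assumptions, not the exact format used by Athena):

```python
# Sketch: assemble a few-shot meaning-to-text prompt from KG triples.
# Triples, reference texts, and the "MR:/Text:" layout are hypothetical
# placeholders, not the paper's actual prompt format.

def triples_to_mr(triples):
    """Flatten (subject, relation, object) triples into a textual MR."""
    return " | ".join(f"{s} ; {r} ; {o}" for s, r, o in triples)

def build_prompt(examples, query_triples):
    """Build a few-shot prompt: N tuning examples, then the query MR."""
    parts = []
    for triples, text in examples:
        parts.append(f"MR: {triples_to_mr(triples)}\nText: {text}")
    # The final MR is left uncompleted; the model generates the response.
    parts.append(f"MR: {triples_to_mr(query_triples)}\nText:")
    return "\n\n".join(parts)

examples = [
    ([("The Matrix", "director", "Lana Wachowski")],
     "Did you know The Matrix was directed by Lana Wachowski?"),
    ([("Radiohead", "genre", "alternative rock")],
     "Radiohead is an alternative rock band."),
]
query = [("The Witcher 3", "publisher", "CD Projekt")]

prompt = build_prompt(examples, query)
print(prompt)
```

The same scaffold extends to dialogue-act MRs by swapping the triple serialization for an act name plus attribute-value pairs; the key design choice is keeping the MR-to-text mapping format identical across all in-context examples.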
Related papers
- Building Multimodal AI Chatbots [2.1987180245567246]
This work aims to create a multimodal AI system that chats with humans and shares relevant photos.
It proposes two multimodal deep learning models: an image retriever that understands texts and a response generator that understands images.
The two models are trained and evaluated on PhotoChat, an open-domain dialogue dataset in which a photo is shared in each session.
arXiv Detail & Related papers (2023-04-21T16:43:54Z)
- Let's Get Personal: Personal Questions Improve SocialBot Performance in the Alexa Prize [0.0]
There has been an increased focus on creating conversational open-domain dialogue systems in the spoken dialogue community.
Unlike traditional dialogue systems, these conversational systems cannot assume any specific information need or domain restrictions.
We developed a robust open-domain conversational system, Athena, that real Amazon Echo users access and evaluate at scale.
arXiv Detail & Related papers (2023-03-09T00:10:29Z)
- A Transformer-based Response Evaluator for Open-Domain Spoken Conversation [1.0474108328884806]
We study response selection in the Athena system, an Alexa Prize SocialBot.
We compare several off-the-shelf response ranking methods for open-domain dialogue.
We find that Athena-RR with a Recall@1 of 70.79% outperforms Athena-Heuristic and all of the off-the-shelf rankers by a large margin.
arXiv Detail & Related papers (2023-02-09T03:38:07Z)
- RHO ($\rho$): Reducing Hallucination in Open-domain Dialogues with Knowledge Grounding [57.46495388734495]
This paper presents RHO ($\rho$), which utilizes the representations of linked entities and relation predicates from a knowledge graph (KG).
We propose (1) local knowledge grounding to combine textual embeddings with the corresponding KG embeddings; and (2) global knowledge grounding to equip RHO with multi-hop reasoning abilities via the attention mechanism.
arXiv Detail & Related papers (2022-12-03T10:36:34Z)
- Reranking Overgenerated Responses for End-to-End Task-Oriented Dialogue Systems [71.33737787564966]
End-to-end (E2E) task-oriented dialogue (ToD) systems are prone to falling into the so-called 'likelihood trap'.
We propose a reranking method which aims to select high-quality items from the lists of responses initially overgenerated by the system.
Our methods improve a state-of-the-art E2E ToD system by 2.4 BLEU, 3.2 ROUGE, and 2.8 METEOR scores, achieving new peak results.
arXiv Detail & Related papers (2022-11-07T15:59:49Z)
- Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method.
We first use a wav2vec pre-trained model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
arXiv Detail & Related papers (2022-08-20T06:46:55Z)
- DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation [80.45816053153722]
DialogVED introduces continuous latent variables into the enhanced encoder-decoder pre-training framework to increase the relevance and diversity of responses.
We conduct experiments on PersonaChat, DailyDialog, and DSTC7-AVSD benchmarks for response generation.
arXiv Detail & Related papers (2022-04-27T16:18:15Z)
- CommonsenseQA 2.0: Exposing the Limits of AI through Gamification [126.85096257968414]
We construct benchmarks that test the abilities of modern natural language understanding models.
In this work, we propose gamification as a framework for data construction.
arXiv Detail & Related papers (2022-01-14T06:49:15Z)
- MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound [90.1857707251566]
We introduce MERLOT Reserve, a model that represents videos jointly over time.
We replace snippets of text and audio with a MASK token; the model learns by choosing the correct masked-out snippet.
Our objective learns faster than alternatives, and performs well at scale.
arXiv Detail & Related papers (2022-01-07T19:00:21Z)
- Athena 2.0: Contextualized Dialogue Management for an Alexa Prize SocialBot [3.4000625471791577]
Athena 2.0 is an Alexa Prize SocialBot that has been a finalist in the last two Alexa Prize Grand Challenges.
Here we describe Athena's system design and performance in the 20/21 competition.
arXiv Detail & Related papers (2021-11-03T20:54:20Z)
- Modeling Performance in Open-Domain Dialogue with PARADISE [7.516971632888974]
We develop a PARADISE model for predicting the performance of Athena, a dialogue system that has participated in thousands of conversations with real users.
Our goal is to learn a general objective function that can be used to optimize the dialogue choices of any Alexa Prize system in real time.
arXiv Detail & Related papers (2021-10-21T14:17:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.