Contextual Semantic Parsing for Multilingual Task-Oriented Dialogues
- URL: http://arxiv.org/abs/2111.02574v1
- Date: Thu, 4 Nov 2021 01:08:14 GMT
- Title: Contextual Semantic Parsing for Multilingual Task-Oriented Dialogues
- Authors: Mehrad Moradshahi, Victoria Tsai, Giovanni Campagna, Monica S. Lam
- Abstract summary: Given a large-scale dialogue data set in one language, we can automatically produce an effective semantic parser for other languages using machine translation.
We propose automatic translation of dialogue datasets with alignment to ensure faithful translation of slot values.
We show that the succinct representation reduces the compounding effect of translation errors.
- Score: 7.8378818005171125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robust state tracking for task-oriented dialogue systems currently remains
restricted to a few popular languages. This paper shows that given a
large-scale dialogue data set in one language, we can automatically produce an
effective semantic parser for other languages using machine translation. We
propose automatic translation of dialogue datasets with alignment to ensure
faithful translation of slot values and eliminate costly human supervision used
in previous benchmarks. We also propose a new contextual semantic parsing
model, which encodes the formal slots and values, and only the last agent and
user utterances. We show that the succinct representation reduces the
compounding effect of translation errors, without harming the accuracy in
practice.
We evaluate our approach on several dialogue state tracking benchmarks. On
RiSAWOZ, CrossWOZ, CrossWOZ-EN, and MultiWOZ-ZH datasets we improve the state
of the art by 11%, 17%, 20%, and 0.3% in joint goal accuracy. We present a
comprehensive error analysis for all three datasets showing erroneous
annotations can obscure judgments on the quality of the model.
Finally, we present RiSAWOZ English and German datasets, created using our
translation methodology. On these datasets, accuracy is within 11% of the
original, showing that high-accuracy multilingual dialogue datasets are possible
without relying on expensive human annotations.
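The results above are reported in joint goal accuracy, which is conventionally the fraction of dialogue turns whose predicted state matches the gold state on every slot. The following is a minimal sketch of that computation, assuming each dialogue state is a flat slot-to-value dictionary; the function name, type alias, and slot names are illustrative and not taken from the paper.

```python
from typing import Dict, List

# Assumption: a dialogue state is a flat mapping from slot name to value.
DialogueState = Dict[str, str]


def joint_goal_accuracy(predicted: List[DialogueState],
                        gold: List[DialogueState]) -> float:
    """Return the fraction of turns whose predicted state equals the gold state.

    A turn counts as correct only if every slot and value matches exactly,
    so a single wrong or missing slot makes the whole turn incorrect.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of turns")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Hypothetical two-turn dialogue: the second turn misses one slot value.
gold_states = [
    {"hotel.area": "north"},
    {"hotel.area": "north", "hotel.stars": "4"},
]
predicted_states = [
    {"hotel.area": "north"},
    {"hotel.area": "north", "hotel.stars": "3"},
]
score = joint_goal_accuracy(predicted_states, gold_states)
```

Because the comparison is exact, annotation errors in the gold states lower the score even when the prediction is reasonable, which is consistent with the abstract's remark that erroneous annotations can obscure judgments of model quality.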
Related papers
- Zero and Few-Shot Localization of Task-Oriented Dialogue Agents with a Distilled Representation [5.551814548069404]
We propose automatic methods that use ToD training data in a source language to build a high-quality functioning dialogue agent.
We show that our approach closes the accuracy gap between few-shot and existing full-shot methods for ToD agents.
arXiv Detail & Related papers (2023-02-18T21:30:36Z)
- Weakly Supervised Data Augmentation Through Prompting for Dialogue Understanding [103.94325597273316]
We present a novel approach that iterates on augmentation quality by applying weakly-supervised filters.
We evaluate our methods on the emotion and act classification tasks in DailyDialog and the intent classification task in Facebook Multilingual Task-Oriented Dialogue.
For DailyDialog specifically, using 10% of the ground truth data we outperform the current state-of-the-art model which uses 100% of the data.
arXiv Detail & Related papers (2022-10-25T17:01:30Z)
- OneAligner: Zero-shot Cross-lingual Transfer with One Rich-Resource Language Pair for Low-Resource Sentence Retrieval [91.76575626229824]
We present OneAligner, an alignment model specially designed for sentence retrieval tasks.
When trained with all language pairs of a large-scale parallel multilingual corpus (OPUS-100), this model achieves the state-of-the-art result.
We conclude through empirical results and analyses that the performance of the sentence alignment task depends mostly on the monolingual and parallel data size.
arXiv Detail & Related papers (2022-05-17T19:52:42Z)
- Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation [70.81596088969378]
Cross-lingual Outline-based Dialogue dataset (termed COD) enables natural language understanding.
COD enables dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.
arXiv Detail & Related papers (2022-01-31T18:11:21Z)
- Cross-lingual Intermediate Fine-tuning improves Dialogue State Tracking [84.50302759362698]
We enhance the transfer learning process by intermediate fine-tuning of pretrained multilingual models.
We use parallel and conversational movie subtitles datasets to design cross-lingual intermediate tasks.
We achieve impressive improvements (> 20% on goal accuracy) on the parallel MultiWoZ dataset and Multilingual WoZ dataset.
arXiv Detail & Related papers (2021-09-28T11:22:38Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Zero-Shot Cross-lingual Semantic Parsing [56.95036511882921]
We study cross-lingual semantic parsing as a zero-shot problem without parallel data for 7 test languages.
We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional languages using only English-Logical form paired data.
Our system frames zero-shot parsing as a latent-space alignment problem and finds that pre-trained models can be improved to generate logical forms with minimal cross-lingual transfer penalty.
arXiv Detail & Related papers (2021-04-15T16:08:43Z)
- Auto Correcting in the Process of Translation -- Multi-task Learning Improves Dialogue Machine Translation [31.247920419523066]
We conduct a deep analysis of a dialogue corpus and summarize three major issues on dialogue translation.
We propose a joint learning method to identify omission and typo, and utilize context to translate dialogue utterances.
Our experiments show that the proposed method improves translation quality by 3.2 BLEU over the baselines.
arXiv Detail & Related papers (2021-03-30T09:12:47Z)
- A Few-Shot Semantic Parser for Wizard-of-Oz Dialogues with the Precise ThingTalk Representation [5.56536459714557]
Previous attempts to build effective semantic parsers for Wizard-of-Oz (WOZ) conversations suffer from the difficulty in acquiring a high-quality, manually annotated training set.
This paper proposes a new dialogue representation and a sample-efficient methodology that can predict precise dialogue states in WOZ conversations.
arXiv Detail & Related papers (2020-09-16T22:52:46Z)
- MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark [31.91964553419665]
We present a new multilingual dataset, called MTOP, comprising 100k annotated utterances in 6 languages across 11 domains.
We achieve an average improvement of +6.3 points on Slot F1 for the two existing multilingual datasets, over best results reported in their experiments.
We demonstrate strong zero-shot performance using pre-trained models combined with automatic translation and alignment, and a proposed distant supervision method to reduce the noise in slot label projection.
arXiv Detail & Related papers (2020-08-21T07:02:11Z)
- MultiWOZ 2.2: A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines [15.540213987132839]
This work introduces MultiWOZ 2.2, which is yet another improved version of this dataset.
Firstly, we identify and fix dialogue state annotation errors across 17.3% of the utterances on top of MultiWOZ 2.1.
Secondly, we redefine the vocabularies of slots with a large number of possible values.
arXiv Detail & Related papers (2020-07-10T22:52:14Z)