Data Augmentation with Paraphrase Generation and Entity Extraction for
Multimodal Dialogue System
- URL: http://arxiv.org/abs/2205.04006v1
- Date: Mon, 9 May 2022 02:21:20 GMT
- Title: Data Augmentation with Paraphrase Generation and Entity Extraction for
Multimodal Dialogue System
- Authors: Eda Okur, Saurav Sahay, Lama Nachman
- Abstract summary: We are working towards a multimodal dialogue system for younger kids learning basic math concepts.
This work explores the potential benefits of data augmentation with paraphrase generation for the Natural Language Understanding module of the Spoken Dialogue Systems pipeline.
We have shown that paraphrasing with model-in-the-loop (MITL) strategies using small seed data is a promising approach yielding improved performance results for the Intent Recognition task.
- Score: 9.912419882236918
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contextually aware intelligent agents are often required to understand the
users and their surroundings in real-time. Our goal is to build Artificial
Intelligence (AI) systems that can assist children in their learning process.
Within such complex frameworks, Spoken Dialogue Systems (SDS) are crucial
building blocks to handle efficient task-oriented communication with children
in game-based learning settings. We are working towards a multimodal dialogue
system for younger kids learning basic math concepts. Our focus is on improving
the Natural Language Understanding (NLU) module of the task-oriented SDS
pipeline with limited datasets. This work explores the potential benefits of
data augmentation with paraphrase generation for the NLU models trained on
small task-specific datasets. We also investigate the effects of extracting
entities for conceivably further data expansion. We have shown that
paraphrasing with model-in-the-loop (MITL) strategies using small seed data is
a promising approach yielding improved performance results for the Intent
Recognition task.
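The model-in-the-loop (MITL) augmentation strategy described in the abstract can be sketched roughly as follows. This is an illustrative stand-in, not the authors' actual pipeline: the seed utterances, the synonym-substitution "paraphraser", and the word-overlap "classifier" below are hypothetical placeholders for the neural paraphrase generators and the NLU model the paper actually uses. The core idea it demonstrates is the loop itself: generate paraphrases from small seed data, keep only those the current model still assigns to the original intent, and fold them back into the training set.

```python
# Hypothetical sketch of model-in-the-loop (MITL) paraphrase augmentation
# for Intent Recognition. All names and data below are illustrative.
from collections import Counter

# Small seed dataset: (utterance, intent) pairs.
SEED = [
    ("add two and three", "do_addition"),
    ("what is two plus three", "do_addition"),
    ("take three away from five", "do_subtraction"),
    ("what is five minus three", "do_subtraction"),
]

# Stand-in paraphraser: naive synonym substitution. A real pipeline would
# call a neural paraphrase-generation model here.
SYNONYMS = {"plus": "added to", "minus": "take away", "what is": "tell me"}

def paraphrase(utterance):
    """Return simple substitution-based paraphrases of an utterance."""
    out = []
    for src, tgt in SYNONYMS.items():
        if src in utterance:
            out.append(utterance.replace(src, tgt))
    return out

# Stand-in intent model: score each intent by word overlap with its seed
# utterances and predict the highest-scoring one.
def predict_intent(utterance, dataset):
    scores = Counter()
    words = set(utterance.split())
    for text, intent in dataset:
        scores[intent] += len(words & set(text.split()))
    return scores.most_common(1)[0][0]

# Model-in-the-loop filter: keep only paraphrases that the current model
# still assigns to the original intent, then add them as new labelled data.
def augment(seed):
    augmented = list(seed)
    for text, intent in seed:
        for para in paraphrase(text):
            if predict_intent(para, seed) == intent:
                augmented.append((para, intent))
    return augmented

data = augment(SEED)
print(len(SEED), "->", len(data), "examples")  # 4 -> 8 examples
```

The filtering step is what makes this "model-in-the-loop": the current model vets each candidate paraphrase before it enters the training set, which guards against label drift when the paraphraser changes an utterance's meaning.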
Related papers
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
- Neural-Bayesian Program Learning for Few-shot Dialogue Intent Parsing [14.90367428035125]
We propose a novel Neural-Bayesian Learning model named Dialogue-Intentesian Program (DI-)
DI- specializes in intent parsing under data-hungry settings and offers promising performance improvements.
Experimental results demonstrate that DI- outperforms state-of-the-art deep learning models and offers practical advantages for industrial-scale applications.
arXiv Detail & Related papers (2024-10-08T16:54:00Z)
- Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data [84.01401439030265]
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs).
We present a simple yet effective automatic process for creating speech-text pair data.
Our model demonstrates general capabilities for speech-related tasks without the need for speech instruction-tuning data.
arXiv Detail & Related papers (2024-09-30T07:01:21Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- Creating Spoken Dialog Systems in Ultra-Low Resourced Settings [0.0]
We build on existing lightweight models for intent classification in Flemish.
We apply different augmentation techniques on two levels -- the voice level, and the phonetic transcripts level.
We find that our data augmentation techniques, on both levels, have improved the model performance on a number of tasks.
arXiv Detail & Related papers (2023-12-11T10:04:05Z)
- LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs).
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
- Injecting linguistic knowledge into BERT for Dialogue State Tracking [60.42231674887294]
This paper proposes a method that extracts linguistic knowledge via an unsupervised framework.
We then utilize this knowledge to augment BERT's performance and interpretability in Dialogue State Tracking (DST) tasks.
We benchmark this framework on various DST tasks and observe a notable improvement in accuracy.
arXiv Detail & Related papers (2023-11-27T08:38:42Z)
- Inspecting Spoken Language Understanding from Kids for Basic Math Learning at Home [8.819665252533104]
This work explores the Spoken Language Understanding (SLU) pipeline within a task-oriented dialogue system developed for Kid Space.
Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) components are evaluated on our home deployment data.
arXiv Detail & Related papers (2023-06-01T09:31:57Z)
- NLU for Game-based Learning in Real: Initial Evaluations [9.912419882236918]
This study explores the potential benefits of a recently proposed transformer-based multi-task NLU architecture.
It mainly performs Intent Recognition on small-size domain-specific educational game datasets.
We have shown that compared to the more straightforward baseline approaches, Dual Intent and Entity Transformer (DIET) architecture is robust enough to handle real-world data.
arXiv Detail & Related papers (2022-05-27T03:48:32Z)
- Generative Conversational Networks [67.13144697969501]
We propose a framework called Generative Conversational Networks, in which conversational agents learn to generate their own labelled training data.
We show an average improvement of 35% in intent detection and 21% in slot tagging over a baseline model trained from the seed data.
arXiv Detail & Related papers (2021-06-15T23:19:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.