Predicting the Target Word of Game-playing Conversations using a Low-Rank Dialect Adapter for Decoder Models
- URL: http://arxiv.org/abs/2409.00358v1
- Date: Sat, 31 Aug 2024 05:53:39 GMT
- Title: Predicting the Target Word of Game-playing Conversations using a Low-Rank Dialect Adapter for Decoder Models
- Authors: Dipankar Srirag, Aditya Joshi, Jacob Eisenstein
- Abstract summary: We extend the idea of dialect adapters to decoder models in our architecture called LoRDD.
LoRDD combines task adapters and dialect adapters where the latter employ contrastive learning on pseudo-parallel conversations from MD-3.
Our results for en-IN conversations on two models (Mistral and Gemma) show that LoRDD outperforms four baselines on TWP, while bridging the performance gap with en-US by 12% on word similarity and 25% on accuracy.
- Score: 16.289326589414404
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Dialect adapters that improve the performance of LLMs for NLU tasks on certain sociolects/dialects/national varieties ('dialects' for the sake of brevity) have been reported for encoder models. In this paper, we extend the idea of dialect adapters to decoder models in our architecture called LoRDD. Using MD-3, a publicly available dataset of word game-playing conversations between dialectal speakers, our task is Target Word Prediction (TWP) from a masked conversation. LoRDD combines task adapters and dialect adapters where the latter employ contrastive learning on pseudo-parallel conversations from MD-3. Our results for en-IN conversations on two models (Mistral and Gemma) show that LoRDD outperforms four baselines on TWP, while bridging the performance gap with en-US by 12% on word similarity and 25% on accuracy. The focused contribution of LoRDD is in its promise for dialect adaptation of decoder models.
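The abstract's core mechanism, a frozen base weight with a low-rank (LoRA-style) update, plus a contrastive objective that pulls pseudo-parallel en-IN/en-US conversation embeddings together, can be sketched compactly. This is a minimal numpy illustration of the general technique, not the authors' implementation; all shapes, hyperparameters, and names (`d`, `r`, `alpha`, `info_nce`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 16  # hidden size, LoRA rank, scaling (illustrative values)

# Frozen base weight plus a trainable low-rank update: W_eff = W + (alpha/r) * B @ A
W = rng.standard_normal((d, d))         # frozen
A = rng.standard_normal((r, d)) * 0.01  # down-projection, trainable
B = np.zeros((d, r))                    # up-projection, zero-initialised

def adapted_forward(x):
    """Apply the frozen weight with the low-rank adapter update."""
    return x @ (W + (alpha / r) * (B @ A)).T

def info_nce(en_in, en_us, tau=0.1):
    """InfoNCE-style contrastive loss over pseudo-parallel pairs.

    en_in, en_us: (n, d) embeddings of matched en-IN / en-US conversations;
    row i of each matrix is treated as a positive pair, all others as negatives.
    """
    a = en_in / np.linalg.norm(en_in, axis=1, keepdims=True)
    b = en_us / np.linalg.norm(en_us, axis=1, keepdims=True)
    logits = a @ b.T / tau                       # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # maximise matched-pair probability

# Toy pseudo-parallel batch: slightly perturbed copies stand in for dialect pairs.
x = rng.standard_normal((4, d))
h_in = adapted_forward(x)
h_us = adapted_forward(x + 0.1 * rng.standard_normal((4, d)))
loss = info_nce(h_in, h_us)
print(float(loss))
```

In LoRDD's setup this contrastive objective would train the dialect adapter, while a separate task adapter is trained for Target Word Prediction; here both roles are collapsed into one adapter for brevity.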
Related papers
- Add Noise, Tasks, or Layers? MaiNLP at the VarDial 2025 Shared Task on Norwegian Dialectal Slot and Intent Detection [22.89563355840371]
Slot and intent detection is a classic natural language understanding task.
Many approaches for low-resource scenarios have not yet been applied to dialectal SID data.
We participate in the VarDial 2025 shared task on slot and intent detection in Norwegian varieties.
arXiv Detail & Related papers (2025-01-07T15:36:35Z) - NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts [57.53692236201343]
We propose a Multi-Task Correction MoE, where we train the experts to become an "expert" of speech-to-text, language-to-text and vision-to-text datasets.
NeKo performs competitively on grammar and post-OCR correction as a multi-task model.
arXiv Detail & Related papers (2024-11-08T20:11:24Z) - An Unsupervised Dialogue Topic Segmentation Model Based on Utterance Rewriting [3.5399864027190366]
This study proposes a novel unsupervised dialog topic segmentation method that combines the Utterance Rewriting (UR) technique with an unsupervised learning algorithm.
The proposed Discourse Rewriting Topic Model (UR-DTS) significantly improves the accuracy of topic segmentation.
arXiv Detail & Related papers (2024-09-12T00:27:31Z) - Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System [73.34663391495616]
We propose a pioneering approach to tackle joint multi-talker and target-talker speech recognition tasks.
Specifically, we freeze Whisper and plug a Sidecar separator into its encoder to separate mixed embedding for multiple talkers.
We deliver acceptable zero-shot performance on multi-talker ASR on the AishellMix Mandarin dataset.
arXiv Detail & Related papers (2024-07-13T09:28:24Z) - MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset [14.026893125215912]
We introduce a novel task to generate 3D talking heads from speeches of diverse languages.
We collect a new multilingual 2D video dataset comprising over 420 hours of talking videos in 20 languages.
We present a metric for assessing lip-sync accuracy in multilingual settings.
arXiv Detail & Related papers (2024-06-20T12:52:46Z) - DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding [51.32965203977845]
We propose the use of discrete speech units (DSU) instead of continuous-valued speech encoder outputs.
The proposed model shows robust performance on speech inputs from seen/unseen domains and instruction-following capability in spoken question answering.
Our findings suggest that the ASR task and datasets are not crucial in instruction-tuning for spoken question answering tasks.
arXiv Detail & Related papers (2024-06-13T17:28:13Z) - Evaluating Dialect Robustness of Language Models via Conversation Understanding [2.8514881296685113]
We use English-language (US English or Indian English) conversations between humans playing the word-guessing game 'taboo'.
We formulate two evaluative tasks: target word prediction (TWP) (i.e., predict the masked target word in a conversation) and target word selection (TWS) (i.e., select the most likely masked target word in a conversation).
We create two subsets: en-MV (where en-US is transformed to include dialectal information) and en-TR (where dialectal information is
arXiv Detail & Related papers (2024-05-09T11:38:23Z) - Are LLMs Robust for Spoken Dialogues? [10.855403629160921]
Large Pre-Trained Language Models have demonstrated state-of-the-art performance in different downstream tasks.
Most of the publicly available datasets and benchmarks on task-oriented dialogues focus on written conversations.
We have evaluated the performance of LLMs for spoken task-oriented dialogues on the DSTC11 test sets.
arXiv Detail & Related papers (2024-01-04T14:36:38Z) - APoLLo: Unified Adapter and Prompt Learning for Vision Language Models [58.9772868980283]
We present APoLLo, a unified multi-modal approach that combines Adapter and Prompt learning for Vision-Language models.
APoLLo achieves a relative gain of up to 6.03% over MaPLe (SOTA) on novel classes for 10 diverse image recognition datasets.
arXiv Detail & Related papers (2023-12-04T01:42:09Z) - Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching [65.74653592668743]
Finetuning self-supervised multilingual representations reduces absolute word error rates by up to 20%.
In circumstances with limited training data, finetuning self-supervised representations is a better-performing and viable solution.
arXiv Detail & Related papers (2023-11-25T17:05:21Z) - LaDA: Latent Dialogue Action For Zero-shot Cross-lingual Neural Network Language Modeling [20.002861239367704]
Cross-lingual adaptation has proven effective in spoken language understanding systems with limited resources.
Existing methods are frequently unsatisfactory for intent detection and slot filling.
Latent Dialogue Action layer is proposed to optimize decoding strategy.
arXiv Detail & Related papers (2023-08-05T15:51:45Z) - SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents [72.42049370297849]
SpokenWOZ is a large-scale speech-text dataset for spoken TOD.
Cross-turn slot and reasoning slot detection are new challenges for SpokenWOZ.
arXiv Detail & Related papers (2023-05-22T13:47:51Z) - Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z) - Adapting Task-Oriented Dialogue Models for Email Conversations [4.45709593827781]
In this paper, we provide an effective transfer learning framework (EMToD) that allows the latest development in dialogue models to be adapted for long-form conversations.
We show that the proposed EMToD framework improves intent detection performance over pre-trained language models by 45% and over pre-trained dialogue models by 30% for task-oriented email conversations.
arXiv Detail & Related papers (2022-08-19T16:41:34Z) - Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining for Task-Oriented Dialog [67.20796950016735]
Multi2WOZ dataset spans four typologically diverse languages: Chinese, German, Arabic, and Russian.
We introduce a new framework for multilingual conversational specialization of pretrained language models (PrLMs) that aims to facilitate cross-lingual transfer for arbitrary downstream TOD tasks.
Our experiments show that, in most setups, the best performance entails the combination of (i) conversational specialization in the target language and (ii) few-shot transfer for the concrete TOD task.
arXiv Detail & Related papers (2022-05-20T18:35:38Z) - TOD-DA: Towards Boosting the Robustness of Task-oriented Dialogue Modeling on Spoken Conversations [24.245354500835465]
We propose a novel model-agnostic data augmentation paradigm to boost the robustness of task-oriented dialogue modeling on spoken conversations.
Our approach ranked first in both tasks of DSTC10 Track2, a benchmark for task-oriented dialogue modeling on spoken conversations.
arXiv Detail & Related papers (2021-12-23T10:04:25Z) - Unsupervised Transfer Learning in Multilingual Neural Machine Translation with Cross-Lingual Word Embeddings [72.69253034282035]
We exploit a language independent multilingual sentence representation to easily generalize to a new language.
Blindly decoding from Portuguese using a base system containing several Romance languages, we achieve scores of 36.4 BLEU for Portuguese-English and 12.8 BLEU for Russian-English.
We explore a more practical adaptation approach through non-iterative backtranslation, exploiting our model's ability to produce high quality translations.
arXiv Detail & Related papers (2021-03-11T14:22:08Z) - A Tailored Pre-Training Model for Task-Oriented Dialog Generation [60.05269529832447]
We propose a Pre-trained Role Alternating Language model (PRAL) for task-oriented conversational systems.
We introduce a task-oriented dialog pretraining dataset by cleaning 13 existing data sets.
The results show that PRAL performs better or on par with state-of-the-art methods.
arXiv Detail & Related papers (2020-04-24T09:25:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.