DIALIGHT: Lightweight Multilingual Development and Evaluation of
Task-Oriented Dialogue Systems with Large Language Models
- URL: http://arxiv.org/abs/2401.02208v1
- Date: Thu, 4 Jan 2024 11:27:48 GMT
- Title: DIALIGHT: Lightweight Multilingual Development and Evaluation of
Task-Oriented Dialogue Systems with Large Language Models
- Authors: Songbo Hu, Xiaobin Wang, Zhangdie Yuan, Anna Korhonen, Ivan Vulić
- Abstract summary: DIALIGHT is a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems.
It features a secure, user-friendly web interface for fine-grained human evaluation at both local utterance level and global dialogue level.
Our evaluations reveal that while PLM fine-tuning leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses.
- Score: 76.79929883963275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present DIALIGHT, a toolkit for developing and evaluating multilingual
Task-Oriented Dialogue (ToD) systems which facilitates systematic evaluations
and comparisons between ToD systems using fine-tuning of Pretrained Language
Models (PLMs) and those utilising the zero-shot and in-context learning
capabilities of Large Language Models (LLMs). In addition to automatic
evaluation, this toolkit features (i) a secure, user-friendly web interface for
fine-grained human evaluation at both local utterance level and global dialogue
level, and (ii) a microservice-based backend, improving efficiency and
scalability. Our evaluations reveal that while PLM fine-tuning leads to higher
accuracy and coherence, LLM-based systems excel in producing diverse and
likeable responses. However, we also identify significant challenges for LLMs in adhering to task-specific instructions and in generating outputs in multiple languages, highlighting areas for future research. We hope this open-sourced toolkit will serve as a valuable resource for researchers aiming to develop and properly evaluate multilingual ToD systems, and that it will lower the currently still high entry barriers in the field.
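As a rough illustration of the two paradigms the toolkit compares (this is not DIALIGHT's actual API; the checkpoint name, prompt format, and function names below are placeholders), a fine-tuned PLM maps the dialogue context directly to a response, while an LLM is steered with a task instruction in its prompt:

from transformers import pipeline

# (i) Fine-tuning route: a seq2seq PLM fine-tuned on ToD data.
# "my-org/mt5-small-tod" is a placeholder checkpoint name.
def plm_response(dialogue_history):
    generator = pipeline("text2text-generation", model="my-org/mt5-small-tod")
    context = " </s> ".join(dialogue_history)
    return generator(context, max_new_tokens=64)[0]["generated_text"]

# (ii) Zero-shot / in-context route: wrap the same history in a task
# instruction and send the prompt to any instruction-tuned LLM endpoint.
def llm_prompt(dialogue_history, task_instruction):
    turns = "\n".join(
        f"{'User' if i % 2 == 0 else 'System'}: {u}"
        for i, u in enumerate(dialogue_history)
    )
    return f"{task_instruction}\n\nDialogue so far:\n{turns}\nSystem:"

The fine-tuned route bakes task knowledge into the weights, which is what drives the accuracy and coherence gains reported above; the in-context route leaves the weights untouched and relies entirely on the prompt.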
Related papers
- Balancing Accuracy and Efficiency in Multi-Turn Intent Classification for LLM-Powered Dialog Systems in Production [6.459396785817196]
This paper presents two novel approaches to enhance scalability and reduce latency in production dialogue systems.
First, we introduce Symbol Tuning, which simplifies intent labels to reduce task complexity and improve performance in multi-turn dialogues; a rough sketch of this idea follows the entry.
Second, we propose C-LARA, a framework that employs LLMs for data augmentation and pseudo-labeling to generate synthetic multi-turn dialogues.
arXiv Detail & Related papers (2024-11-19T07:48:35Z)
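A minimal sketch of the Symbol Tuning idea from the entry above, assuming a few-shot intent-classification prompt; the intent names and symbols are invented for illustration:

# Descriptive intent labels are replaced with short, semantically empty
# symbols, so the model must rely on the demonstrations rather than on
# prior associations with the label strings.
SYMBOL_MAP = {
    "check_order_status": "<A>",
    "request_refund": "<B>",
    "update_shipping_address": "<C>",
}

def build_symbol_tuned_prompt(demos, query):
    # demos: list of (utterance, intent_name) pairs; query: new utterance.
    blocks = [
        f"Input: {utterance}\nLabel: {SYMBOL_MAP[intent]}"
        for utterance, intent in demos
    ]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)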
- Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhancing the multilingual capabilities of large language models (LLMs).
It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces in the top layers of LLMs; a toy projection sketch follows this entry.
It achieves superior results with far fewer computational resources than existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z)
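As a toy illustration of steering hidden states through a subspace projection (not the Lens method itself; it assumes an orthonormal basis for a language-specific subspace has already been extracted by some other means):

import numpy as np

def steer_hidden_states(H, basis, alpha=1.5):
    # H: (n, d) hidden states from a top transformer layer.
    # basis: (d, k) orthonormal basis of a language-specific subspace.
    # alpha > 1 amplifies the language-specific component; alpha < 1 suppresses it.
    projector = basis @ basis.T      # (d, d) orthogonal projector onto the subspace
    specific = H @ projector         # component inside the subspace
    agnostic = H - specific          # language-agnostic remainder
    return agnostic + alpha * specific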
- CUDRT: Benchmarking the Detection Models of Human vs. Large Language Models Generated Texts [9.682499180341273]
Large language models (LLMs) have greatly enhanced text generation across industries.
Their human-like outputs make distinguishing between human and AI authorship challenging.
Current benchmarks mainly rely on static datasets, limiting their effectiveness in assessing model-based detectors.
arXiv Detail & Related papers (2024-06-13T12:43:40Z)
- Machine Translation with Large Language Models: Prompt Engineering for Persian, English, and Russian Directions [0.0]
Generative large language models (LLMs) have demonstrated exceptional proficiency in various natural language processing (NLP) tasks.
We investigate two popular prompting methods and their combination, focusing on translation directions among Persian, English, and Russian; a hedged sketch follows this entry.
arXiv Detail & Related papers (2024-01-16T15:16:34Z)
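The summary above does not name the two prompting methods, so the sketch below assumes the common pair of zero-shot instruction prompting and few-shot demonstration prompting, plus a straightforward combination of the two:

def zero_shot_prompt(src, src_lang="Persian", tgt_lang="English"):
    return (f"Translate the following {src_lang} sentence into {tgt_lang}:\n"
            f"{src}\nTranslation:")

def few_shot_prompt(src, demos, src_lang="Persian", tgt_lang="English"):
    # demos: list of (source_sentence, reference_translation) pairs.
    shots = "\n".join(f"{src_lang}: {s}\n{tgt_lang}: {t}" for s, t in demos)
    return f"{shots}\n{src_lang}: {src}\n{tgt_lang}:"

def combined_prompt(src, demos, src_lang="Persian", tgt_lang="English"):
    # Combination: an explicit instruction header followed by demonstrations.
    header = f"Translate from {src_lang} to {tgt_lang}, following the examples."
    return header + "\n" + few_shot_prompt(src, demos, src_lang, tgt_lang)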
- LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs).
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
- A Systematic Study of Performance Disparities in Multilingual Task-Oriented Dialogue Systems [68.76102493999134]
We take stock of and empirically analyse task performance disparities that exist between multilingual task-oriented dialogue systems.
We demonstrate the existence of adaptation and intrinsic biases in current ToD systems.
Our analyses offer practical tips on how to approach ToD data collection and system development for new languages.
arXiv Detail & Related papers (2023-10-19T16:41:44Z)
- L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs; a sketch of one standard calibration metric follows this entry.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
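One standard way to quantify the confidence calibration mentioned above is the expected calibration error (ECE); a minimal binned implementation, assuming per-sample confidences and binary pass/fail outcomes for the generated programs, might look like:

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # confidences: model-reported probability that each output is correct.
    # correct: 1 if the generated program was judged correct, else 0.
    conf = np.asarray(confidences, dtype=float)
    corr = np.asarray(correct, dtype=float)
    # Assign each sample to one of n_bins equal-width confidence bins.
    bin_idx = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            # Weight each bin's |accuracy - mean confidence| gap by its size.
            ece += mask.mean() * abs(corr[mask].mean() - conf[mask].mean())
    return ece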
- Overcoming Language Disparity in Online Content Classification with Multimodal Learning [22.73281502531998]
Large language models are now the standard for developing state-of-the-art solutions to text detection and classification tasks.
The development of advanced computational techniques and resources is disproportionately focused on the English language.
We explore the promise of incorporating the information contained in images via multimodal machine learning; a minimal fusion sketch follows this entry.
arXiv Detail & Related papers (2022-05-19T17:56:02Z)
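A minimal late-fusion sketch of the multimodal idea above, assuming text and image embeddings have already been computed by separate encoders; all dimensions and layer sizes are illustrative:

import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, num_classes=2):
        super().__init__()
        # Concatenate the two modality embeddings, then classify.
        self.head = nn.Sequential(
            nn.Linear(text_dim + image_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, text_emb, image_emb):
        return self.head(torch.cat([text_emb, image_emb], dim=-1))

When the text encoder is weak for a low-resource language, the image pathway can carry part of the signal, which is the intuition the paper explores.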
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from high-resource languages to low-resource languages.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning; a distillation-loss sketch follows this entry.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
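The paper combines semi-supervised learning with reinforcement learning; the sketch below shows only the generic soft-label distillation term that teacher-student setups of this kind typically apply to unlabeled target-language text, not the paper's full method:

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Shapes: (batch, seq_len, num_tags). On unlabeled target-language
    # tokens, the student matches the teacher's temperature-smoothed
    # distribution over NER tags instead of gold labels.
    teacher_probs = F.softmax(teacher_logits / T, dim=-1)
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)
    # The T*T factor is the standard gradient-scale correction for KD.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * T * T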
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.