Related papers: KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark

URL: http://arxiv.org/abs/2402.17377v2
Date: Mon, 17 Jun 2024 05:12:56 GMT
Title: KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark
Authors: Seongbo Jang, Seonghyeon Lee, Hwanjo Yu
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: As language models are often deployed as chatbot assistants, it becomes a virtue for models to engage in conversations in a user's first language. While these models are trained on a wide range of languages, a comprehensive evaluation of their proficiency in low-resource languages such as Korean has been lacking. In this work, we introduce KoDialogBench, a benchmark designed to assess language models' conversational capabilities in Korean. To this end, we collect native Korean dialogues on daily topics from public sources, or translate dialogues from other languages. We then structure these conversations into diverse test datasets, spanning from dialogue comprehension to response selection tasks. Leveraging the proposed benchmark, we conduct extensive evaluations and analyses of various language models to measure a foundational understanding of Korean dialogues. Experimental results indicate that there exists significant room for improvement in models' conversation skills. Furthermore, our in-depth comparisons across different language models highlight the effectiveness of recent training techniques in enhancing conversational proficiency. We anticipate that KoDialogBench will promote the progress towards conversation-aware Korean language models.

Related papers

Towards a Japanese Full-duplex Spoken Dialogue System
Full-duplex spoken dialogue systems have attracted significant attention recently. In this paper, we present the first publicly available full-duplex spoken dialogue model in Japanese. Our model is trained through a two-stage process: pre-training on large-scale Japanese spoken dialogue data, followed by fine-tuning on high-quality stereo spoken dialogue data.
arXiv (2025-06-03T15:16:50Z)
WavChat: A Survey of Spoken Dialogue Models
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o, have captured significant attention in the speech domain. These advanced spoken dialogue models not only comprehend audio, music, and other speech-related features, but also capture stylistic and timbral characteristics in speech. Despite the progress in spoken dialogue systems, there is a lack of comprehensive surveys that systematically organize and analyze these systems.
arXiv (2024-11-15T04:16:45Z)
PSYDIAL: Personality-based Synthetic Dialogue Generation using Large Language Models
We present a novel end-to-end personality-based synthetic dialogue data generation pipeline, specifically designed to elicit responses from large language models via prompting. We introduce PSYDIAL, the first Korean dialogue dataset focused on personality-based dialogues, curated using our proposed pipeline. Experimental results indicate that while pre-trained models and those fine-tuned with a chit-chat dataset struggle to generate responses reflecting personality, models trained with PSYDIAL show significant improvements.
arXiv (2024-04-01T05:19:34Z)
Large Language Model based Situational Dialogues for Second Language Learning
In second language learning, scenario-based conversation practice is important for language learners to achieve fluency in speaking. We propose situational dialogue models with which students can engage in conversational practice. Our situational dialogue models are fine-tuned on large language models (LLMs), with the aim of combining the engaging nature of an open-ended conversation with the focused practice of scenario-based tasks.
arXiv (2024-03-29T06:43:55Z)
Neural Conversation Models and How to Rein Them in: A Survey of Failures and Fixes
Recent conditional language models are able to continue any kind of text source in an often seemingly fluent way. From a linguistic perspective, however, contributing to a conversation is highly complex. Recent approaches try to tame the underlying language models at various intervention points.
arXiv (2023-08-11T12:07:45Z)
Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks. Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena. For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv (2023-07-16T15:18:25Z)
User Adaptive Language Learning Chatbots with a Curriculum
We adapt lexically constrained decoding to a dialog system, which urges the dialog system to include curriculum-aligned words and phrases in its generated utterances. The evaluation result demonstrates that the dialog system with curriculum infusion improves students' understanding of target words and increases their interest in practicing English.
arXiv (2023-04-11T20:41:41Z)
Building a Personalized Dialogue System with Prompt-Tuning
We build a dialogue system that responds based on a given character setting (persona). We propose an approach that applies prompt-tuning, which has low learning costs, to pre-trained large-scale language models.
arXiv (2022-06-11T02:21:11Z)
"How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations
This work presents a new benchmark on spoken task-oriented conversations. We study multi-domain dialogue state tracking and knowledge-grounded dialogue modeling. Our data set enables speech-based benchmarking of task-oriented dialogue systems.
arXiv (2021-09-28T04:51:04Z)
TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue
In this work, we unify nine human-human and multi-turn task-oriented dialogue datasets for language modeling. To better model dialogue behavior during pre-training, we incorporate user and system tokens into the masked language modeling.
arXiv (2020-04-15T04:09:05Z)
XPersona: Evaluating Multilingual Personalized Chatbot
We propose a multi-lingual extension of Persona-Chat, namely XPersona. Our dataset includes persona conversations in six different languages other than English for building and evaluating multilingual personalized agents.
arXiv (2020-03-17T07:52:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.