Related papers: Steering LLMs toward Korean Local Speech: Iterative Refinement Framework for Faithful Dialect Translation

Steering LLMs toward Korean Local Speech: Iterative Refinement Framework for Faithful Dialect Translation

URL: http://arxiv.org/abs/2511.06680v1
Date: Mon, 10 Nov 2025 03:52:24 GMT
Title: Steering LLMs toward Korean Local Speech: Iterative Refinement Framework for Faithful Dialect Translation
Authors: Keunhyeung Park, Seunguk Yu, Youngbin Kim,
Abstract summary: DIA-REFINE is a framework for goal-directed, inclusive dialect translation.<n>We introduce the dialect fidelity score (DFS) to quantify linguistic shift and the target dialect ratio (TDR) to measure the success of dialect translation.<n>Our work establishes a robust framework for goal-directed, inclusive dialect translation.
Score: 17.99472063920348
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Standard-to-dialect machine translation remains challenging due to a persistent dialect gap in large language models and evaluation distortions inherent in n-gram metrics, which favor source copying over authentic dialect translation. In this paper, we propose the dialect refinement (DIA-REFINE) framework, which guides LLMs toward faithful target dialect outputs through an iterative loop of translation, verification, and feedback using external dialect classifiers. To address the limitations of n-gram-based metrics, we introduce the dialect fidelity score (DFS) to quantify linguistic shift and the target dialect ratio (TDR) to measure the success of dialect translation. Experiments on Korean dialects across zero-shot and in-context learning baselines demonstrate that DIA-REFINE consistently enhances dialect fidelity. The proposed metrics distinguish between False Success cases, where high n-gram scores obscure failures in dialectal translation, and True Attempt cases, where genuine attempts at dialectal translation yield low n-gram scores. We also observed that models exhibit varying degrees of responsiveness to the framework, and that integrating in-context examples further improves the translation of dialectal expressions. Our work establishes a robust framework for goal-directed, inclusive dialect translation, providing both rigorous evaluation and critical insights into model performance.

Related papers

Languages in Multilingual Speech Foundation Models Align Both Phonetically and Semantically [58.019484208091534]
Cross-lingual alignment in pretrained language models (LMs) has enabled efficient transfer in text-based LMs.<n>It remains an open question whether findings and methods from text-based cross-lingual alignment apply to speech.
arXiv Detail & Related papers (2025-05-26T07:21:20Z)
LLM-Based Evaluation of Low-Resource Machine Translation: A Reference-less Dialect Guided Approach with a Refined Sylheti-English Benchmark [1.3927943269211591]
We propose a comprehensive framework that enhances Large Language Models (LLMs)-based machine translation evaluation.<n>We extend the ONUBAD dataset by incorporating Sylheti-English sentence pairs, corresponding machine translations, and Direct Assessment (DA) scores annotated by native speakers.<n>Our evaluation shows that the proposed pipeline consistently outperforms existing methods, achieving the highest gain of +0.1083 in Spearman correlation.
arXiv Detail & Related papers (2025-05-18T07:24:13Z)
When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages [9.138590152838754]
Segment-level quality estimation (QE) is a challenging cross-lingual language understanding task.<n>We comprehensively evaluate large language models (LLMs) in zero/few-shot scenarios.<n>Our results indicate that prompt-based approaches are outperformed by the encoder-based fine-tuned QE models.
arXiv Detail & Related papers (2025-01-08T12:54:05Z)
Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks [68.33068005789116]
We introduce ReDial, a benchmark containing 1.2K+ parallel query pairs in Standardized English and AAVE.<n>We evaluate widely used models, including GPT, Claude, Llama, Mistral, and the Phi model families.<n>Our work establishes a systematic and objective framework for analyzing LLM bias in dialectal queries.
arXiv Detail & Related papers (2024-10-14T18:44:23Z)
Modeling Orthographic Variation in Occitan's Dialects [3.038642416291856]
Large multilingual models minimize the need for spelling normalization during pre-processing. Our findings suggest that large multilingual models minimize the need for spelling normalization during pre-processing.
arXiv Detail & Related papers (2024-04-30T07:33:51Z)
Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks. Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena. For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
Dialect-robust Evaluation of Generated Text [40.85375247260744]
We formalize dialect robustness and dialect awareness as goals for NLG evaluation metrics. Applying the suite to current state-of-the-art metrics, we demonstrate that they are not dialect-robust.
arXiv Detail & Related papers (2022-11-02T07:12:23Z)
Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes. With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech. We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
When Does Translation Require Context? A Data-driven, Multilingual Exploration [71.43817945875433]
proper handling of discourse significantly contributes to the quality of machine translation (MT) Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation. We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify and evaluate model performance on discourse phenomena.
arXiv Detail & Related papers (2021-09-15T17:29:30Z)
Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level. Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.