Fugu-MT Paper Translation (Abstract): Refining Word-Based Grammatical Error Annotation for L2 Korean

Paper abstract: Refining Word-Based Grammatical Error Annotation for L2 Korean

arxiv url: http://arxiv.org/abs/2605.30545v1
Date: Thu, 28 May 2026 20:27:36 GMT
Status: Translation complete
Last updated in system: 2026-06-01 20:56:50.220589
Title: Refining Word-Based Grammatical Error Annotation for L2 Korean
Authors: Jungyeul Park, Kyungtae Lim, Wonjun Oh, Benjamin Nguyen, Zihao Huang, Mengyang Qiu, Jayoung Song
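Abstract summary: Korean grammatical error correction (K-GEC) presents a structural mismatch between word-based evaluation and the morpheme-level locus of many learner errors. This paper refines word-based grammatical error annotation for L2 Korean by addressing three connected problems in existing resources.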
Reference score (independently computed attention score): 10.887221248702879
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Korean grammatical error correction (K-GEC) presents a structural mismatch between word-based evaluation and the morpheme-level locus of many learner errors. Postpositions and verbal endings are bound to lexical hosts, but they encode grammatical relations that must be represented in correction and evaluation. This paper refines word-based grammatical error annotation for L2 Korean by addressing three connected problems in existing resources: surface target realization, Korean-specific edit annotation, and single-reference evaluation. We reconstruct target sentences from the National Institute of Korean Language (NIKL) L2 corpus under morphologically constrained realization rules and convert its morpheme-level annotations into word-level m2 edits. We then define a Korean ERRANT-style annotation scheme that preserves the MRU (missing, replacement, unnecessary) core while distinguishing functional morpheme errors, spelling errors, word boundary errors, and word order errors. We also augment the KoLLA corpus with an additional reference correction, yielding a multi-reference evaluation setting for Korean GEC. Empirical validation shows that the refined NIKL targets yield lower perplexity, the converted m2 files achieve higher agreement with source-target edit representations, and the refined resources improve KoBART-based correction under the same model setting. Multi-reference KoLLA evaluation further reduces the penalty imposed on valid corrections that diverge from a single reference, especially for neural and prompted GEC systems. These results show that Korean GEC evaluation depends not only on correction models, but also on reference data and edit annotations that reflect Korean morphology, spacing, and correction variability.
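
The effect of adding a second reference can be illustrated with a minimal sketch. It is not the paper's evaluation code: the edit representation (start token, end token, corrected word), the function names, and the toy sentence are all assumptions made for illustration, and picking the best reference per sentence is a simplification of how edit-based GEC scorers typically handle multiple references.

```python
from typing import List, Set, Tuple

# Hypothetical word-level edit: (start token index, end token index, corrected word).
Edit = Tuple[int, int, str]


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta over edit counts; precision/recall default to 1.0 when undefined."""
    p = tp / (tp + fp) if tp + fp else 1.0
    r = tp / (tp + fn) if tp + fn else 1.0
    if p + r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


def counts(hyp: Set[Edit], ref: Set[Edit]) -> Tuple[int, int, int]:
    """True positives, false positives, false negatives for one reference."""
    return len(hyp & ref), len(hyp - ref), len(ref - hyp)


def corpus_f(hyps: List[Set[Edit]], refs: List[List[Set[Edit]]], beta: float = 0.5) -> float:
    """Accumulate counts, using for each sentence the reference that scores best."""
    tp = fp = fn = 0
    for hyp, sentence_refs in zip(hyps, refs):
        best = max((counts(hyp, ref) for ref in sentence_refs),
                   key=lambda c: f_beta(*c, beta=beta))
        tp, fp, fn = tp + best[0], fp + best[1], fn + best[2]
    return f_beta(tp, fp, fn, beta)


# Toy data (invented): source "저는 사과 먹어요", where the second word lacks a particle.
hyp = {(1, 2, "사과를")}    # system adds the object particle
ref_a = {(1, 2, "사과는")}  # first reference chooses a contrastive particle
ref_b = {(1, 2, "사과를")}  # additional reference matches the system output

print(corpus_f([hyp], [[ref_a]]))         # 0.0
print(corpus_f([hyp], [[ref_a, ref_b]]))  # 1.0
```

In this toy case the system's correction is valid but scores zero against the single reference, because the whole word counts as a mismatch even though only the bound particle differs. With the second reference available, the same output is credited in full.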
