Fugu-MT Paper Translation (Summary): Assessing Dutch Syllabification Algorithms and Improving Accuracy by Combining Phonetic and Orthographic Information through Deep Learning

Paper summary: Assessing Dutch Syllabification Algorithms and Improving Accuracy by Combining Phonetic and Orthographic Information through Deep Learning

arxiv url: http://arxiv.org/abs/2605.28834v1
Date: Fri, 10 Apr 2026 13:58:21 GMT
Status: Translation complete
Last updated in system: 2026-06-15 07:09:36.549408
Title: Assessing Dutch Syllabification Algorithms and Improving Accuracy by Combining Phonetic and Orthographic Information through Deep Learning
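Title (reference translation): Assessment of Dutch Syllabification Algorithms and Accuracy Improvement by Combining Phonetic and Orthographic Information through Deep Learning
Authors: Gus Lathouwers, Wieke Harmsen, Catia Cucchiarini, Helmer Strik
Abstract summary: Syllabification is the task of dividing words into syllables. Various algorithms have been proposed for Dutch syllabification, but no comprehensive comparative assessment has been carried out. No modern deep-learning-based framework has been developed for Dutch orthographic syllabification.
Reference score (independently calculated attention score): 8.0557471355991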
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Syllabification describes the task of dividing words into syllables. Due to many rules and exceptions, training an algorithm to perform syllabification with high accuracy remains a challenge. Throughout the last decades, different algorithms have been put forth for Dutch syllabification, yet a comprehensive comparative assessment has not been done. Additionally, deep learning has gained significant popularity within NLP in recent years, yet no modern deep-learning based framework has been developed for Dutch orthographic syllabification. Finally, phonetic and orthographic syllabification algorithms have been examined separately, but not in combination. The aim of the current research was twofold: (a) to examine the performance of existing Dutch syllabification algorithms, and (b) to investigate whether combining phonetic and orthographic information into a single model can increase syllabification performance. To compare the performance of algorithms, four algorithms (Brandt Corstius, Liang, Trogkanis-Elkan (CRF), and a newly conceived deep-learning model) were applied to three different datasets (dictionary words, loanwords, pseudowords). The algorithms show varying performance across datasets, with the data-driven algorithms outperforming a knowledge-based algorithm in all but one condition. The newly developed deep-learning methods outperformed the best result found in the literature (99.65% word accuracy, a 0.14% improvement). An analysis of the words for which adding phonetic information improved syllabification performance indicates that these were words in which the orthographic ambiguity could be resolved by information on pronunciation. Future research could examine other areas where phonetic information can benefit orthographic processing. In addition, the newly developed deep-learning frameworks can be applied to languages other than Dutch.
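Abstract (reference translation): Syllabification is the task of dividing words into syllables. Because of the many rules and exceptions, training an algorithm to perform syllabification with high accuracy remains difficult. Over the past decades, various algorithms for Dutch syllabification have been examined, but no comprehensive comparative assessment has been carried out. In addition, although deep learning has become very popular within NLP in recent years, no modern deep-learning-based framework has been developed for Dutch orthographic syllabification. Finally, phonetic and orthographic syllabification have been examined separately, but not in combination. The aim of the current research is twofold: (a) to examine the performance of existing Dutch syllabification algorithms, and (b) to investigate whether incorporating phonetic and orthographic information into a single model improves syllabification performance. To compare the algorithms, four algorithms (Brandt Corstius, Liang, Trogkanis-Elkan (CRF), and a newly conceived deep-learning model) were applied to three different datasets (dictionary words, loanwords, pseudowords). The algorithms show varying performance across datasets, and the data-driven algorithms outperform the knowledge-based algorithm in all but one condition. The newly developed deep-learning methods improved performance compared with the best found in the literature (99.65% word accuracy, a 0.14% improvement). An analysis of the words for which adding phonetic information improved syllabification performance showed that these were words whose orthographic ambiguity could be resolved by pronunciation information. Future research may examine other areas where phonetic information can benefit orthographic processing. In addition, the newly developed deep-learning frameworks can be applied to languages other than Dutch.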
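The abstract reports its headline result as word accuracy (99.65%). The following is a minimal sketch of that metric, assuming the common definition in which a word counts as correct only if all of its predicted syllable boundaries exactly match the gold segmentation. The helper names `boundaries` and `word_accuracy`, the hyphen-separated input format, and the Dutch example words are illustrative assumptions, not taken from the paper.

```python
from __future__ import annotations

from typing import Sequence


def boundaries(syllabified: str, sep: str = "-") -> frozenset[int]:
    """Return the character offsets (in the unsegmented word) where a syllable break occurs.

    Hypothetical helper: assumes syllables are joined by `sep`, e.g. "ap-pel".
    """
    positions = set()
    offset = 0
    for ch in syllabified:
        if ch == sep:
            positions.add(offset)  # break falls before the next letter
        else:
            offset += 1
    return frozenset(positions)


def word_accuracy(predicted: Sequence[str], gold: Sequence[str], sep: str = "-") -> float:
    """Fraction of words whose full set of syllable boundaries matches the gold standard.

    Hypothetical helper: a word with even one wrong or missing boundary counts as incorrect.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("need at least one word")
    correct = 0
    for p, g in zip(predicted, gold):
        # Both segmentations must describe the same underlying word.
        if p.replace(sep, "") != g.replace(sep, ""):
            raise ValueError(f"word mismatch: {p!r} vs {g!r}")
        if boundaries(p, sep) == boundaries(g, sep):
            correct += 1
    return correct / len(gold)


if __name__ == "__main__":
    gold = ["boe-ken", "ap-pel", "ka-mer"]
    predicted = ["boe-ken", "appel", "kam-er"]  # one correct, one missed break, one misplaced break
    print(word_accuracy(predicted, gold))  # 0.3333333333333333
```

Because this metric is all-or-nothing per word, a single misplaced boundary in a long compound costs as much as a fully wrong segmentation; boundary-level precision and recall would be a finer-grained complement.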


Related papers