Fugu-MT Paper Translation (Abstract): Still Not There: Can LLMs Outperform Smaller Task-Specific Seq2Seq Models on the Poetry-to-Prose Conversion Task?

Paper overview: Still Not There: Can LLMs Outperform Smaller Task-Specific Seq2Seq Models on the Poetry-to-Prose Conversion Task?

arxiv url: http://arxiv.org/abs/2511.08145v1
Date: Wed, 12 Nov 2025 01:42:39 GMT
Status: Translation complete
Last updated in system: 2025-11-12 20:17:03.665889
Title: Still Not There: Can LLMs Outperform Smaller Task-Specific Seq2Seq Models on the Poetry-to-Prose Conversion Task?
Title (reference translation): Still Not There: Can LLMs Outperform Smaller Task-Specific Seq2Seq Models on the Poetry-to-Prose Conversion Task?
Authors: Kunal Kingkar Das, Manoj Balaji Jagadeeshan, Nallani Chakravartula Sahith, Jivnesh Sandhan, Pawan Goyal
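Abstract summary: Large language models (LLMs) are often treated as universal, general-purpose solutions across NLP tasks. But does this assumption hold for low-resource, morphologically rich languages such as Sanskrit? We compare instruction-tuned and in-context-prompted LLMs with task-specific encoder-decoder models on the Sanskrit poetry-to-prose conversion task.
Reference score (independently computed attention score): 4.048676271737789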
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) are increasingly treated as universal, general-purpose solutions across NLP tasks, particularly in English. But does this assumption hold for low-resource, morphologically rich languages such as Sanskrit? We address this question by comparing instruction-tuned and in-context-prompted LLMs with smaller task-specific encoder-decoder models on the Sanskrit poetry-to-prose conversion task. This task is intrinsically challenging: Sanskrit verse exhibits free word order combined with rigid metrical constraints, and its conversion to canonical prose (anvaya) requires multi-step reasoning involving compound segmentation, dependency resolution, and syntactic linearisation. This makes it an ideal testbed to evaluate whether LLMs can surpass specialised models. For LLMs, we apply instruction fine-tuning on general-purpose models and design in-context learning templates grounded in Paninian grammar and classical commentary heuristics. For task-specific modelling, we fully fine-tune a ByT5-Sanskrit Seq2Seq model. Our experiments show that domain-specific fine-tuning of ByT5-Sanskrit significantly outperforms all instruction-driven LLM approaches. Human evaluation strongly corroborates this result, with scores exhibiting high correlation with Kendall's Tau scores. Additionally, our prompting strategies provide an alternative to fine-tuning when domain-specific verse corpora are unavailable, and the task-specific Seq2Seq model demonstrates robust generalisation on out-of-domain evaluations.
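Abstract (reference translation): Large language models (LLMs) are often treated as universal, general-purpose solutions across NLP tasks, particularly in English. But does this assumption hold for low-resource, morphologically rich languages such as Sanskrit? In this study, we compare instruction-tuned LLMs with task-specific encoder-decoder models on the Sanskrit poetry-to-prose conversion task. Sanskrit verse exhibits free word order combined with strict metrical constraints, and its conversion to canonical prose (anvaya) requires multi-step reasoning involving compound segmentation, dependency resolution, and syntactic linearisation. This makes it an ideal testbed for evaluating whether LLMs can surpass specialised models. For LLMs, we apply instruction fine-tuning to general-purpose models and design in-context learning templates grounded in Paninian grammar and classical commentary heuristics. For task-specific modelling, we fully fine-tune a ByT5-Sanskrit Seq2Seq model. Experiments show that domain-specific fine-tuning of ByT5-Sanskrit significantly outperforms the instruction-driven LLM approaches. Human evaluation strongly corroborates this result, with scores that correlate highly with Kendall's Tau scores. In addition, the proposed prompting strategies offer an alternative to fine-tuning when domain-specific verse corpora are unavailable, and the task-specific Seq2Seq model generalises robustly in out-of-domain evaluations.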
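
Note on the metric: the abstract names Kendall's Tau as the automatic score that human judgments correlate with, but does not give the exact formulation used. As a rough illustration only, the sketch below computes Kendall's Tau between the word order of a predicted prose sequence and a reference ordering. The function name `kendall_tau`, the assumption that both sequences contain the same distinct tokens, and the toy Sanskrit tokens are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def kendall_tau(reference, hypothesis):
    """Kendall's Tau between the token order of `hypothesis` and `reference`.

    Assumption (not from the paper): both sequences contain exactly the
    same distinct tokens, so each token has one position in each sequence.
    Returns a value in [-1, 1]; 1 means identical order, -1 means reversed.
    """
    # Map each reference token to its position in the reference order.
    position = {token: index for index, token in enumerate(reference)}
    # Re-express the hypothesis as a sequence of reference positions.
    ranks = [position[token] for token in hypothesis]

    n = len(ranks)
    if n < 2:
        return 1.0  # a single token (or none) is trivially in order

    total_pairs = n * (n - 1) // 2
    # A pair is concordant when the hypothesis keeps the reference order.
    concordant = sum(1 for a, b in combinations(ranks, 2) if a < b)
    discordant = total_pairs - concordant
    return (concordant - discordant) / total_pairs


# Toy example with hypothetical tokens: one adjacent pair is swapped.
reference = ["rāmaḥ", "vanam", "gacchati"]
hypothesis = ["vanam", "rāmaḥ", "gacchati"]
score = kendall_tau(reference, hypothesis)
```

With three tokens and one swapped pair, two of the three pairs are concordant and one is discordant, so the score is (2 - 1) / 3. Handling repeated tokens or hypotheses with missing or extra words would need an alignment step that this sketch leaves out.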
