Fugu-MT Paper Translation (Abstract): System Report for CCL25-Eval Task 5: New Dataset and LoRA-Fine-Tuned Qwen2.5

Paper overview: System Report for CCL25-Eval Task 5: New Dataset and LoRA-Fine-Tuned Qwen2.5

arxiv url: http://arxiv.org/abs/2606.12392v1
Date: Wed, 10 Jun 2026 17:54:32 GMT
Status: translation complete
Last updated in system: 2026-06-11 16:42:38.609906
Title: System Report for CCL25-Eval Task 5: New Dataset and LoRA-Fine-Tuned Qwen2.5
Authors: Haotao Xie,
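Abstract summary: The task is decomposed into three subtasks (term interpretation, semantic interpretation, and emotional inference). Based on multiple open-source datasets, data cleansing and alignment are performed to construct a classical Chinese poetry instruction-pair dataset. A domain-specialized LLM (PoetryQwen) is then proposed by fine-tuning the Qwen2.5-14B model with Low-Rank Adaptation (LoRA).
Reference score (independently computed attention score): 0.0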
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Recently, large language models (LLMs) have achieved promising progress in the fields of classical Chinese translation and the generation of classical poetry. However, domain-specific research on precise translation and affective-semantic understanding of classical poetry remains limited. The main challenge is that most studies treat the poetic appreciation task as a general-domain problem, neglecting the distinctive features of poetic appreciation, while high-quality, domain-specific datasets are extremely limited. To address this limitation, we decompose the task into three subtasks: term interpretation, semantic interpretation, and emotional inference. Based on multiple open-source datasets, we perform data cleansing and alignment to construct the Classical Chinese Poetry Instruction Pair Dataset (CCPoetry-49K), which comprises 49,404 high-quality instruction-response pairs explicitly optimized for this domain. We then propose a domain-specialized LLM, called PoetryQwen, by applying Low-Rank Adaptation (LoRA) to fine-tune the Qwen2.5-14B model. Experimental results on the CCL25-Eval Task 5 benchmark demonstrate that PoetryQwen achieves a score of 0.757, representing a 9.7% relative improvement over the Qwen2.5-14B-Instruct baseline (0.690). These findings clearly indicate that PoetryQwen significantly enhances performance in precise translation and emotional understanding of classical poetry. We present a new dataset and methodological considerations intended to support the domain-specific optimization of LLMs.
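The 9.7% figure in the abstract appears to be a relative gain over the baseline rather than a difference in score points. The short check below recomputes both quantities from the two scores quoted in the abstract. The function and variable names are illustrative assumptions and do not come from the paper.

```python
def relative_improvement(new_score: float, baseline: float) -> float:
    """Return the gain of new_score over baseline as a fraction of baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new_score - baseline) / baseline


# Scores as quoted in the abstract (CCL25-Eval Task 5 benchmark).
poetryqwen_score = 0.757   # PoetryQwen
baseline_score = 0.690     # Qwen2.5-14B-Instruct

# Absolute gain: plain difference between the two scores.
absolute_gain = poetryqwen_score - baseline_score
# Relative gain: difference divided by the baseline score.
relative_gain = relative_improvement(poetryqwen_score, baseline_score)

print(f"absolute gain: {absolute_gain:.3f}")   # prints "absolute gain: 0.067"
print(f"relative gain: {relative_gain:.1%}")   # prints "relative gain: 9.7%"
```

Under this reading, the absolute difference is 0.067 score points, and dividing it by the baseline of 0.690 gives the reported 9.7%.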


Related papers