Fugu-MT Paper Translation (Abstract): Chain-of-Thought Reasoning Improves Context-Aware Translation with Large Language Models

Paper overview: Chain-of-Thought Reasoning Improves Context-Aware Translation with Large Language Models

arxiv url: http://arxiv.org/abs/2510.18077v1
Date: Mon, 20 Oct 2025 20:14:46 GMT
Status: translation complete
Last updated in system: 2025-10-25 03:08:12.553198
Title: Chain-of-Thought Reasoning Improves Context-Aware Translation with Large Language Models
Title (reference translation): Chain-of-thought reasoning improves context-aware translation using large language models
Authors: Shabnam Ataee, Andrei Popescu-Belis
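Abstract summary: This paper assesses the ability of large language models to translate sentences that contain inter-sentential dependencies. We evaluate 12 LLMs from the DeepSeek-R1, GPT, Llama, Mistral and Phi families on two tasks. The best models take advantage of reasoning and reach about 90% accuracy on the first task and COMET scores of about 92% on the second task.
Reference score (attention score computed independently by this site): 2.4063592468412276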
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper assesses the capacity of large language models (LLMs) to translate texts that include inter-sentential dependencies. We use the English-French DiscEvalMT benchmark (Bawden et al., 2018) with pairs of sentences containing translation challenges either for pronominal anaphora or for lexical cohesion. We evaluate 12 LLMs from the DeepSeek-R1, GPT, Llama, Mistral and Phi families on two tasks: (1) distinguishing a correct translation from a wrong but plausible one; (2) generating a correct translation. We compare prompts that encourage chain-of-thought reasoning with those that do not. The best models take advantage of reasoning and reach about 90% accuracy on the first task, and COMET scores of about 92% on the second task, with GPT-4, GPT-4o and Phi standing out. Moreover, we observe a "wise get wiser" effect: the improvements through reasoning are positively correlated with the scores of the models without reasoning.
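Abstract (reference translation): This paper assesses the ability of large language models (LLMs) to translate texts that contain inter-sentential dependencies. We use the English-French DiscEvalMT benchmark (Bawden et al., 2018), whose sentence pairs pose translation challenges for either pronominal anaphora or lexical cohesion. We evaluate 12 LLMs from the DeepSeek-R1, GPT, Llama, Mistral and Phi families on two tasks: (1) distinguishing a correct translation from a wrong one; (2) generating a correct translation. We compare prompts that encourage chain-of-thought reasoning with prompts that do not. The best models take advantage of reasoning and reach about 90% accuracy on the first task and COMET scores of about 92% on the second task, with GPT-4, GPT-4o and Phi standing out. The improvements from reasoning are positively correlated with the scores of the models without reasoning.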
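The two quantities the abstract reports for the first task, accuracy on a contrastive choice and the correlation between a model's no-reasoning score and its gain from reasoning, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names, the "A"/"B" answer format, and all numbers are hypothetical placeholders and are not results from the paper.

```python
from math import sqrt


def contrastive_accuracy(predicted, gold):
    """Fraction of items where the model picked the correct candidate.

    Assumes (hypothetically) that each item offers two candidate
    translations labeled "A" and "B" and the model answers with one label.
    """
    if len(predicted) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, g in zip(predicted, gold) if p == g)
    return hits / len(gold)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder accuracies for four imaginary models,
# NOT numbers taken from the paper.
baseline = [0.50, 0.60, 0.70, 0.80]   # without chain-of-thought prompting
with_cot = [0.51, 0.63, 0.76, 0.89]   # with chain-of-thought prompting
gains = [c - b for b, c in zip(baseline, with_cot)]

# A positive coefficient means stronger baselines gain more from reasoning,
# which is the pattern the abstract calls "wise get wiser".
r = pearson(baseline, gains)
print(f"correlation between baseline and gain: {r:.3f}")
```

With real data, `baseline` and `with_cot` would hold one score per evaluated model. Pearson correlation is used here only as one plausible choice; the abstract does not say which correlation measure the authors used.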

Related papers

From Harm to Help: Turning Reasoning In-Context Demos into Assets for Reasoning LMs [58.02809208460186]
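We revisit this paradox using high-quality traces from DeepSeek-R1 as demonstrations. Even when the demonstrations are optimal, we find that adding more examples consistently lowers accuracy. We introduce Insight-to-Solve (I2S), a sequential test-time procedure that turns demonstrations into explicit, reusable insights.
Reference translation of paper (metadata) (2025-09-27T08:59:31Z)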
Please Translate Again: Two Simple Experiments on Whether Human-Like Reasoning Helps Translation [18.00698389204074]
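We find no clear evidence that performance gains stem from explicitly decomposing the translation process through chain-of-thought reasoning. Decomposition does influence translation behavior, but faithfulness to the decomposition has both positive and negative effects on translation.
Reference translation of paper (metadata) (2025-06-05T00:04:39Z)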
The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
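We propose a question alignment framework to close the gap between the English and non-English performance of large language models. Experimental results show that it improves multilingual performance across diverse reasoning scenarios, model families, and model sizes. We analyze the representation space, the generated responses, and the data scale to reveal how question translation training strengthens language alignment within LLMs.
Reference translation of paper (metadata) (2024-05-02T14:49:50Z)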
Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases [79.07111754406841]
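This work proposes using contrastive evaluation to assess the ability of direct S2TT systems to disambiguate utterances in which prosody plays a crucial role. The results clearly demonstrate the value of direct translation systems over cascade translation models.
Reference translation of paper (metadata) (2024-02-01T14:46:35Z)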
Question Translation Training for Better Multilingual Reasoning [108.10066378240879]
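Large language models show compelling performance on reasoning tasks but tend to perform worse in languages other than English. A typical solution is to translate instruction data into all languages of interest and train on the resulting multilingual data. In this paper, we train the model to translate reasoning questions into English by fine-tuning it on X-English parallel question data.
Reference translation of paper (metadata) (2024-01-15T16:39:10Z)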
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models [27.777372498182864]
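We propose a novel fine-tuning approach for generative large language models (LLMs). The approach consists of two stages: initial fine-tuning on monolingual data, followed by fine-tuning on a small set of high-quality parallel data. With LLaMA-2 as the base model, we show that the model achieves an average improvement of 12 BLEU and 12 COMET over its zero-shot performance.
Reference translation of paper (metadata) (2023-09-20T22:53:15Z)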

The related papers list is generated automatically from the titles and abstracts of the papers on this site.
