Fugu-MT Paper Translation (Abstract): Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models

Paper summary: Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models

arxiv url: http://arxiv.org/abs/2603.12248v1
Date: Thu, 12 Mar 2026 17:57:50 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:26.287667
Title: Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models
Title (reference translation): Matching Features Rather Than Tokens: Energy-Based Fine-Tuning of Language Models
Authors: Samy Jelassi, Mujin Kwun, Rosie Zhao, Yuanzhi Li, Nicolo Fusi, Yilun Du, Sham M. Kakade, Carles Domingo-Enrich
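Abstract summary: Cross-entropy (CE) training provides dense, scalable supervision for language models. The paper proposes a feature-matching approach to language-model fine-tuning and, to optimize that objective efficiently, energy-based fine-tuning (EBFT).
Reference score (independently computed attention score): 102.20309135516186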
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Cross-entropy (CE) training provides dense and scalable supervision for language models, but it optimizes next-token prediction under teacher forcing rather than sequence-level behavior under model rollouts. We introduce a feature-matching objective for language-model fine-tuning that targets sequence-level statistics of the completion distribution, providing dense semantic feedback without requiring a task-specific verifier or preference model. To optimize this objective efficiently, we propose energy-based fine-tuning (EBFT), which uses strided block-parallel sampling to generate multiple rollouts from nested prefixes concurrently, batches feature extraction over these rollouts, and uses the resulting embeddings to perform an on-policy policy-gradient update. We present a theoretical perspective connecting EBFT to KL-regularized feature-matching and energy-based modeling. Empirically, across Q&A coding, unstructured coding, and translation, EBFT matches RLVR and outperforms SFT on downstream accuracy while achieving a lower validation cross-entropy than both methods.
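Abstract (reference translation): Cross-entropy (CE) training gives language models dense, scalable supervision, but it optimizes next-token prediction under teacher forcing rather than sequence-level behavior during model rollouts. This paper introduces a feature-matching objective for language-model fine-tuning. The objective targets sequence-level statistics of the completion distribution and provides dense semantic feedback without requiring a task-specific verifier or preference model. To optimize it efficiently, the paper proposes energy-based fine-tuning (EBFT), which uses strided block-parallel sampling to generate multiple rollouts from nested prefixes concurrently, batches feature extraction over those rollouts, and uses the resulting embeddings for an on-policy policy-gradient update. The paper also offers a theoretical perspective connecting EBFT to KL-regularized feature matching and energy-based modeling. Empirically, across Q&A coding, unstructured coding, and translation, EBFT matches RLVR and outperforms SFT on downstream accuracy while achieving lower validation cross-entropy than both methods.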
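Illustrative sketch (not from the paper): the abstract describes scoring rollouts by their feature embeddings and feeding those scores into an on-policy policy-gradient update, but it does not give the reward formula. The Python below shows one simple way such an update could be wired up, under loudly stated assumptions: the reward is taken to be the negative squared distance between a rollout embedding and a reference embedding, the baseline is a leave-one-out mean, and every function name here is hypothetical. The paper's actual objective, feature extractor, and sampling scheme may differ.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def feature_matching_rewards(rollout_feats: List[Vector],
                             target_feat: Vector) -> List[float]:
    """ASSUMED reward: closer to the reference embedding means higher reward."""
    return [-squared_distance(f, target_feat) for f in rollout_feats]


def leave_one_out_advantages(rewards: List[float]) -> List[float]:
    """Subtract from each reward the mean of the other rollouts' rewards."""
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two rollouts for a leave-one-out baseline")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


def policy_gradient_loss(log_probs: List[float],
                         advantages: List[float]) -> float:
    """REINFORCE-style surrogate: minus the mean of advantage * log-probability.

    In a real trainer, log_probs would be differentiable sequence
    log-likelihoods under the current policy; plain floats are used here
    only to show the arithmetic.
    """
    if len(log_probs) != len(advantages):
        raise ValueError("one log-probability per rollout is required")
    weighted = sum(a * lp for a, lp in zip(advantages, log_probs))
    return -weighted / len(log_probs)
```

With a reference embedding of `[1.0, 0.0]` and three rollout embeddings `[1.0, 0.0]`, `[0.0, 0.0]`, `[1.0, 1.0]`, the advantages come out as `[1.0, -0.5, -0.5]`: the rollout whose features match the reference is reinforced and the other two are pushed down by equal amounts.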

Related papers