Fugu-MT Paper Translation (Abstract): Dynamic Tokenization via Reinforcement Patching: End-to-end Training and Zero-shot Transfer

Paper summary: Dynamic Tokenization via Reinforcement Patching: End-to-end Training and Zero-shot Transfer

arxiv url: http://arxiv.org/abs/2603.26097v1
Date: Fri, 27 Mar 2026 06:08:30 GMT
Status: Translation complete
System update date: 2026-03-30 21:49:48.369519
Title: Dynamic Tokenization via Reinforcement Patching: End-to-end Training and Zero-shot Transfer
Title (reference translation): Dynamic Tokenization via Reinforcement Patching: End-to-End Training and Zero-Shot Transfer
Authors: Yulun Wu, Sravan Kumar Ankireddy, Samuel Sharpe, Nikita Seleznev, Dehao Yuan, Hyeji Kim, Nam H. Nguyen
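Abstract summary: ReinPatch is a framework that jointly optimizes a sequence patching policy and its downstream sequence backbone model. By formulating patch boundary placement as a decision process optimized via Group Relative Policy Gradient (GRPG), ReinPatch bypasses the need for continuous relaxations. We evaluate ReinPatch on time-series forecasting datasets, where it demonstrates compelling performance compared to state-of-the-art data-driven patching strategies.
Reference score (independently computed attention score): 19.88776013507579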
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Efficiently aggregating spatial or temporal horizons to acquire compact representations has become a unifying principle in modern deep learning models, yet learning data-adaptive representations for long-horizon sequence data, especially continuous sequences like time series, remains an open challenge. While fixed-size patching has improved scalability and performance, discovering variable-sized, data-driven patches end-to-end often forces models to rely on soft discretization, specific backbones, or heuristic rules. In this work, we propose Reinforcement Patching (ReinPatch), the first framework to jointly optimize a sequence patching policy and its downstream sequence backbone model using reinforcement learning. By formulating patch boundary placement as a discrete decision process optimized via Group Relative Policy Gradient (GRPG), ReinPatch bypasses the need for continuous relaxations and performs dynamic patching policy optimization in a natural manner. Moreover, our method allows strict enforcement of a desired compression rate, freeing the downstream backbone to scale efficiently, and naturally supports multi-level hierarchical modeling. We evaluate ReinPatch on time-series forecasting datasets, where it demonstrates compelling performance compared to state-of-the-art data-driven patching strategies. Furthermore, our detached design allows the patching module to be extracted as a standalone foundation patcher, providing the community with visual and empirical insights into the segmentation behaviors preferred by a purely performance-driven neural patching strategy.
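Abstract (reference translation): Efficiently aggregating spatial or temporal horizons to obtain compact representations has become a unifying principle in modern deep learning models, but learning data-adaptive representations for long-horizon sequence data, especially continuous sequences such as time series, remains an open challenge. Fixed-size patching has improved scalability and performance, but discovering variable-sized, data-driven patches often forces models to rely on soft discretization, specific backbones, or heuristic rules. In this work, we propose Reinforcement Patching (ReinPatch), the first framework to jointly optimize a sequence patching policy and its downstream sequence backbone model using reinforcement learning. By formulating patch boundary placement as a discrete decision process optimized via Group Relative Policy Gradient (GRPG), ReinPatch bypasses the need for continuous relaxations and performs dynamic patching policy optimization in a natural manner. Moreover, the proposed method strictly enforces a desired compression rate, lets the downstream backbone scale efficiently, and naturally supports multi-level hierarchical modeling. We evaluate ReinPatch on time-series forecasting datasets, where it demonstrates compelling performance compared to state-of-the-art data-driven patching strategies. Furthermore, the detached design allows the patching module to be extracted as a standalone foundation patcher, providing the community with visual and empirical insights into the segmentation behaviors preferred by a purely performance-driven neural patching strategy.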
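
The abstract describes patch boundary placement as a discrete decision process trained with a group-relative policy gradient under a strictly enforced compression rate. The following is a minimal sketch of that general idea, not the authors' implementation: the paper's actual GRPG objective, policy network, and reward are not given here. Everything in it is an assumption made for illustration: the function names (`sample_boundaries`, `group_relative_advantages`, `toy_reward`, `train`), the use of sequential softmax sampling without replacement to fix the number of patches, the free per-position logits standing in for a learned policy, and the toy reward (negative within-patch squared error) standing in for a backbone's forecasting loss. No backbone is trained in this sketch.

```python
import math
import random


def sample_boundaries(logits, k, rng):
    """Draw k distinct boundary positions by sequential softmax sampling
    without replacement, so every sample has exactly k + 1 patches.
    Returns (sorted positions, log-prob of the ordered draw, d log-prob / d logits)."""
    remaining = list(range(len(logits)))
    chosen, logp = [], 0.0
    grad = [0.0] * len(logits)
    for _ in range(k):
        m = max(logits[j] for j in remaining)  # subtract max for numerical stability
        weights = [math.exp(logits[j] - m) for j in remaining]
        total = sum(weights)
        probs = [w / total for w in weights]
        pick = rng.choices(remaining, weights=probs, k=1)[0]
        # Softmax log-likelihood gradient for this draw: one-hot(pick) - probs.
        for j, p in zip(remaining, probs):
            grad[j] -= p
        grad[pick] += 1.0
        logp += math.log(probs[remaining.index(pick)])
        remaining.remove(pick)
        chosen.append(pick)
    return sorted(chosen), logp, grad


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group: (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (math.sqrt(var) + eps) for r in rewards]


def toy_reward(series, boundaries):
    """Hypothetical reward: negative squared error of replacing each patch by
    its mean. A boundary at position b cuts between series[b] and series[b + 1]."""
    edges = [0] + [b + 1 for b in boundaries] + [len(series)]
    err = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        patch = series[lo:hi]
        mu = sum(patch) / len(patch)
        err += sum((x - mu) ** 2 for x in patch)
    return -err


def train(series, num_patches, group_size=16, steps=200, lr=0.5, seed=0):
    """Policy-gradient ascent on boundary logits using group-relative advantages."""
    rng = random.Random(seed)
    logits = [0.0] * (len(series) - 1)  # one logit per candidate cut point
    for _ in range(steps):
        group = [sample_boundaries(logits, num_patches - 1, rng)
                 for _ in range(group_size)]
        rewards = [toy_reward(series, b) for b, _, _ in group]
        advantages = group_relative_advantages(rewards)
        for adv, (_, _, grad) in zip(advantages, group):
            for j, g in enumerate(grad):
                logits[j] += lr * adv * g / group_size
    return logits


if __name__ == "__main__":
    step_series = [0.0] * 4 + [5.0] * 4  # a single level shift after index 3
    learned = train(step_series, num_patches=2)
    print([round(v, 2) for v in learned])
```

Because each sample contains exactly `num_patches - 1` boundaries, the compression rate is fixed by construction rather than encouraged through a penalty, and the standardized within-group rewards serve as a baseline without a separate value model. In a real setting the reward would presumably come from the downstream model's forecasting performance, and the logits would be produced by a network conditioned on the input sequence.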

Related papers