Fugu-MT Paper Translation (Abstract): LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures

Paper abstract: LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures

arxiv url: http://arxiv.org/abs/2509.14252v2
Date: Tue, 07 Oct 2025 17:55:14 GMT
Status: Translation complete
Last updated in system: 2025-10-08 15:38:21.575233
Title: LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures
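Title (reference translation): LLM-JEPA: Large Language Models Incorporating Predictive Architectures
Authors: Hai Huang, Yann LeCun, Randall Balestriero
Abstract summary: Pretraining, finetuning, and evaluation of large language models (LLMs) rely on input-space reconstruction and generative capabilities. However, in vision, embedding-space training objectives such as Joint Embedding Predictive Architectures (JEPAs) have been observed to be far superior to their input-space counterparts.
Reference score (independently computed attention score): 50.494504099850325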
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large Language Model (LLM) pretraining, finetuning, and evaluation rely on input-space reconstruction and generative capabilities. Yet, it has been observed in vision that embedding-space training objectives, e.g., with Joint Embedding Predictive Architectures (JEPAs), are far superior to their input-space counterparts. That mismatch in how training is achieved between language and vision opens up a natural question: can language training methods learn a few tricks from the vision ones? The lack of JEPA-style LLMs is a testimony to the challenge of designing such objectives for language. In this work, we propose a first step in that direction where we develop LLM-JEPA, a JEPA-based solution for LLMs applicable both to finetuning and pretraining. Thus far, LLM-JEPA is able to outperform the standard LLM training objectives by a significant margin across models, all while being robust to overfitting. Those findings are observed across numerous datasets (NL-RX, GSM8K, Spider, RottenTomatoes) and various models from the Llama3, OpenELM, Gemma2 and Olmo families. Code: https://github.com/rbalestr-lab/llm-jepa.
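Abstract (reference translation): Pretraining, finetuning, and evaluation of large language models (LLMs) rely on input-space reconstruction and generative capabilities. However, it has been observed in vision that embedding-space training objectives, e.g., with Joint Embedding Predictive Architectures (JEPAs), are far superior to their input-space counterparts. This mismatch in how training is achieved between language and vision raises a natural question: can language training methods learn a few tricks from the vision ones? The lack of JEPA-style LLMs is evidence of how challenging it is to design such objectives for language. In this work, as a first step in that direction, we develop LLM-JEPA, a JEPA-based solution for LLMs that applies to both finetuning and pretraining. Thus far, LLM-JEPA outperforms the standard LLM training objectives by a significant margin across models. These results are observed across numerous datasets (NL-RX, GSM8K, Spider, Rotten Tomatoes) and various models from the Llama3, OpenELM, Gemma2, and Olmo families. Code: https://github.com/rbalestr-lab/llm-jepa.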
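
The contrast the abstract draws between input-space and embedding-space objectives can be illustrated with a toy sketch. This is not the paper's implementation (the linked repository is the place to look for that). The function names, the choice of cosine distance as the embedding-space term, and the `weight` parameter that mixes the two terms are all assumptions made for illustration only.

```python
import math


def cross_entropy(probs, target_index):
    """Input-space loss: negative log-probability assigned to the correct token."""
    return -math.log(probs[target_index])


def cosine_distance(u, v):
    """Embedding-space loss: 1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def combined_loss(probs, target_index, predicted_emb, target_emb, weight=1.0):
    """Hypothetical mix: token-level loss plus a weighted embedding-prediction term."""
    token_term = cross_entropy(probs, target_index)
    embedding_term = cosine_distance(predicted_emb, target_emb)
    return token_term + weight * embedding_term


# Toy values (made up): a 3-token vocabulary and 2-dimensional embeddings.
next_token_probs = [0.7, 0.2, 0.1]
predicted_embedding = [1.0, 0.5]
target_embedding = [0.9, 0.6]
loss = combined_loss(next_token_probs, 0, predicted_embedding, target_embedding, weight=0.5)
```

The first term scores the model on reproducing tokens, while the second scores it on how close a predicted representation is to a target representation, which is the distinction the abstract refers to. Setting `weight` to 0 recovers the token-only objective.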

Related papers