Fugu-MT paper translation (abstract): Improving large language models with concept-aware fine-tuning

Paper overview: Improving large language models with concept-aware fine-tuning

arxiv url: http://arxiv.org/abs/2506.07833v1
Date: Mon, 09 Jun 2025 14:55:00 GMT
Status: translation complete
Last updated in system: 2025-06-10 16:33:11.009076
Title: Improving large language models with concept-aware fine-tuning
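Title (reference translation): Improving large language models through concept-aware fine-tuning
Authors: Michael K. Chen, Xikun Zhang, Jiaxing Huang, Dacheng Tao
Abstract summary: Concept-Aware Fine-Tuning (CAFT) is a novel multi-token training method for large language models (LLMs). CAFT enables the learning of sequences that span multiple tokens, fostering stronger concept-aware learning. Experiments show significant improvements over conventional next-token fine-tuning methods.
Reference score (independently calculated attention level): 55.59287380665864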
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have become the cornerstone of modern AI. However, the existing paradigm of next-token prediction fundamentally limits their ability to form coherent, high-level concepts, making it a critical barrier to human-like understanding and reasoning. Take the phrase "ribonucleic acid" as an example: an LLM will first decompose it into tokens, i.e., artificial text fragments ("rib", "on", ...), then learn each token sequentially, rather than grasping the phrase as a unified, coherent semantic entity. This fragmented representation hinders deeper conceptual understanding and, ultimately, the development of truly intelligent systems. In response, we introduce Concept-Aware Fine-Tuning (CAFT), a novel multi-token training method that redefines how LLMs are fine-tuned. By enabling the learning of sequences that span multiple tokens, this method fosters stronger concept-aware learning. Our experiments demonstrate significant improvements compared to conventional next-token finetuning methods across diverse tasks, including traditional applications like text summarization and domain-specific ones like de novo protein design. Multi-token prediction was previously only possible in the prohibitively expensive pretraining phase; CAFT, to our knowledge, is the first to bring the multi-token setting to the post-training phase, thus effectively democratizing its benefits for the broader community of practitioners and researchers. Finally, the unexpected effectiveness of our proposed method suggests wider implications for the machine learning research community. All code and data are available at https://github.com/michaelchen-lab/caft-llm
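Abstract (reference translation): Large language models (LLMs) have become the foundation of modern AI. However, the existing next-token prediction paradigm fundamentally limits their ability to form coherent, high-level concepts, which makes it a critical barrier to human-like understanding and reasoning. An LLM first decomposes a phrase into tokens, i.e., artificial text fragments ("rib", "on", ...), and then learns each token sequentially rather than grasping the phrase as a unified, coherent semantic entity. This fragmented representation hinders deeper conceptual understanding and, ultimately, the development of truly intelligent systems. In response, we propose Concept-Aware Fine-Tuning (CAFT), a novel multi-token training method that redefines how LLMs are fine-tuned. By enabling the learning of sequences that span multiple tokens, the method fosters stronger concept-aware learning. Our experiments show significant improvements over conventional next-token fine-tuning methods across diverse tasks, including text summarization and domain-specific tasks such as de novo protein design. To our knowledge, CAFT is the first to bring the multi-token setting to the post-training phase, effectively democratizing its benefits for the broader community of practitioners and researchers. Finally, the unexpected effectiveness of the proposed method suggests wider implications for the machine learning research community. All code and data are available at https://github.com/michaelchen-lab/caft-llm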
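
Illustrative sketch (not the authors' implementation): the abstract contrasts conventional next-token prediction with training on sequences that span multiple tokens. The short Python example below only shows how the training targets differ between the two settings. It does not reproduce CAFT itself, whose details are not given in the abstract. The function names, the window size k, and the placeholder tokens are all assumptions made for illustration.

```python
from typing import List, Tuple


def next_token_targets(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Pair every non-empty proper prefix with the single token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def multi_token_targets(
    tokens: List[str], k: int
) -> List[Tuple[List[str], List[str]]]:
    """Pair every non-empty proper prefix with up to the next k tokens.

    Near the end of the sequence fewer than k tokens remain, so the
    target window is shorter there.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return [(tokens[:i], tokens[i:i + k]) for i in range(1, len(tokens))]


if __name__ == "__main__":
    # Placeholder tokens (hypothetical); a real tokenizer produces its own pieces.
    tokens = ["t0", "t1", "t2", "t3", "t4"]

    print("next-token targets:")
    for prefix, target in next_token_targets(tokens):
        print(" ", prefix, "->", target)

    print("multi-token targets (k=3):")
    for prefix, window in multi_token_targets(tokens, k=3):
        print(" ", prefix, "->", window)
```

With k = 1 the multi-token targets reduce to the next-token targets wrapped in one-element lists, which is one way to see the multi-token setting as a generalization of the conventional one.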
