Fugu-MT Paper Translation (Abstract): MinT: Managed Infrastructure for Training and Serving Millions of LLMs

Paper summary: MinT: Managed Infrastructure for Training and Serving Millions of LLMs

arxiv url: http://arxiv.org/abs/2605.13779v1
Date: Wed, 13 May 2026 16:59:08 GMT
Status: Translation complete
Last updated in system: 2026-05-14 23:30:28.194847
Title: MinT: Managed Infrastructure for Training and Serving Millions of LLMs
Title (reference translation): MinT: Managed Infrastructure for Training and Running Millions of LLMs
Authors: Mind Lab: Song Cao, Vic Cao, Andrew Chen, Kaijie Chen, Cleon Cheng, Steven Chiang, Kaixuan Fan, Hera Feng, Huan Feng, Arthur Fu, Jun Gao, Hongquan Gu, Aaron Guan, Nolan Ho, Mutian Hong, Hailee Hou, Peixuan Hua, Charles Huang, Miles Jiang, Nora Jiang, Yuyi Jiang, Qiuyu Jin, Fancy Kong, Andrew Lei, Kyrie Lei, Alexy Li, Lucian Li, Ray Li, Theo Li, Zhihui Li, Jiayi Lin, Kairus Liu, Kieran Liu, Logan Liu, Xiang Liu, Irvine Lu, Maeve Luo, Runze Lv, Pony Ma, Verity Niu, Anson Qiu, Vincent Wang, Rio Yang, Maxwell Yao, Carrie Ye, Regis Ye, Wenlin Ye, Josh Ying, Danney Zeng, Yuhan Zhan, Anya Zhang, Di Zhang, Ruijia Zhang, Sueky Zhang, Ya Zhang, Wei Zhao, Ada Zhou, Changhai Zhou, Yuhua Zhou, Xinyue Zhu, Murphy Zhuang
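Abstract summary: MindLab Toolkit (MinT) is a managed infrastructure system for Low-Rank Adaptation (LoRA) post-training and online serving. MinT targets a setting where many trained policies are produced over a small number of expensive base-model deployments.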
Reference score (independently computed attention score): 18.78941243766295
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present MindLab Toolkit (MinT), a managed infrastructure system for Low-Rank Adaptation (LoRA) post-training and online serving. MinT targets a setting where many trained policies are produced over a small number of expensive base-model deployments. Instead of materializing each policy as a merged full checkpoint, MinT keeps the base model resident and moves exported LoRA adapter revisions through rollout, update, export, evaluation, serving, and rollback, hiding distributed training, serving, scheduling, and data movement behind a service interface. MinT scales this path along three axes. Scale Up extends LoRA RL to frontier-scale dense and MoE architectures, including MLA and DSA attention paths, with training and serving validated beyond 1T total parameters. Scale Down moves only the exported LoRA adapter, which can be under 1% of base-model size in rank-1 settings; adapter-only handoff reduces the measured step by 18.3x on a 4B dense model and 2.85x on a 30B MoE, while concurrent multi-policy GRPO shortens wall time by 1.77x and 1.45x without raising peak memory. Scale Out separates durable policy addressability from CPU/GPU working sets: a tensor-parallel deployment supports 10^6-scale addressable catalogs (measured single-engine sweeps through 100K) and thousand-adapter active waves at cluster scale, with cold loading treated as scheduled service work and packed MoE LoRA tensors improving live engine loading by 8.5-8.7x. MinT thus manages million-scale LoRA policy catalogs while training and serving selected adapter revisions over shared 1T-class base models.
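The Scale Down claim, that a rank-1 adapter can be under 1% of base-model size, follows from how LoRA parameter counts grow. A minimal sketch of the arithmetic for a single adapted weight matrix, assuming the standard LoRA factorization (a `d_out x rank` matrix times a `rank x d_in` matrix); the function names and the 4096 x 4096 layer shape are illustrative assumptions, not values taken from the paper:

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by LoRA to one d_out x d_in weight matrix.

    Standard LoRA adds two factors: B (d_out x rank) and A (rank x d_in).
    """
    return rank * (d_in + d_out)


def lora_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Adapter parameters as a fraction of the frozen matrix's parameters."""
    return lora_param_count(d_in, d_out, rank) / (d_in * d_out)


# Hypothetical layer shape, chosen only for illustration.
fraction = lora_fraction(d_in=4096, d_out=4096, rank=1)
print(f"rank-1 adapter is {fraction:.4%} of the adapted matrix")
```

For this hypothetical square layer the rank-1 adapter is about 0.05% of the matrix it adapts. The whole-model ratio depends on which layers carry adapters and on the storage dtype, so this per-matrix figure only shows why the ratio is small.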
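Abstract (reference translation): We introduce MindLab Toolkit (MinT), a managed infrastructure for Low-Rank Adaptation (LoRA) post-training and online serving. MinT targets settings in which many trained policies are produced on top of a small number of expensive base-model deployments. Rather than realizing each policy as a merged full checkpoint, MinT keeps the base model resident and moves exported LoRA adapter revisions through rollout, update, export, evaluation, serving, and rollback, with distributed training, serving, scheduling, and data movement hidden behind a service interface. MinT extends this path along three axes. Scale Up brings LoRA RL to frontier-scale dense and MoE architectures, including MLA and DSA attention paths, and validates training and serving beyond 1T total parameters. Scale Down moves only the exported LoRA adapter, which can be under 1% of the base-model size in rank-1 settings; adapter-only handoff cuts the measured step by 18.3x on a 4B dense model and by 2.85x on a 30B MoE, and concurrent multi-policy GRPO shortens wall time by 1.77x and 1.45x without raising peak memory. Scale Out separates durable policy addressability from CPU/GPU working sets: a tensor-parallel deployment supports addressable catalogs at the 10^6 scale (single-engine sweeps measured up to 100K) and active waves of thousands of adapters at cluster scale, cold loading is treated as scheduled service work, and packed MoE LoRA tensors improve live engine loading by 8.5-8.7x. MinT thereby manages million-scale LoRA policy catalogs while training and serving selected adapter revisions on shared 1T-class base models.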
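The Scale Out idea, a large addressable catalog kept separate from a small resident working set, can be sketched as a bounded cache in front of a catalog lookup. This is a hypothetical illustration only: the class name, the `loader` callable, and the least-recently-used eviction policy are assumptions made for the sketch, and the abstract does not say which residency or scheduling policy MinT uses.

```python
from collections import OrderedDict


class AdapterWorkingSet:
    """Bounded resident set over a much larger addressable catalog (hypothetical)."""

    def __init__(self, catalog, capacity, loader):
        self.catalog = catalog         # adapter_id -> storage location; can be very large
        self.capacity = capacity       # maximum number of adapters resident at once
        self.loader = loader           # callable: storage location -> loaded adapter
        self.resident = OrderedDict()  # adapter_id -> loaded adapter, oldest use first
        self.cold_loads = 0            # number of loads that missed the resident set

    def get(self, adapter_id):
        if adapter_id in self.resident:
            # Warm hit: mark as most recently used.
            self.resident.move_to_end(adapter_id)
            return self.resident[adapter_id]
        # Cold path: the adapter is addressable but not resident.
        location = self.catalog[adapter_id]  # raises KeyError if not in the catalog
        adapter = self.loader(location)
        self.cold_loads += 1
        self.resident[adapter_id] = adapter
        if len(self.resident) > self.capacity:
            # Assumed policy: evict the least recently used adapter.
            self.resident.popitem(last=False)
        return adapter


# Usage: 1000 addressable adapters, only 2 resident at a time.
catalog = {f"a{i}": f"store/a{i}" for i in range(1000)}
working_set = AdapterWorkingSet(catalog, capacity=2, loader=lambda loc: loc.upper())
for adapter_id in ["a1", "a2", "a1", "a3"]:
    working_set.get(adapter_id)
print(list(working_set.resident), working_set.cold_loads)
```

The catalog here is only a mapping from identifier to location, so its size is independent of how many adapters are resident at once. A real system would load asynchronously and schedule cold loads instead of loading inline as this sketch does.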

Related papers