Fugu-MT Paper Translation (Summary): LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence

Paper summary: LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence

arxiv url: http://arxiv.org/abs/2509.03505v2
Date: Fri, 07 Nov 2025 16:49:47 GMT
Status: Translation complete
Last updated in system: 2025-11-10 16:56:01.059353
Title: LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence
Authors: Xingxuan Zhang, Gang Ren, Han Yu, Hao Yuan, Hui Wang, Jiansheng Li, Jiayun Wu, Lang Mo, Li Mao, Mingchao Hao, Ningbo Dai, Renzhe Xu, Shuyang Li, Tianyang Zhang, Yue He, Yuanrui Wang, Yunjia Zhang, Zijing Xu, Dongzhe Li, Fang Gao, Hao Zou, Jiandong Liu, Jiashuo Liu, Jiawei Xu, Kaijie Cheng, Kehan Li, Linjun Zhou, Qing Li, Shaohua Fan, Xiaoyu Lin, Xinyan Han, Xuanyue Li, Yan Lu, Yuan Xue, Yuanyuan Jiang, Zimu Wang, Zhenlei Wang, Peng Cui
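Abstract summary: LimiX-16M and LimiX-2M treat structured data as a joint distribution over variables and missingness. The LimiX models are evaluated on 11 large structured-data benchmarks spanning a broad range of sample sizes, feature dimensionalities, class numbers, categorical-to-numerical feature ratios, missingness levels, and sample-to-feature ratios.
Reference score (site-computed attention score): 61.46575527504109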
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We argue that progress toward general intelligence requires complementary foundation models grounded in language, the physical world, and structured data. This report presents LimiX-16M and LimiX-2M, two instantiations of our large structured-data models (LDMs). Both models treat structured data as a joint distribution over variables and missingness, and are thus capable of addressing a wide range of tabular tasks through query-based conditional prediction via a single model. They are pretrained using masked joint-distribution modeling with an episodic, context-conditional objective, supporting rapid, training-free adaptation at inference. We evaluate LimiX models across 11 large structured-data benchmarks with broad regimes of sample size, feature dimensionality, class number, categorical-to-numerical feature ratio, missingness, and sample-to-feature ratio. LimiX-16M consistently surpasses strong baselines, as shown in Figure 1 and Figure 2. This advantage holds across a wide range of tasks, such as classification, regression, missing value imputation, and data generation, often by substantial margins, while avoiding task-specific architectures or bespoke training per task. Notably, LimiX-2M delivers strong results under tight compute and memory budgets. We also present the first scaling law study for LDMs, revealing how data and model scaling jointly influence downstream performance and offering quantitative guidance for tabular foundation modeling. All LimiX models are publicly accessible under Apache 2.0.
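The abstract describes a single model that handles classification, regression, and imputation as conditional queries, pretrained by masking cells in episodes of context and query rows. The toy Python sketch below illustrates only that framing and is not the LimiX implementation. The names `MASK`, `make_episode`, `as_query`, and `predict` are hypothetical, the uniform random masking scheme is an assumption, and a 1-nearest-neighbour rule stands in for the pretrained network.

```python
import random
from typing import Dict, List, Optional, Tuple, Union

MASK = "<MASK>"  # hypothetical sentinel meaning "predict this cell"
Cell = Union[float, None, str]  # None = missing in the raw data, MASK = queried


def make_episode(table: List[List[Optional[float]]], n_context: int,
                 mask_rate: float, rng: random.Random):
    """Toy episode: context rows stay visible, observed query cells are masked at random."""
    order = list(range(len(table)))
    rng.shuffle(order)
    context = [list(table[i]) for i in order[:n_context]]
    queries: List[List[Cell]] = []
    targets: Dict[Tuple[int, int], float] = {}
    for q, i in enumerate(order[n_context:]):
        row: List[Cell] = list(table[i])
        for j, value in enumerate(row):
            # Only observed cells can be targets; raw missing cells stay None,
            # so the missingness pattern itself remains part of the input.
            if value is not None and rng.random() < mask_rate:
                targets[(q, j)] = value
                row[j] = MASK
        queries.append(row)
    return context, queries, targets


def as_query(row: List[Optional[float]], target_cols: set) -> List[Cell]:
    """Express a downstream task as a masking pattern over one row."""
    return [MASK if j in target_cols else v for j, v in enumerate(row)]


def predict(context: List[List[Optional[float]]], query: List[Cell]) -> List[Cell]:
    """Stand-in conditional predictor: copy masked cells from the closest context row.

    A pretrained model would sit here; the point is that one call serves
    every task, with no per-task training.
    """
    observed = [j for j, v in enumerate(query) if v is not None and v != MASK]

    def distance(ctx_row):
        # Mean squared difference over features observed in both rows.
        shared = [j for j in observed if ctx_row[j] is not None]
        if not shared:
            return float("inf")
        return sum((ctx_row[j] - query[j]) ** 2 for j in shared) / len(shared)

    best = min(context, key=distance)
    return [best[j] if v == MASK else v for j, v in enumerate(query)]


if __name__ == "__main__":
    table = [
        [1.0, 2.0, 0.0],
        [1.1, 2.1, 0.0],
        [5.0, 6.0, 1.0],
        [5.2, None, 1.0],
    ]
    context = table[:3]
    # Classification: mask the label column (index 2) of a new row.
    print(predict(context, as_query([4.9, 6.1, None], {2})))  # [4.9, 6.1, 1.0]
    # Imputation: mask the cell that is missing in the last row.
    print(predict(context, as_query(table[3], {1})))  # [5.2, 6.0, 1.0]
```

Because the task is encoded entirely in which cells are masked, switching from classification to imputation changes only the query, not the model. In the sketch a neighbour lookup plays the role that, per the abstract, a pretrained LDM plays in LimiX.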

