Fugu-MT Paper Translation (Abstract): LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence

Paper overview: LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence

arxiv url: http://arxiv.org/abs/2509.03505v1
Date: Wed, 03 Sep 2025 17:39:08 GMT
Status: Translation complete
Last updated in system: 2025-09-04 21:40:46.616801
Title: LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence
Title (reference translation): LimiX: Unlocking Structured-Data Modeling Capability for General-Purpose Intelligence
Authors: Xingxuan Zhang, Gang Ren, Han Yu, Hao Yuan, Hui Wang, Jiansheng Li, Jiayun Wu, Lang Mo, Li Mao, Mingchao Hao, Ningbo Dai, Renzhe Xu, Shuyang Li, Tianyang Zhang, Yue He, Yuanrui Wang, Yunjia Zhang, Zijing Xu, Dongzhe Li, Fang Gao, Hao Zou, Jiandong Liu, Jiashuo Liu, Jiawei Xu, Kaijie Cheng, Kehan Li, Linjun Zhou, Qing Li, Shaohua Fan, Xiaoyu Lin, Xinyan Han, Xuanyue Li, Yan Lu, Yuan Xue, Yuanyuan Jiang, Zimu Wang, Zhenlei Wang, Peng Cui
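Abstract summary: LimiX treats structured data as a joint distribution over variables and missingness. LimiX is evaluated on 10 large structured-data benchmarks covering a broad range of sample sizes, feature dimensionalities, class counts, categorical-to-numerical feature ratios, missingness levels, and sample-to-feature ratios.
Reference score (independently computed attention score): 61.46575527504109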
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We argue that progress toward general intelligence requires complementary foundation models grounded in language, the physical world, and structured data. This report presents LimiX, the first installment of our large structured-data models (LDMs). LimiX treats structured data as a joint distribution over variables and missingness, thus capable of addressing a wide range of tabular tasks through query-based conditional prediction via a single model. LimiX is pretrained using masked joint-distribution modeling with an episodic, context-conditional objective, where the model predicts for query subsets conditioned on dataset-specific contexts, supporting rapid, training-free adaptation at inference. We evaluate LimiX across 10 large structured-data benchmarks with broad regimes of sample size, feature dimensionality, class number, categorical-to-numerical feature ratio, missingness, and sample-to-feature ratios. With a single model and a unified interface, LimiX consistently surpasses strong baselines including gradient-boosting trees, deep tabular networks, recent tabular foundation models, and automated ensembles, as shown in Figure 1 and Figure 2. The superiority holds across a wide range of tasks, such as classification, regression, missing value imputation, and data generation, often by substantial margins, while avoiding task-specific architectures or bespoke training per task. All LimiX models are publicly accessible under Apache 2.0.
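Abstract (reference translation): We argue that progress toward general-purpose intelligence requires complementary foundation models grounded in language, the physical world, and structured data. This report describes LimiX, the first installment of our large structured-data models (LDMs). LimiX treats structured data as a joint distribution over variables and missingness, and can therefore address a wide range of tabular tasks through query-based conditional prediction with a single model. LimiX is pretrained with masked joint-distribution modeling under an episodic, context-conditional objective: the model predicts query subsets conditioned on dataset-specific contexts, which supports fast, training-free adaptation at inference. We evaluate LimiX on 10 large structured-data benchmarks covering a broad range of sample sizes, feature dimensionalities, class counts, categorical-to-numerical feature ratios, missingness levels, and sample-to-feature ratios. With a single model and a unified interface, LimiX consistently surpasses strong baselines, including gradient-boosting trees, deep tabular networks, recent tabular foundation models, and automated ensembles, as shown in Figure 1 and Figure 2. This superiority holds across a wide range of tasks, such as classification, regression, missing value imputation, and data generation, while avoiding task-specific architectures and per-task training. All LimiX models are publicly available under Apache 2.0.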
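The abstract describes pretraining with an episodic, context-conditional masked objective but gives no implementation details. As a rough illustration only, the sketch below shows one way a single episode could be built from a table: some rows become the context, and the remaining rows become queries with a random subset of observed cells hidden as prediction targets. This is not the authors' implementation; the function name `make_episode`, the `MASK` sentinel, the use of `None` for missing cells, and the masking rate are all assumptions made for this example.

```python
import random

MASK = "<mask>"  # hypothetical sentinel for a cell the model is asked to predict


def make_episode(table, n_context, mask_rate, seed=0):
    """Split one table into context rows and masked query rows.

    Illustrative helper only; it is not part of any LimiX API.
    table: list of rows (lists of values); None marks an originally missing cell.
    Returns (context, query_inputs, targets), where targets maps
    (query_row_index, column_index) -> the hidden value.
    """
    rng = random.Random(seed)
    rows = [list(r) for r in table]  # copy so the caller's table is untouched
    rng.shuffle(rows)
    context, query = rows[:n_context], rows[n_context:]

    query_inputs, targets = [], {}
    for i, row in enumerate(query):
        masked = list(row)
        for j, value in enumerate(row):
            # Only observed cells can be hidden; originally missing cells stay None.
            if value is not None and rng.random() < mask_rate:
                targets[(i, j)] = value
                masked[j] = MASK
        query_inputs.append(masked)
    return context, query_inputs, targets


table = [
    [5.1, "a", 0],
    [4.9, None, 1],
    [6.2, "b", 0],
    [5.8, "a", 1],
    [6.0, "b", None],
]
context, query_inputs, targets = make_episode(table, n_context=3, mask_rate=0.5)
print(len(context), len(query_inputs), len(targets))
```

Under this reading, hiding only the label column of the query rows corresponds to classification or regression, while hiding feature cells corresponds to missing value imputation; a model trained on such episodes would be conditioned on `context` and asked to predict the values in `targets`.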

Related papers