Fugu-MT Paper Translation (Summary): CDT: A Comprehensive Capability Framework for Large Language Models Across Cognition, Domain, and Task

Paper summary: CDT: A Comprehensive Capability Framework for Large Language Models Across Cognition, Domain, and Task

arxiv url: http://arxiv.org/abs/2509.24422v1
Date: Mon, 29 Sep 2025 08:10:29 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.845572
Title: CDT: A Comprehensive Capability Framework for Large Language Models Across Cognition, Domain, and Task
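Title (reference translation): CDT: A comprehensive capability framework for large language models across cognition, domains, and tasks
Authors: Haosi Mo, Xinyu Ma, Xuebo Liu, Derek F. Wong, Yu Li, Jie Liu, Min Zhang
Abstract summary: Recent advances in large language models (LLMs) have substantially enhanced their capabilities. Existing benchmarks, however, often focus on isolated abilities and lack a holistic framework for evaluating LLM capabilities. This paper proposes the Cognition-Domain-Task (CDT) framework, which comprehensively measures a model's capabilities across three dimensions.
Reference score (in-house attention score): 49.27354010985993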
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in Large Language Models (LLMs) have significantly enhanced their capabilities, highlighting the need for comprehensive evaluation frameworks that extend beyond task-specific benchmarks. However, existing benchmarks often focus on isolated abilities, lacking a holistic framework for assessing LLM capabilities. To address this gap, we propose the Cognition-Domain-Task (CDT) framework, which comprehensively measures a model's capabilities across three dimensions. We expand the scope of model capability definitions at the cognitive level by incorporating the Cattell-Horn-Carroll cognitive theory, refining the categorization of model capabilities. We apply CDT in two directions: dataset capability evaluation and data selection. Experiments show that our capability metrics correlate well with downstream performance and can support effective dataset analysis and construction. The experiments on data selection also show significant improvements in both general and specific benchmarks, achieving scores of 44.3 and 45.4, with an increase of 1.6 and 2.2 points over the baselines, respectively. These results validate the effectiveness and practicality of CDT. Source code and models are available at https://github.com/Alessa-mo/CDT.
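Abstract (reference translation): Recent advances in large language models (LLMs) have substantially enhanced their capabilities, underscoring the need for comprehensive evaluation frameworks that go beyond task-specific benchmarks. However, existing benchmarks focus on isolated abilities and lack a holistic framework for evaluating LLM capabilities. To address this gap, we propose the Cognition-Domain-Task (CDT) framework, which comprehensively measures a model's capabilities across three dimensions. By incorporating the Cattell-Horn-Carroll cognitive theory and refining the categorization of model capabilities, we broaden the definition of model capabilities at the cognitive level. We apply CDT in two directions: dataset capability evaluation and data selection. Experiments show that our capability metrics correlate well with downstream performance and can support effective dataset analysis and construction. In the data-selection experiments, both general and specific benchmarks improve significantly, reaching scores of 44.3 and 45.4, increases of 1.6 and 2.2 points over the baselines, respectively. These results validate the effectiveness and practicality of CDT. Source code and models are available at https://github.com/Alessa-mo/CDT.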
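The abstract reports final scores and gains but not the baseline scores themselves; these follow by subtraction (score minus gain). The short Python check below is only a reading aid derived from the abstract's figures, not a value quoted from the paper. The dictionary keys `general` and `specific` are hypothetical labels for the two benchmark groups the abstract mentions.

```python
# Final scores and gains as reported in the abstract's data-selection results.
# The keys "general" and "specific" are hypothetical labels, not names from the paper.
reported = {
    "general": (44.3, 1.6),
    "specific": (45.4, 2.2),
}

# Implied baseline = reported score - reported gain.
# round(..., 1) removes floating-point noise such as 42.699999999999996.
baselines = {name: round(score - gain, 1) for name, (score, gain) in reported.items()}

for name, value in baselines.items():
    print(f"{name}: {value}")
```

Under that reading, the baselines would be 42.7 (general) and 43.2 (specific).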
