Fugu-MT Paper Translation (Abstract): Cognitive-YOLO: LLM-Driven Architecture Synthesis from First Principles of Data for Object Detection

Paper overview: Cognitive-YOLO: LLM-Driven Architecture Synthesis from First Principles of Data for Object Detection

arxiv url: http://arxiv.org/abs/2512.12281v1
Date: Sat, 13 Dec 2025 10:52:54 GMT
Status: Translation complete
Last updated in system: 2025-12-16 17:54:56.197737
Title: Cognitive-YOLO: LLM-Driven Architecture Synthesis from First Principles of Data for Object Detection
Title (reference translation): Cognitive-YOLO: LLM-Driven Architecture Synthesis from First Principles of Data for Object Detection
Authors: Jiahao Zhao,
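Abstract summary: This paper proposes Cognitive-YOLO, a new framework for architecture synthesis driven by Large Language Models (LLMs). First, an analysis module extracts key meta-features from the target dataset. Second, the LLM reasons over these features, augmented with state-of-the-art components retrieved via Retrieval-Augmented Generation (RAG), and synthesizes the architecture into a structured Neural Architecture Description Language (NADL). Third, a compiler instantiates this description into a deployable model.
Reference score (attention score computed independently by this site): 3.5554162308775408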
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Designing high-performance object detection architectures is a complex task, where traditional manual design is time-consuming and labor-intensive, and Neural Architecture Search (NAS) is computationally prohibitive. While recent approaches using Large Language Models (LLMs) show promise, they often function as iterative optimizers within a search loop, rather than generating architectures directly from a holistic understanding of the data. To address this gap, we propose Cognitive-YOLO, a novel framework for LLM-driven architecture synthesis that generates network configurations directly from the intrinsic characteristics of the dataset. Our method consists of three stages: first, an analysis module extracts key meta-features (e.g., object scale distribution and scene density) from the target dataset; second, the LLM reasons upon these features, augmented with state-of-the-art components retrieved via Retrieval-Augmented Generation (RAG), to synthesize the architecture into a structured Neural Architecture Description Language (NADL); finally, a compiler instantiates this description into a deployable model. Extensive experiments on five diverse object detection datasets demonstrate that our proposed Cognitive-YOLO consistently generates superior architectures, achieving highly competitive performance and demonstrating a superior performance-per-parameter trade-off compared to strong baseline models across multiple benchmarks. Crucially, our ablation studies prove that the LLM's data-driven reasoning is the primary driver of performance, demonstrating that a deep understanding of data "first principles" is more critical for achieving a superior architecture than simply retrieving SOTA components.
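Abstract (reference translation): Designing high-performance object detection architectures is a complex task: traditional manual design is time-consuming and labor-intensive, and Neural Architecture Search (NAS) is computationally prohibitive. Recent approaches using Large Language Models (LLMs) are promising, but they often act as iterative optimizers inside a search loop rather than generating architectures directly from a holistic understanding of the data. To address this gap, the paper proposes Cognitive-YOLO, a new framework for LLM-driven architecture synthesis that generates network configurations directly from the intrinsic characteristics of the dataset. First, an analysis module extracts key meta-features (for example, object scale distribution and scene density) from the target dataset. Next, the LLM reasons over these features, augmented with state-of-the-art components retrieved via Retrieval-Augmented Generation (RAG), and synthesizes the architecture into a structured Neural Architecture Description Language (NADL). Finally, a compiler instantiates this description into a deployable model. Extensive experiments on five diverse object detection datasets show that Cognitive-YOLO consistently generates superior architectures, achieving highly competitive performance and a better performance-per-parameter trade-off than strong baseline models across multiple benchmarks. The ablation studies show that the LLM's data-driven reasoning is the primary driver of performance, and that a deep understanding of the data's "first principles" matters more for reaching a superior architecture than simply retrieving SOTA components.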
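
The first stage described in the abstract, extracting meta-features such as object scale distribution and scene density, can be illustrated with a minimal sketch. This is not the paper's analysis module: the function names, the input layout (image id mapped to a list of box widths and heights in pixels), and the choice of COCO-style area thresholds (32² and 96² pixels) are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean

# Assumption: COCO-style area thresholds in pixels^2
# (small < 32^2 <= medium < 96^2 <= large).
SMALL_MAX_AREA = 32 ** 2
MEDIUM_MAX_AREA = 96 ** 2


def scale_bucket(width, height):
    """Classify one bounding box by its pixel area."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def extract_meta_features(annotations):
    """Hypothetical helper: annotations maps image id -> list of (width, height)."""
    counts = Counter()
    for boxes in annotations.values():
        for w, h in boxes:
            counts[scale_bucket(w, h)] += 1
    total = sum(counts.values())
    # Fraction of all boxes falling in each scale bucket.
    scale_distribution = {
        name: (counts[name] / total if total else 0.0)
        for name in ("small", "medium", "large")
    }
    # Scene density approximated by the number of objects per image.
    per_image = [len(boxes) for boxes in annotations.values()]
    return {
        "scale_distribution": scale_distribution,
        "mean_objects_per_image": mean(per_image) if per_image else 0.0,
        "max_objects_per_image": max(per_image, default=0),
    }


# Toy dataset with made-up numbers, for illustration only.
toy = {
    "img_001": [(10, 12), (20, 20), (150, 200)],
    "img_002": [(40, 50)],
}
features = extract_meta_features(toy)
```

A dictionary like `features` is the kind of compact dataset summary that could be serialized into an LLM prompt for the second stage. How the paper actually encodes its meta-features is not stated in the abstract.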

Related papers