Fugu-MT Paper Translation (Summary): On the Evaluation of Engineering Artificial General Intelligence

Paper summary: On the Evaluation of Engineering Artificial General Intelligence

arxiv url: http://arxiv.org/abs/2505.10653v1
Date: Thu, 15 May 2025 18:52:47 GMT
Status: translation complete
Last updated in system: 2025-05-19 14:36:13.460578
Title: On the Evaluation of Engineering Artificial General Intelligence
Title (reference translation): On the Evaluation of Engineering Artificial General Intelligence
Authors: Sandeep Neema, Susmit Jha, Adam Nagel, Ethan Lew, Chandrasekar Sureshkumar, Aleksa Gordic, Chase Shimmin, Hieu Nguygen, Paul Eremenko
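Abstract summary: This paper proposes a framework for evaluating engineering artificial general intelligence (eAGI) agents. We consider eAGI a specialization of artificial general intelligence (AGI). eAGI agents should possess a unique blend of background knowledge (recall and retrieve) of facts and methods.
Reference score (attention metric computed independently by this site): 5.802869598386355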
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We discuss the challenges and propose a framework for evaluating engineering artificial general intelligence (eAGI) agents. We consider eAGI as a specialization of artificial general intelligence (AGI), deemed capable of addressing a broad range of problems in the engineering of physical systems and associated controllers. We exclude software engineering for a tractable scoping of eAGI and expect dedicated software engineering AI agents to address the software implementation challenges. Similar to human engineers, eAGI agents should possess a unique blend of background knowledge (recall and retrieve) of facts and methods, demonstrate familiarity with tools and processes, exhibit deep understanding of industrial components and well-known design families, and be able to engage in creative problem solving (analyze and synthesize), transferring ideas acquired in one context to another. Given this broad mandate, evaluating and qualifying the performance of eAGI agents is a challenge in itself and, arguably, a critical enabler to developing eAGI agents. In this paper, we address this challenge by proposing an extensible evaluation framework that specializes and grounds Bloom's taxonomy - a framework for evaluating human learning that has also been recently used for evaluating LLMs - in an engineering design context. Our proposed framework advances the state of the art in benchmarking and evaluation of AI agents in terms of the following: (a) developing a rich taxonomy of evaluation questions spanning from methodological knowledge to real-world design problems; (b) motivating a pluggable evaluation framework that can evaluate not only textual responses but also evaluate structured design artifacts such as CAD models and SysML models; and (c) outlining an automatable procedure to customize the evaluation benchmark to different engineering contexts.
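Abstract (reference translation): This paper proposes a framework for evaluating engineering artificial general intelligence (eAGI) agents. We regard eAGI as a specialization of artificial general intelligence (AGI) that can address a broad range of problems in the engineering of physical systems and associated controllers. We exclude software engineering to keep the scope of eAGI tractable, and expect dedicated software engineering AI agents to address software implementation challenges. Like human engineers, eAGI agents should possess a unique blend of background knowledge (recall and retrieve) of facts and methods, be familiar with tools and processes, show a deep understanding of industrial components and well-known design families, and be able to engage in creative problem solving (analyze and synthesize), transferring ideas acquired in one context to another. Given this broad mandate, evaluating and qualifying the performance of eAGI agents is a challenge in itself and, arguably, a critical enabler for developing eAGI agents. In this paper, we address this challenge by proposing an extensible evaluation framework that specializes Bloom's taxonomy. The proposed framework advances the state of the art in benchmarking and evaluating AI agents in the following respects: (a) developing a rich taxonomy of evaluation questions spanning from methodological knowledge to real-world design problems; (b) motivating a pluggable evaluation framework that can evaluate not only textual responses but also structured design artifacts such as CAD models and SysML models; and (c) outlining an automatable procedure for customizing the evaluation benchmark to different engineering contexts.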
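
The pluggable design described in point (b) of the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the level names are those of the standard revised Bloom's taxonomy rather than the paper's engineering-specific specialization, and `Question`, `register`, `evaluate`, and the artifact-type labels ("text", "cad", "sysml") are hypothetical names chosen for illustration. Only a toy text evaluator is registered; CAD or SysML evaluators would be plugged in the same way.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Dict


class BloomLevel(IntEnum):
    """Levels of the revised Bloom's taxonomy, lowest to highest."""
    REMEMBER = 1
    UNDERSTAND = 2
    APPLY = 3
    ANALYZE = 4
    EVALUATE = 5
    CREATE = 6


@dataclass(frozen=True)
class Question:
    """A benchmark item (hypothetical schema, assumed for illustration)."""
    prompt: str
    level: BloomLevel
    artifact_type: str  # assumed labels, e.g. "text", "cad", "sysml"
    reference: str      # reference answer or serialized reference artifact


# An evaluator maps (response, reference) to a score in [0, 1].
Evaluator = Callable[[str, str], float]
EVALUATORS: Dict[str, Evaluator] = {}


def register(artifact_type: str) -> Callable[[Evaluator], Evaluator]:
    """Decorator that plugs an evaluator in for one artifact type."""
    def decorator(fn: Evaluator) -> Evaluator:
        EVALUATORS[artifact_type] = fn
        return fn
    return decorator


@register("text")
def exact_match(response: str, reference: str) -> float:
    """Toy text evaluator: case- and whitespace-insensitive exact match."""
    return 1.0 if response.strip().lower() == reference.strip().lower() else 0.0


def evaluate(question: Question, response: str) -> float:
    """Dispatch the response to the evaluator registered for its artifact type."""
    evaluator = EVALUATORS.get(question.artifact_type)
    if evaluator is None:
        raise ValueError(f"no evaluator registered for {question.artifact_type!r}")
    return evaluator(response, question.reference)


if __name__ == "__main__":
    q = Question("What is the SI unit of force?", BloomLevel.REMEMBER, "text", "newton")
    print(q.level.name, evaluate(q, " Newton "))
```

Keeping the registry keyed by artifact type means a new evaluator (for example, a geometric comparison for CAD output) can be added without touching the dispatch logic, which is one plausible reading of "pluggable" in the abstract.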

Related papers

The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
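We argue that AI-assisted peer review should become an urgent research and infrastructure priority. We propose concrete roles for AI in strengthening fact verification, guiding reviewer performance, helping authors improve quality, and supporting ACs in decision-making.
Paper reference translation (metadata) (2025-06-09T18:37:14Z)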
Rethinking Machine Unlearning in Image Generation Models [59.697750585491264]
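CatIGMU is a new hierarchical task categorization framework. EvalIGMU is a comprehensive evaluation framework. We construct DataIGM, a high-quality unlearning dataset.
Paper reference translation (metadata) (2025-06-03T11:25:14Z)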
Computational Safety for Generative AI: A Signal Processing Perspective [65.268245109828]
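Computational safety is a mathematical framework that enables the quantitative assessment, formulation, and study of safety in GenAI. We show how sensitivity analysis and loss landscape analysis can be used to detect malicious jailbreak prompts. We discuss key research challenges, opportunities, and the essential role of signal processing in AI safety.
Paper reference translation (metadata) (2025-02-18T02:26:50Z)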
Work in Progress: AI-Powered Engineering-Bridging Theory and Practice [0.0]
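This paper examines how generative AI can help automate and improve key steps in systems engineering. It investigates AI's ability to analyze system requirements against INCOSE's "good requirement" criteria. The study aims to assess AI's potential to streamline engineering processes and improve learning outcomes.
Paper reference translation (metadata) (2025-02-06T17:42:00Z)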
Data Analysis in the Era of Generative AI [56.44807642944589]
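This paper explores the potential of AI-powered data analysis tools. We examine how the emergence of large language and multimodal models offers new opportunities to enhance various stages of the data analysis workflow. We then examine human-centered design principles for facilitating intuitive interaction, building user trust, and streamlining AI-assisted analysis workflows across multiple apps.
Paper reference translation (metadata) (2024-09-27T06:31:03Z)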
How critically can an AI think? A framework for evaluating the quality of thinking of generative artificial intelligence [0.9671462473115854]
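Generative AI, such as systems built on large language models, is creating opportunities for innovative assessment design practices. This paper presents a framework for exploring the capabilities of the LLM ChatGPT4 application, the current industry benchmark. The critique gives specific, targeted indications of where their questions are vulnerable in terms of critical thinking skills.
Paper reference translation (metadata) (2024-06-20T22:46:56Z)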
What Does Evaluation of Explainable Artificial Intelligence Actually Tell Us? A Case for Compositional and Contextual Validation of XAI Building Blocks [16.795332276080888]
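This paper proposes a fine-grained validation framework for explainable artificial intelligence systems. It recognizes their inherently modular structure: technical building blocks, user-facing explanatory artifacts, and social communication protocols.
Paper reference translation (metadata) (2024-03-19T13:45:34Z)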
Levels of AGI for Operationalizing Progress on the Path to AGI [64.59151650272477]
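This paper proposes a framework for classifying the capabilities and behavior of artificial general intelligence (AGI) models and their precursors. The framework introduces levels of AGI performance, generality, and autonomy, providing a common language for comparing models, assessing risks, and measuring progress along the path to AGI.
Paper reference translation (metadata) (2023-11-04T17:44:58Z)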
Requirements Engineering Framework for Human-centered Artificial Intelligence Software Systems [9.642259026572175]
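We propose a new framework that supports requirements gathering for human-centered AI-based software, based on human-centered AI guidelines and a user survey. The framework is applied to a case study to elicit and model the requirements for improving the quality of 360-degree video for virtual reality (VR) users.
Paper reference translation (metadata) (2023-03-06T06:37:50Z)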
An interdisciplinary conceptual study of Artificial Intelligence (AI) for helping benefit-risk assessment practices: Towards a comprehensive qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
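This paper comprehensively analyzes existing concepts from different disciplines that address the notion of intelligence. The aim is to identify shared concepts and differences for evaluating AI systems.
Paper reference translation (metadata) (2021-05-07T12:01:31Z)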
Synergizing Domain Expertise with Self-Awareness in Software Systems: A Patternized Architecture Guideline [11.155059219430207]
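This paper highlights the importance of synergizing domain expertise with self-awareness to enhance self-adaptivity in software systems. We present a holistic framework of concepts, enriched patterns, and methodology, called DBASES, that provides engineers with principled guidelines.
Paper reference translation (metadata) (2020-01-20T12:17:22Z)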

The related papers list is generated automatically from the titles and abstracts of the papers on this site.

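The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.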