Fugu-MT Paper Translation (Abstract): Imaginations of WALL-E : Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems

Paper overview: Imaginations of WALL-E : Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems

arxiv url: http://arxiv.org/abs/2308.10354v1
Date: Sun, 20 Aug 2023 20:10:55 GMT
Status: Translation complete
Last updated in system: 2023-08-22 16:00:24.192752
Title: Imaginations of WALL-E : Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems
Title (reference translation): WALL-E's Imagination: Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems
Authors: Zeinab Sadat Taghavi, Soroush Gooran, Seyed Arshan Dalili, Hamidreza Amirzadeh, Mohammad Jalal Nematbakhsh, Hossein Sameti
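Abstract summary: The system is equipped with an imagination-inspired module that bridges the gap between textual inputs and other modalities. This leads to unique interpretations of a concept that may differ from human interpretations yet be equally valid. This work represents a significant advance in the development of imagination-inspired AI systems.
Reference score (attention score computed by this site): 2.452498006404167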
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In this paper, we introduce a novel Artificial Intelligence (AI) system inspired by the philosophical and psychoanalytical concept of imagination as a "Re-construction of Experiences". Our AI system is equipped with an imagination-inspired module that bridges the gap between textual inputs and other modalities, enriching the derived information based on previously learned experiences. A unique feature of our system is its ability to formulate independent perceptions of inputs. This leads to unique interpretations of a concept that may differ from human interpretations but are equally valid, a phenomenon we term "Interpretable Misunderstanding". We employ large-scale models, specifically a Multimodal Large Language Model (MLLM), enabling our proposed system to extract meaningful information across modalities while primarily remaining unimodal. We evaluated our system against other large language models across multiple tasks, including emotion recognition and question answering, using a zero-shot methodology to avoid the bias that fine-tuning could introduce. Notably, our system outperformed the best Large Language Models (LLMs) on the MELD, IEMOCAP, and CoQA datasets, achieving Weighted F1 (WF1) scores of 46.74% and 25.23% and an Overall F1 (OF1) score of 17%, respectively, compared to 22.89%, 12.28%, and 7% from the well-performing LLM. The goal is to go beyond the statistical view of language processing and tie it to human concepts such as philosophy and psychoanalysis. This work represents a significant advancement in the development of imagination-inspired AI systems, opening new possibilities for AI to generate deep and interpretable information across modalities, thereby enhancing human-AI interaction.
Abstract (reference translation): In this paper, we introduce a novel Artificial Intelligence (AI) system inspired by the philosophical and psychoanalytical concept of imagination as a "Re-construction of Experiences". Our AI system is equipped with an imagination-inspired module that bridges the gap between textual inputs and other modalities, enriching the derived information based on previously learned experiences. A unique feature of our system is its ability to formulate independent perceptions of inputs. This leads to unique interpretations of a concept that may differ from human interpretations but are equally valid, a phenomenon we term "Interpretable Misunderstanding". Using large-scale models, specifically a Multimodal Large Language Model (MLLM), the system can extract meaningful information across modalities while remaining primarily unimodal. We evaluated the system against other large language models across multiple tasks, including emotion recognition and question answering, using a zero-shot methodology. Notably, the system outperformed the best large language models (LLMs) on the MELD, IEMOCAP, and CoQA datasets, with Weighted F1 (WF1) scores of 46.74% and 25.23% and an Overall F1 (OF1) score of 17%. The goal is to go beyond the statistical view of language processing and tie it to human concepts such as philosophy and psychoanalysis. This work is a significant advance in the development of imagination-inspired AI systems, opening new possibilities for AI to generate deep, interpretable information across modalities and thereby improving human-AI interaction.
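The abstract reports Weighted F1 (WF1) on MELD and IEMOCAP but this page does not show how the score is computed. As a minimal sketch, assuming the standard support-weighted definition (per-class F1 averaged with weights proportional to each class's share of the true labels), the metric could be computed as below. The function name `weighted_f1` and the emotion labels in the usage note are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 (assumed definition of WF1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its support
    return score
```

For example, `weighted_f1(["joy", "joy", "anger", "neutral"], ["joy", "neutral", "anger", "neutral"])` weights the per-class scores 2/3, 1, and 2/3 by supports 2, 1, and 1. Classes that appear only in the predictions contribute no term of their own under this definition; they affect the score only through the false positives they cause for other classes.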

Related papers

Knowledge Conceptualization Impacts RAG Efficacy [0.0786430477112975]
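This paper investigates the design of transferable and interpretable neurosymbolic AI systems. Specifically, it focuses on a class of systems referred to as "Agentic Retrieval-Augmented Generation" systems.
Paper reference translation (metadata) (2025-07-12T20:10:26Z)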
Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models [10.1080193179562]
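Current understanding models excel at recognizing the "what" but fall short on higher-level cognitive tasks such as causal reasoning and future prediction. This paper proposes a novel framework that fuses a powerful vision foundation model for deep visual perception with a Large Language Model (LLM) serving as a knowledge-driven reasoning core.
Paper reference translation (metadata) (2025-07-08T09:43:17Z)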
Large Concept Models: Language Modeling in a Sentence Representation Space [62.73366944266477]
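This paper presents an attempt at an architecture based on an explicit higher-level semantic representation, which the authors name a concept. Concepts are language- and modality-agnostic and represent a higher-level idea or action in a flow. The model shows notable zero-shot generalization performance across many languages.
Paper reference translation (metadata) (2024-12-11T23:36:20Z)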
ARPA: A Novel Hybrid Model for Advancing Visual Word Disambiguation Using Large Language Models and Transformers [1.6541870997607049]
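We propose ARPA, an architecture that fuses the unparalleled contextual understanding of large language models with the advanced feature extraction capabilities of transformers. The introduction of ARPA is a significant milestone in visual word disambiguation, offering a compelling solution. We invite researchers and practitioners to explore the capabilities of our model, envisioning a future in which such hybrid models drive unprecedented advances in artificial intelligence.
Paper reference translation (metadata) (2024-08-12T10:15:13Z)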
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
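This paper presents an interactive application aimed at understanding the internal mechanisms of large vision-language models. The interface is designed to improve the interpretability of the image patches that contribute to generating a response. We present a case study of how the application can help in understanding a failure mechanism in LLaVA, a popular large multimodal model.
Paper reference translation (metadata) (2024-04-03T23:57:34Z)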
Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
Agent AI: an embodied system that integrates large foundation models into agent actions. This paper proposes an Agent Foundation Model.
Paper reference translation (metadata) (2024-02-28T16:09:56Z)
MMToM-QA: Multimodal Theory of Mind Question Answering [80.87550820953236]
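Theory of Mind (ToM) is an essential ingredient for developing machines with human-level social intelligence. Recent machine learning models, particularly large language models, seem to show some aspects of ToM understanding. Human ToM, however, is more than video or text understanding. People can flexibly reason about another person's mind based on conceptual representations extracted from whatever data is available.
Paper reference translation (metadata) (2024-01-16T18:59:24Z)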
Neurosymbolic Value-Inspired AI (Why, What, and How) [8.946847190099206]
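This paper proposes a neurosymbolic computational framework called Value-Inspired AI (VAI). VAI aims to represent and integrate various dimensions of human values. We offer insights into current progress in this direction and outline future directions for the field.
Paper reference translation (metadata) (2023-12-15T16:33:57Z)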
Building Trust in Conversational AI: A Comprehensive Review and Solution Architecture for Explainable, Privacy-Aware Systems using LLMs and Knowledge Graph [0.33554367023486936]
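We introduce a comprehensive tool that provides an in-depth review of more than 150 large language models (LLMs). We propose a functional architecture that seamlessly integrates the linguistic capabilities of LLMs with the structural dynamics of knowledge graphs. Our architecture blends linguistic sophistication with factual rigor and further strengthens data security through role-based access control.
Paper reference translation (metadata) (2023-08-13T22:47:51Z)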
DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations [119.1953397679783]
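We focus on advancing the state of the art in interpreting multimodal models. Our proposed method, DIME, enables accurate and fine-grained analysis of multimodal models.
Paper reference translation (metadata) (2022-03-03T20:52:47Z)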
WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
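We develop a novel foundation model pre-trained on huge multimodal (visual and textual) data. We show that it achieves state-of-the-art results on a variety of downstream tasks.
Paper reference translation (metadata) (2021-10-27T12:25:21Z)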
Conceptual Modeling and Artificial Intelligence: Mutual Benefits from Complementary Worlds [0.0]
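We are interested in tackling the intersection of two fields, conceptual modeling (CM) and AI, which have so far been approached mostly in isolation. This workshop embraces the assumption that manifold mutual benefits can be realized by (i) investigating what CM can contribute to AI, and (ii) the other way around.
Paper reference translation (metadata) (2021-10-16T18:42:09Z)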
Distributed and Democratized Learning: Philosophy and Research Challenges [80.39805582015133]
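We propose a new design philosophy called democratized learning (Dem-AI). Inspired by human social groups, the specialized groups of learning agents in the proposed Dem-AI system self-organize in a hierarchical structure to perform learning tasks more efficiently. This paper proposes a reference design as a guideline for realizing future Dem-AI systems, inspired by various interdisciplinary fields.
Paper reference translation (metadata) (2020-03-18T08:45:10Z)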

The related-papers list is generated automatically from the titles and abstracts of the papers on this site.

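The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).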