Imaginations of WALL-E: Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems
- URL: http://arxiv.org/abs/2308.10354v1
- Date: Sun, 20 Aug 2023 20:10:55 GMT
- Title: Imaginations of WALL-E: Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems
- Authors: Zeinab Sadat Taghavi, Soroush Gooran, Seyed Arshan Dalili, Hamidreza
Amirzadeh, Mohammad Jalal Nematbakhsh, Hossein Sameti
- Abstract summary: Our system is equipped with an imagination-inspired module that bridges the gap between textual inputs and other modalities.
This leads to unique interpretations of a concept that may differ from human interpretations but are equally valid.
This work represents a significant advancement in the development of imagination-inspired AI systems.
- Score: 2.452498006404167
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we introduce a novel Artificial Intelligence (AI) system
inspired by the philosophical and psychoanalytical concept of imagination as a
"Re-construction of Experiences". Our AI system is equipped with an
imagination-inspired module that bridges the gap between textual inputs and
other modalities, enriching the derived information based on previously learned
experiences. A unique feature of our system is its ability to formulate
independent perceptions of inputs. This leads to unique interpretations of a
concept that may differ from human interpretations but are equally valid, a
phenomenon we term "Interpretable Misunderstanding". We employ large-scale
models, specifically a Multimodal Large Language Model (MLLM), enabling our
proposed system to extract meaningful information across modalities while
primarily remaining unimodal. We evaluated our system against other large
language models across multiple tasks, including emotion recognition and
question-answering, using a zero-shot methodology to avoid the bias that
fine-tuning may introduce. Significantly, our system outperformed the best
Large Language Models (LLMs) on the MELD, IEMOCAP, and CoQA datasets,
achieving Weighted F1 (WF1) scores of 46.74% and 25.23% and an Overall F1
(OF1) score of 17%, respectively, compared to 22.89%, 12.28%, and 7% from
the best-performing LLM. The goal is to go beyond the statistical view of language
processing and tie it to human concepts such as philosophy and psychoanalysis.
This work represents a significant advancement in the development of
imagination-inspired AI systems, opening new possibilities for AI to generate
deep and interpretable information across modalities, thereby enhancing
human-AI interaction.
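As a concrete illustration of the pipeline the abstract describes, the sketch below shows one way a textual input could be "imagined" into cross-modal context by an MLLM before a downstream, largely unimodal model answers. It is a minimal sketch under assumptions: every name below (query_mllm, query_llm, the prompts, and the control flow) is a hypothetical placeholder, not the authors' implementation.

```python
# A minimal sketch, assuming the architecture the abstract outlines: an
# MLLM first "re-constructs" a textual input as imagined cross-modal
# experience, and the enriched context is then handed to a downstream
# model. All names and prompts are hypothetical, not the authors' code.

def query_mllm(prompt: str) -> str:
    """Placeholder for a call to a Multimodal Large Language Model."""
    raise NotImplementedError("connect an MLLM backend here")


def query_llm(prompt: str) -> str:
    """Placeholder for a call to a text-only LLM."""
    raise NotImplementedError("connect an LLM backend here")


def imagine(utterance: str) -> str:
    """Re-construct the input as imagined sensory experience."""
    prompt = (
        "Describe the scene this utterance evokes: the likely setting, "
        "the speaker's expression, and their tone of voice.\n\n"
        f"Utterance: {utterance}"
    )
    return query_mllm(prompt)


def classify_emotion(utterance: str, labels: list[str]) -> str:
    """Zero-shot emotion recognition with imagined context prepended."""
    context = imagine(utterance)
    prompt = (
        f"Imagined context: {context}\n"
        f"Utterance: {utterance}\n"
        f"Choose one emotion from {labels}.\nAnswer:"
    )
    return query_llm(prompt)
```

Prepending the imagined context rather than fusing modalities inside the model is what lets the downstream component remain primarily unimodal, as the abstract emphasizes.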
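The Weighted F1 figures quoted above can be computed with standard tooling; below is a minimal example using scikit-learn's f1_score. The label set is illustrative only, not the actual MELD or IEMOCAP annotations.

```python
# Minimal example of the Weighted F1 (WF1) metric reported for MELD and
# IEMOCAP above; the labels here are illustrative, not dataset output.
from sklearn.metrics import f1_score

y_true = ["joy", "anger", "neutral", "neutral", "sadness", "joy"]
y_pred = ["joy", "neutral", "neutral", "anger", "sadness", "joy"]

# "weighted" averages per-class F1 scores, weighting each class by its
# support (number of true instances), which is what WF1 reports.
wf1 = f1_score(y_true, y_pred, average="weighted")
print(f"WF1: {wf1:.2%}")
```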
Related papers
- ARPA: A Novel Hybrid Model for Advancing Visual Word Disambiguation Using Large Language Models and Transformers [1.6541870997607049]
We present ARPA, an architecture that fuses the unparalleled contextual understanding of large language models with the advanced feature extraction capabilities of transformers.
ARPA's introduction marks a significant milestone in visual word disambiguation, offering a compelling solution.
We invite researchers and practitioners to explore the capabilities of our model, envisioning a future where such hybrid models drive unprecedented advancements in artificial intelligence.
arXiv Detail & Related papers (2024-08-12T10:15:13Z)
- LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed at understanding the internal mechanisms of large vision-language models.
Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer.
We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
- Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
We emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions.
In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model.
arXiv Detail & Related papers (2024-02-28T16:09:56Z)
- MMToM-QA: Multimodal Theory of Mind Question Answering [80.87550820953236]
Theory of Mind (ToM) is an essential ingredient for developing machines with human-level social intelligence.
Recent machine learning models, particularly large language models, seem to show some aspects of ToM understanding.
Human ToM, on the other hand, is more than video or text understanding.
People can flexibly reason about another person's mind based on conceptual representations extracted from any available data.
arXiv Detail & Related papers (2024-01-16T18:59:24Z)
- Neurosymbolic Value-Inspired AI (Why, What, and How) [8.946847190099206]
We propose a neurosymbolic computational framework called Value-Inspired AI (VAI).
VAI aims to represent and integrate various dimensions of human values.
We offer insights into the current progress made in this direction and outline potential future directions for the field.
arXiv Detail & Related papers (2023-12-15T16:33:57Z)
- Building Trust in Conversational AI: A Comprehensive Review and Solution Architecture for Explainable, Privacy-Aware Systems using LLMs and Knowledge Graph [0.33554367023486936]
We introduce a comprehensive tool that provides an in-depth review of over 150 Large Language Models (LLMs).
Building on this foundation, we propose a novel functional architecture that seamlessly integrates the structured dynamics of Knowledge Graphs with the linguistic capabilities of LLMs.
Our architecture adeptly blends linguistic sophistication with factual rigour and further strengthens data security through Role-Based Access Control.
arXiv Detail & Related papers (2023-08-13T22:47:51Z)
- DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models.
Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z)
- WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained with huge multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z)
- Conceptual Modeling and Artificial Intelligence: Mutual Benefits from Complementary Worlds [0.0]
We are interested in tackling the intersection of the two thus far mostly isolated disciplines of CM and AI.
The workshop embraces the assumption that manifold mutual benefits can be realized by i) investigating what Conceptual Modeling (CM) can contribute to AI, and ii) the other way around.
arXiv Detail & Related papers (2021-10-16T18:42:09Z)
- Distributed and Democratized Learning: Philosophy and Research Challenges [80.39805582015133]
We propose a novel design philosophy called democratized learning (Dem-AI).
Inspired by the societal groups of humans, the specialized groups of learning agents in the proposed Dem-AI system are self-organized in a hierarchical structure to collectively perform learning tasks more efficiently.
We present a reference design as a guideline to realize future Dem-AI systems, inspired by various interdisciplinary fields.
arXiv Detail & Related papers (2020-03-18T08:45:10Z)