Brainish: Formalizing A Multimodal Language for Intelligence and
Consciousness
- URL: http://arxiv.org/abs/2205.00001v2
- Date: Tue, 3 May 2022 21:08:15 GMT
- Title: Brainish: Formalizing A Multimodal Language for Intelligence and
Consciousness
- Authors: Paul Pu Liang
- Abstract summary: We describe the desiderata of a multimodal language called Brainish.
Brainish consists of words, images, audio, and sensations combined in representations that the Conscious Turing Machine's processors use to communicate.
- Score: 23.86633372513335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Having a rich multimodal inner language is an important component of human
intelligence that enables several necessary core cognitive functions such as
multimodal prediction, translation, and generation. Building upon the Conscious
Turing Machine (CTM), a machine model for consciousness proposed by Blum and
Blum (2021), we describe the desiderata of a multimodal language called
Brainish, comprising words, images, audio, and sensations combined in
representations that the CTM's processors use to communicate with each other.
We define the syntax and semantics of Brainish before operationalizing this
language through the lens of multimodal artificial intelligence, a vibrant
research area studying the computational tools necessary for processing and
relating information from heterogeneous signals. Our general framework for
learning Brainish involves designing (1) unimodal encoders to segment and
represent unimodal data, (2) a coordinated representation space that relates
and composes unimodal features to derive holistic meaning across multimodal
inputs, and (3) decoders to map multimodal representations into predictions
(for fusion) or raw data (for translation or generation). Through discussing
how Brainish is crucial for communication and coordination in order to achieve
consciousness in the CTM, and by implementing a simple version of Brainish and
evaluating its capability of demonstrating intelligence on multimodal
prediction and retrieval tasks on several real-world image, text, and audio
datasets, we argue that such an inner language will be important for advances
in machine models of intelligence and consciousness.
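The three-part recipe in the abstract (unimodal encoders, a coordinated representation space, and decoders for fusion or translation/generation) maps naturally onto a small neural architecture. The sketch below is a minimal, hypothetical PyTorch illustration of that recipe, not the paper's implementation: the module names, dimensions, and the InfoNCE-style alignment objective are assumptions chosen only to make the framework concrete.

```python
# Minimal sketch (assumed PyTorch layout, not the paper's code) of the
# three-part Brainish learning framework described in the abstract:
#   (1) unimodal encoders, (2) a coordinated representation space,
#   (3) decoders for fusion (prediction) or translation/generation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnimodalEncoder(nn.Module):
    """(1) Maps raw unimodal features into a shared embedding dimension."""
    def __init__(self, in_dim: int, emb_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, emb_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-norm so cosine similarity is a dot product

class CoordinatedSpace(nn.Module):
    """(2) Relates and composes unimodal embeddings: contrastive alignment + fusion."""
    def __init__(self, emb_dim: int = 256, temperature: float = 0.07):
        super().__init__()
        self.temperature = temperature
        self.compose = nn.Linear(2 * emb_dim, emb_dim)  # compose two modalities into one vector

    def alignment_loss(self, z_a, z_b):
        # Symmetric InfoNCE: matched pairs lie on the diagonal of the similarity matrix.
        logits = z_a @ z_b.t() / self.temperature
        targets = torch.arange(z_a.size(0), device=z_a.device)
        return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

    def forward(self, z_a, z_b):
        return self.compose(torch.cat([z_a, z_b], dim=-1))  # holistic multimodal representation

class FusionDecoder(nn.Module):
    """(3a) Maps the coordinated representation to a task prediction (fusion)."""
    def __init__(self, emb_dim: int = 256, n_classes: int = 10):
        super().__init__()
        self.head = nn.Linear(emb_dim, n_classes)

    def forward(self, z):
        return self.head(z)

class GenerationDecoder(nn.Module):
    """(3b) Maps the coordinated representation back to raw-data space (translation/generation)."""
    def __init__(self, emb_dim: int = 256, out_dim: int = 784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(emb_dim, 512), nn.ReLU(), nn.Linear(512, out_dim))

    def forward(self, z):
        return self.net(z)

# Hypothetical usage: paired image/text feature vectors (e.g., precomputed CNN and text features).
img_enc, txt_enc = UnimodalEncoder(in_dim=2048), UnimodalEncoder(in_dim=300)
space, clf = CoordinatedSpace(), FusionDecoder()

images, texts = torch.randn(32, 2048), torch.randn(32, 300)
labels = torch.randint(0, 10, (32,))
z_img, z_txt = img_enc(images), txt_enc(texts)
loss = space.alignment_loss(z_img, z_txt) + F.cross_entropy(clf(space(z_img, z_txt)), labels)
loss.backward()

# Cross-modal retrieval (one of the evaluations mentioned in the abstract) reuses the same
# coordinated space: rank text candidates for each image by cosine similarity.
retrieval_scores = z_img @ z_txt.t()
```

Under this kind of layout, fusion, translation, and retrieval all operate on the same coordinated representation, which is the property the abstract presents as enabling the CTM's processors to communicate through a shared inner language.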
Related papers
- MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language Model [45.18716166499859]
Deciphering the human visual experience through brain activities captured by fMRI represents a compelling and cutting-edge challenge.
We introduce MindSemantix, a novel multi-modal framework that enables LLMs to comprehend visually-evoked semantic content in brain activity.
MindSemantix generates high-quality captions that are deeply rooted in the visual and semantic information derived from brain activity.
arXiv Detail & Related papers (2024-05-29T06:55:03Z)
- Neuro-Vision to Language: Enhancing Brain Recording-based Visual Reconstruction and Language Interaction [8.63068449082585]
Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition.
Our framework integrates 3D brain structures with visual semantics using a Vision Transformer 3D.
We have enhanced the fMRI dataset with diverse fMRI-image-related textual data to support multimodal large model development.
arXiv Detail & Related papers (2024-04-30T10:41:23Z)
- MMToM-QA: Multimodal Theory of Mind Question Answering [80.87550820953236]
Theory of Mind (ToM) is an essential ingredient for developing machines with human-level social intelligence.
Recent machine learning models, particularly large language models, seem to show some aspects of ToM understanding.
Human ToM, on the other hand, is more than video or text understanding.
People can flexibly reason about another person's mind based on conceptual representations extracted from any available data.
arXiv Detail & Related papers (2024-01-16T18:59:24Z)
- Brain-Conditional Multimodal Synthesis: A Survey and Taxonomy [18.130004804879896]
Key to multimodal synthesis technology is to establish the mapping relationship between different modalities.
Brain-conditional multimodal synthesis refers to decoding brain signals back to perceptual experience.
This survey comprehensively examines the emerging field of AIGC-based Brain-conditional Multimodal Synthesis, termed AIGC-Brain.
arXiv Detail & Related papers (2023-12-31T09:00:40Z)
- Brain encoding models based on multimodal transformers can transfer across language and vision [60.72020004771044]
We used representations from multimodal transformers to train encoding models that can transfer across fMRI responses to stories and movies.
We found that encoding models trained on brain responses to one modality can successfully predict brain responses to the other modality.
arXiv Detail & Related papers (2023-05-20T17:38:44Z)
- Language Is Not All You Need: Aligning Perception with Language Models [110.51362453720458]
We introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context, and follow instructions.
We train Kosmos-1 from scratch on web-scale multimodal corpora, including arbitrarily interleaved text and images, image-caption pairs, and text data.
Experimental results show that Kosmos-1 achieves impressive performance on language understanding, generation, and even OCR-free NLP.
We also show that MLLMs can benefit from cross-modal transfer, i.e., transferring knowledge from language to multimodal and from multimodal to language.
arXiv Detail & Related papers (2023-02-27T18:55:27Z)
- Decoding Visual Neural Representations by Multimodal Learning of Brain-Visual-Linguistic Features [9.783560855840602]
This paper presents a generic neural decoding method called BraVL that uses multimodal learning of brain-visual-linguistic features.
We focus on modeling the relationships between brain, visual and linguistic features via multimodal deep generative models.
In particular, our BraVL model can be trained under various semi-supervised scenarios to incorporate the visual and textual features obtained from the extra categories.
arXiv Detail & Related papers (2022-10-13T05:49:33Z)
- Multimodal foundation models are better simulators of the human brain [65.10501322822881]
We present a newly-designed multimodal foundation model pre-trained on 15 million image-text pairs.
We find that both visual and lingual encoders trained multimodally are more brain-like compared with unimodal ones.
arXiv Detail & Related papers (2022-08-17T12:36:26Z)
- Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate for modeling speech processing in the brain.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z)
- Emergence of Machine Language: Towards Symbolic Intelligence with Neural Networks [73.94290462239061]
We propose to combine symbolism and connectionism principles by using neural networks to derive a discrete representation.
By designing an interactive environment and task, we demonstrated that machines could generate a spontaneous, flexible, and semantic language.
arXiv Detail & Related papers (2022-01-14T14:54:58Z)