Motion Mapping Cognition: A Nondecomposable Primary Process in Human Vision
- URL: http://arxiv.org/abs/2402.04275v1
- Date: Fri, 2 Feb 2024 10:11:25 GMT
- Title: Motion Mapping Cognition: A Nondecomposable Primary Process in Human Vision
- Authors: Zhenping Xie
- Abstract summary: I present a basic cognitive process, motion mapping cognition (MMC), which should be a nondecomposable primary function in human vision.
MMC can explain most human visual functions at a fundamental level, but cannot be effectively modelled by traditional visual processing methods.
I state that MMC may be viewed as an extension of Chen's theory of topological perception in human vision, and appears unsolvable with existing intelligent algorithm techniques.
- Score: 2.7195102129095003
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human intelligence seems so mysterious that we have not yet
successfully understood its foundation. Here, I present a basic cognitive
process, motion mapping cognition (MMC), which should be a nondecomposable
primary function in human vision. I point out that the MMC process can
explain most human visual functions at a fundamental level, but cannot be
effectively modelled by traditional visual processing methods such as image
segmentation, object recognition, and object tracking. Furthermore, I argue
that MMC may be viewed as an extension of Chen's theory of topological
perception in human vision, and appears to be unsolvable with existing
intelligent algorithm techniques. Finally, following the requirements of the
MMC problem, an interesting computational model, the quantized topological
matching principle, can be derived by developing the idea of optimal transport
theory. These results may offer strong inspiration for developing more robust
and interpretable machine vision models.
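The abstract only names the quantized topological matching principle, so below is a minimal sketch of the optimal transport idea it builds on: an entropic-regularized (Sinkhorn) soft matching between two point sets, as one plausible way such a correspondence could be computed. This is an illustration under stated assumptions, not the paper's actual model; the function name `sinkhorn_match` and the parameters `epsilon` and `n_iters` are hypothetical.

```python
# Sketch only: Sinkhorn-style entropic optimal transport between two point sets,
# loosely illustrating the optimal-transport flavor of "quantized topological
# matching". Not the paper's method; all names/parameters are assumptions.
import numpy as np

def sinkhorn_match(x, y, epsilon=0.05, n_iters=200):
    """Soft matching between point sets x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    # Pairwise squared Euclidean cost between the two point sets.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-cost / epsilon)                      # Gibbs kernel
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)  # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                         # alternate marginal scalings
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]               # transport plan (soft correspondence)

# Usage: match a point set to a rigidly rotated copy of itself (a simple "motion").
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 2))
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
plan = sinkhorn_match(x, x @ R.T)
print(plan.argmax(axis=1))  # most likely counterpart of each point under the motion
```

With entropic regularization the matching is smooth and tends to preserve neighborhood structure, which loosely echoes the topological emphasis of the MMC discussion; whether this corresponds to the paper's derivation is an assumption.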
Related papers
- Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets compressing and reconstructing data; recent trends demonstrate a potential homogeneity between coding and intelligence.
We propose a novel problem of Coding for Intelligence from the category theory view.
arXiv Detail & Related papers (2024-07-01T07:05:44Z)
- Cantor: Inspiring Multimodal Chain-of-Thought of MLLM [83.6663322930814]
We argue that converging visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks.
We propose an innovative multimodal CoT framework, termed Cantor, characterized by a perception-decision architecture.
Our experiments demonstrate the efficacy of the proposed framework, showing significant improvements in multimodal CoT performance.
arXiv Detail & Related papers (2024-04-24T17:59:48Z)
- Understanding Multimodal Deep Neural Networks: A Concept Selection View [29.08342307127578]
Concept-based models map the black-box visual representations extracted by deep neural networks onto a set of human-understandable concepts.
We propose a two-stage Concept Selection Model (CSM) to mine core concepts without introducing any human priors.
Our approach achieves comparable performance to end-to-end black-box models.
arXiv Detail & Related papers (2024-04-13T11:06:49Z)
- Solving the Clustering Reasoning Problems by Modeling a Deep-Learning-Based Probabilistic Model [1.7955614278088239]
We introduce PMoC, a deep-learning-based probabilistic model, achieving high reasoning accuracy on the Bongard-Logo problem.
As a bonus, we also designed Pose-Transformer for complex visual abstract reasoning tasks.
arXiv Detail & Related papers (2024-03-05T18:08:29Z)
- Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities [63.90227161974381]
SimToM is a novel prompting framework inspired by Simulation Theory's notion of perspective-taking.
Our approach, which requires no additional training and minimal prompt-tuning, shows substantial improvement over existing methods.
arXiv Detail & Related papers (2023-11-16T22:49:27Z)
- Latent Emission-Augmented Perspective-Taking (LEAPT) for Human-Robot Interaction [16.19711863900126]
We present a deep world model that enables a robot to perform both perception and conceptual perspective taking.
The key innovation is a multi-modal latent state space model able to generate and augment fictitious observations/emissions.
We tasked our model to predict human observations and beliefs on three partially-observable HRI tasks.
arXiv Detail & Related papers (2023-08-12T08:22:11Z)
- Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker [72.09076317574238]
ToM is a plug-and-play approach to investigate the belief states of characters in reading comprehension.
We show that ToM enhances off-the-shelf neural networks' theory of mind in a zero-shot setting while showing robust out-of-distribution performance compared to supervised baselines.
arXiv Detail & Related papers (2023-06-01T17:24:35Z)
- Zero-shot visual reasoning through probabilistic analogical mapping [2.049767929976436]
We present visiPAM (visual Probabilistic Analogical Mapping), a model of visual reasoning that synthesizes two approaches.
We show that without any direct training, visiPAM outperforms a state-of-the-art deep learning model on an analogical mapping task.
In addition, visiPAM closely matches the pattern of human performance on a novel task involving mapping of 3D objects across disparate categories.
arXiv Detail & Related papers (2022-09-29T20:29:26Z)
- CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models [84.32751938563426]
We propose a new explainable AI (XAI) framework for explaining decisions made by a deep convolutional neural network (CNN).
In contrast to the current methods in XAI that generate explanations as a single shot response, we pose explanation as an iterative communication process.
Our framework generates sequence of explanations in a dialog by mediating the differences between the minds of machine and human user.
arXiv Detail & Related papers (2021-09-03T09:46:20Z)
- Deep Interpretable Models of Theory of Mind For Human-Agent Teaming [0.7734726150561086]
We develop an interpretable modular neural framework for modeling the intentions of other observed entities.
We demonstrate the efficacy of our approach with experiments on data from human participants on a search and rescue task in Minecraft.
arXiv Detail & Related papers (2021-04-07T06:18:58Z)
- Interpretable Visual Reasoning via Induced Symbolic Space [75.95241948390472]
We study the problem of concept induction in visual reasoning, i.e., identifying concepts and their hierarchical relationships from question-answer pairs associated with images.
We first design a new framework named object-centric compositional attention model (OCCAM) to perform the visual reasoning task with object-level visual features.
We then come up with a method to induce concepts of objects and relations using clues from the attention patterns between objects' visual features and question words.
arXiv Detail & Related papers (2020-11-23T18:21:49Z)