Deep Neural Networks for Visual Reasoning
- URL: http://arxiv.org/abs/2209.11990v1
- Date: Sat, 24 Sep 2022 12:11:00 GMT
- Title: Deep Neural Networks for Visual Reasoning
- Authors: Thao Minh Le
- Abstract summary: It is crucial for machines to have the capacity to reason using visual perception and language understanding.
Recent advances in deep learning have built separate sophisticated representations of both visual scenes and languages.
This thesis advances the understanding of how to exploit and use pivotal aspects of vision-and-language tasks with neural networks to support reasoning.
- Score: 12.411844611718958
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual perception and language understanding are fundamental components of
human intelligence, enabling humans to understand and reason about objects and
their interactions. It is crucial for machines to have this capacity to reason
using these two modalities to invent new robot-human collaborative systems.
Recent advances in deep learning have built separate sophisticated
representations of both visual scenes and languages. However, understanding the
associations between the two modalities in a shared context for multimodal
reasoning remains a challenge. Focusing on language and vision modalities, this
thesis advances the understanding of how to exploit and use pivotal aspects of
vision-and-language tasks with neural networks to support reasoning. We derive
these understandings from a series of works, making a two-fold contribution:
(i) effective mechanisms for content selection and construction of temporal
relations from dynamic visual scenes in response to a linguistic query and
preparing adequate knowledge for the reasoning process; (ii) new frameworks to
perform reasoning with neural networks by exploiting visual-linguistic
associations, deduced either directly from data or guided by external priors.
Related papers
- From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks [0.0]
We review recent empirical work from machine learning for a broad audience in philosophy, cognitive science, and neuroscience.
In particular, our review emphasizes two approaches to endowing neural networks with compositional generalization capabilities.
We conclude by discussing the implications that these findings may have for the study of compositionality in human cognition.
arXiv Detail & Related papers (2024-05-24T02:36:07Z)
- SNeL: A Structured Neuro-Symbolic Language for Entity-Based Multimodal Scene Understanding [0.0]
We introduce SNeL (Structured Neuro-symbolic Language), a versatile query language designed to facilitate nuanced interactions with neural networks processing multimodal data.
SNeL's expressive interface enables the construction of intricate queries, supporting logical and arithmetic operators, comparators, nesting, and more.
Our evaluations demonstrate SNeL's potential to reshape the way we interact with complex neural networks.
arXiv Detail & Related papers (2023-06-09T17:01:51Z)
- Synergistic information supports modality integration and flexible learning in neural networks solving multiple tasks [107.8565143456161]
We investigate the information processing strategies adopted by simple artificial neural networks performing a variety of cognitive tasks.
Results show that synergy increases as neural networks learn multiple diverse tasks.
Randomly turning off neurons during training through dropout increases network redundancy, corresponding to an increase in robustness.
arXiv Detail & Related papers (2022-10-06T15:36:27Z)
- Formal Conceptual Views in Neural Networks [0.0]
We introduce two notions for conceptual views of a neural network, specifically a many-valued and a symbolic view.
We test the conceptual expressivity of our novel views through different experiments on the ImageNet and Fruit-360 data sets.
We demonstrate how conceptual views can be applied for abductive learning of human comprehensible rules from neurons.
arXiv Detail & Related papers (2022-09-27T16:38:24Z)
- Rethinking Explainability as a Dialogue: A Practitioner's Perspective [57.87089539718344]
We ask doctors, healthcare professionals, and policymakers about their needs and desires for explanations.
Our study indicates that decision-makers would strongly prefer interactive explanations in the form of natural language dialogues.
Considering these needs, we outline a set of five principles researchers should follow when designing interactive explanations.
arXiv Detail & Related papers (2022-02-03T22:17:21Z)
- Understanding the computational demands underlying visual reasoning [10.308647202215708]
We systematically assess the ability of modern deep convolutional neural networks to learn to solve visual reasoning problems.
Our analysis leads to a novel taxonomy of visual reasoning tasks, which can be primarily explained by the type of relations and the number of relations used to compose the underlying rules.
arXiv Detail & Related papers (2021-08-08T10:46:53Z)
- Constellation: Learning relational abstractions over objects for compositional imagination [64.99658940906917]
We introduce Constellation, a network that learns relational abstractions of static visual scenes.
This work is a first step in the explicit representation of visual relationships and using them for complex cognitive procedures.
arXiv Detail & Related papers (2021-07-23T11:59:40Z)
- Compositional Processing Emerges in Neural Networks Solving Math Problems [100.80518350845668]
Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations.
We extend this work to the domain of mathematical reasoning, where it is possible to formulate precise hypotheses about how meanings should be composed.
Our work shows that neural networks are not only able to infer something about the structured relationships implicit in their training data, but can also deploy this knowledge to guide the composition of individual meanings into composite wholes.
arXiv Detail & Related papers (2021-05-19T07:24:42Z)
- Semantics-Aware Inferential Network for Natural Language Understanding [79.70497178043368]
We propose a Semantics-Aware Inferential Network (SAIN) to meet such a motivation.
Taking explicit contextualized semantics as a complementary input, the inferential module of SAIN enables a series of reasoning steps over semantic clues.
Our model achieves significant improvement on 11 tasks including machine reading comprehension and natural language inference.
arXiv Detail & Related papers (2020-04-28T07:24:43Z)
- Learning Intermediate Features of Object Affordances with a Convolutional Neural Network [1.52292571922932]
We train a deep convolutional neural network (CNN) to recognize affordances from images and to learn the underlying features or the dimensionality of affordances.
We view this representational analysis as the first step towards a more formal account of how humans perceive and interact with the environment.
arXiv Detail & Related papers (2020-02-20T19:04:40Z)
- Vision and Language: from Visual Perception to Content Creation [100.36776435627962]
"Vision to language" is probably one of the most popular topics of the past five years.
This paper reviews the recent advances along these two dimensions: "vision to language" and "language to vision".
arXiv Detail & Related papers (2019-12-26T14:07:20Z)