Causal Reasoning Meets Visual Representation Learning: A Prospective Study
- URL: http://arxiv.org/abs/2204.12037v8
- Date: Wed, 22 Mar 2023 02:41:58 GMT
- Title: Causal Reasoning Meets Visual Representation Learning: A Prospective Study
- Authors: Yang Liu, Yushen Wei, Hong Yan, Guanbin Li, Liang Lin
- Abstract summary: A lack of interpretability, robustness, and out-of-distribution generalization is becoming a central challenge for existing visual models.
Inspired by the strong inference ability of human-level agents, recent years have witnessed great effort in developing causal reasoning paradigms.
This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussion, and bring to the forefront the urgency of developing novel causal reasoning methods.
- Score: 117.08431221482638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual representation learning is ubiquitous in various real-world
applications, including visual comprehension, video understanding, multi-modal
analysis, human-computer interaction, and urban computing. With the emergence
of huge amounts of multi-modal heterogeneous spatial/temporal/spatial-temporal
data in the big data era, a lack of interpretability, robustness, and
out-of-distribution generalization has become a central challenge for existing
visual models. The majority of existing methods tend to fit the original
data/variable distributions while ignoring the essential causal relations
behind the multi-modal knowledge, leaving the field without unified guidance
or analysis of why modern visual representation learning methods easily
collapse into data bias and exhibit limited generalization and cognitive
ability. Inspired by the strong inference ability of human-level agents,
recent years have therefore witnessed great effort in developing causal
reasoning paradigms to realize robust representation and model learning with
good cognitive ability. In this paper, we conduct a comprehensive review of
existing causal reasoning methods for visual representation learning, covering
fundamental theories, models, and datasets. The limitations of current methods
and datasets are also discussed. Moreover, we propose some prospective
challenges, opportunities, and future research directions for benchmarking
causal reasoning algorithms in visual representation learning. This paper aims
to provide a comprehensive overview of this emerging field, attract attention,
encourage discussion, and bring to the forefront the urgency of developing
novel causal reasoning methods, publicly available benchmarks, and
consensus-building standards for reliable visual representation learning and
related real-world applications.
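The gap the abstract describes, between fitting observational distributions and modeling the underlying causal relations, can be made concrete with a small numerical sketch. The following is a minimal illustration of ours, not a method from the paper; the indoor/outdoor confounder and all probabilities are hypothetical. It shows backdoor adjustment recovering the interventional P(y | do(x)) from observational data that the naive conditional P(y | x) gets wrong:

```python
import numpy as np

# Minimal sketch of backdoor adjustment, the causal tool many surveyed
# visual methods build on. We contrast the observational P(y | x) with
# the interventional P(y | do(x)) = sum_z P(y | x, z) P(z), where z is a
# discrete confounder (e.g., scene context) that biases both the visual
# feature x and the label y. All quantities here are hypothetical.

rng = np.random.default_rng(0)

n = 100_000
z = rng.integers(0, 2, size=n)                   # confounder: 0 = indoor, 1 = outdoor
x = (rng.random(n) < 0.2 + 0.6 * z).astype(int)  # feature correlates with context
y = (rng.random(n) < 0.3 + 0.4 * z).astype(int)  # label driven by context, not by x

def p(cond):
    return cond.mean()

# Observational: P(y=1 | x=1) is inflated by the shared confounder z.
obs = p(y[x == 1] == 1)

# Interventional: sum_z P(y=1 | x=1, z) P(z) removes the bias.
do = sum(p(y[(x == 1) & (z == v)] == 1) * p(z == v) for v in (0, 1))

print(f"P(y=1 | x=1)     = {obs:.3f}")  # ~0.62, biased by context
print(f"P(y=1 | do(x=1)) = {do:.3f}")   # ~0.50 = P(y=1): x has no causal effect
```

Running this prints roughly 0.62 for the conditional and 0.50 for the adjusted estimate, which matches the fact that x has no causal effect on y in this toy model.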
Related papers
- Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets compressing and reconstructing data; recent trends demonstrate the potential homogeneity of coding and intelligence.
We propose the novel problem of Coding for Intelligence from a category-theory view.
arXiv Detail & Related papers (2024-07-01T07:05:44Z)
- Heterogeneous Contrastive Learning for Foundation Models and Beyond [73.74745053250619]
In the era of big data and Artificial Intelligence, an emerging paradigm is to utilize contrastive self-supervised learning to model large-scale heterogeneous data.
This survey critically evaluates the current landscape of heterogeneous contrastive learning for foundation models.
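To anchor what "contrastive self-supervised learning" refers to here, the sketch below implements the standard InfoNCE objective in plain numpy; the batch size, embedding dimension, and temperature are illustrative choices of ours, not values from the survey:

```python
import numpy as np

# Minimal numpy sketch of the InfoNCE objective that contrastive
# self-supervised methods optimize: embeddings of two views of the same
# item (here, hypothetical image/text pairs) are pulled together while
# all other pairings in the batch act as negatives.

rng = np.random.default_rng(0)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

batch, dim = 8, 32
z_img = l2_normalize(rng.normal(size=(batch, dim)))                # view A
z_txt = l2_normalize(z_img + 0.1 * rng.normal(size=(batch, dim)))  # aligned view B

tau = 0.07                               # temperature (illustrative)
logits = z_img @ z_txt.T / tau           # pairwise similarities
log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_prob))       # matched pairs sit on the diagonal

print(f"InfoNCE loss: {loss:.4f}")
```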
arXiv Detail & Related papers (2024-03-30T02:55:49Z)
- Causality-based Cross-Modal Representation Learning for Vision-and-Language Navigation [15.058687283978077]
Vision-and-Language Navigation (VLN) has gained significant research interest in recent years due to its potential applications in real-world scenarios.
Existing VLN methods struggle with the issue of spurious associations, resulting in poor generalization with a significant performance gap between seen and unseen environments.
We propose CausalVLN, a unified framework based on the causal learning paradigm, to train a robust navigator capable of learning unbiased feature representations.
arXiv Detail & Related papers (2024-03-06T02:01:38Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models learned to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompting capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, holding an interactive dialogue by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- Looking deeper into interpretable deep learning in neuroimaging: a comprehensive survey [20.373311465258393]
This paper comprehensively reviews interpretable deep learning models in the neuroimaging domain.
We discuss how multiple recent neuroimaging studies leveraged model interpretability to capture anatomical and functional brain alterations most relevant to model predictions.
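One of the simplest interpretability tools such studies build on is a gradient (saliency) map, which scores each input feature by how much the prediction changes with it. The sketch below is a toy of ours, not a method from the survey: it uses a hand-built logistic model and finite differences where real neuroimaging work uses deep networks and autodiff:

```python
import numpy as np

# Toy saliency map: score each input feature by the sensitivity of the
# model's prediction to it. The "model" is a stand-in logistic
# regression with known weights, so we can verify the map recovers the
# features that truly matter.

rng = np.random.default_rng(0)

n_features = 16                      # e.g., 16 brain-region measurements
w = np.zeros(n_features)
w[[2, 7, 11]] = [1.5, -2.0, 1.0]     # only three regions truly matter

def model(x):
    return 1.0 / (1.0 + np.exp(-(x @ w)))   # predicted probability

x = rng.normal(size=n_features)      # one hypothetical subject

eps = 1e-5
saliency = np.array([
    (model(x + eps * np.eye(n_features)[i]) - model(x - eps * np.eye(n_features)[i]))
    / (2 * eps)
    for i in range(n_features)
])

top = np.argsort(-np.abs(saliency))[:3]
print("most relevant features:", top)   # regions 7, 2, 11, ordered by |gradient|
```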
arXiv Detail & Related papers (2023-07-14T04:50:04Z)
- Which Mutual-Information Representation Learning Objectives are Sufficient for Control? [80.2534918595143]
Mutual information provides an appealing formalism for learning representations of data.
This paper formalizes the sufficiency of a state representation for learning and representing the optimal policy.
Surprisingly, we find that two of the objectives studied can yield insufficient representations given mild and common assumptions on the structure of the MDP.
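To make "high mutual information yet insufficient for control" tangible, here is a toy construction of our own, not the paper's formal counterexample: the state consists of two independent bits, reward depends on only one of them, and a representation capturing only the other bit carries a full bit of mutual information while being useless for acting:

```python
import numpy as np

# Toy illustration: state s = (a, b), two independent fair bits; reward
# is earned by matching action to b. Representation z = a has 1 bit of
# mutual information with s, yet any policy reading z can only guess b.

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Uniform state distribution over (a, b) in {0,1}^2; z = a.
p_state = np.full(4, 0.25)
p_z = np.array([0.5, 0.5])
p_state_given_z = np.array([[0.5, 0.5, 0.0, 0.0],   # z=0 -> a=0, b uniform
                            [0.0, 0.0, 0.5, 0.5]])  # z=1 -> a=1, b uniform

mi = entropy(p_state) - sum(p_z[i] * entropy(p_state_given_z[i]) for i in range(2))
print(f"I(z; s) = {mi:.1f} bit")   # 1.0 bit of mutual information

# A full-state policy (act = b) earns reward 1 every step; the best
# z-policy guesses b and is right half the time.
print("expected return with full state: 1.0, with z = a: 0.5")
```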
arXiv Detail & Related papers (2021-06-14T10:12:34Z)
- Generative Interventions for Causal Learning [27.371436971655303]
We introduce a framework for learning robust visual representations that generalize to new viewpoints, backgrounds, and scene contexts.
We show that we can steer generative models to manufacture interventions on features caused by confounding factors.
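The mechanics of such generative interventions can be sketched in a few lines. Everything below is a hypothetical toy of ours, not the paper's pipeline: we simply assume some latent coordinates of a generator control nuisance factors, so resampling only those coordinates manufactures views that share content but differ in the confounded factors:

```python
import numpy as np

# Toy sketch of generative interventions: if some latent coordinates of
# a generator control confounding factors (viewpoint, background) rather
# than object identity, resampling only those coordinates produces
# do()-style interventions: views with the same content but different
# nuisance factors. The latent split here is hypothetical.

rng = np.random.default_rng(0)

def generator(z_content, z_nuisance):
    # Stand-in for a trained generative model; any fixed function
    # suffices to illustrate the intervention mechanics.
    return np.concatenate([np.tanh(z_content), np.sin(z_nuisance)])

z_content = rng.normal(size=8)        # dims assumed to encode identity
views = [
    generator(z_content, rng.normal(size=4))   # intervene: resample nuisance
    for _ in range(3)
]

# A downstream encoder would be trained to map these views to the same
# representation, making it invariant to the confounded factors.
for i, v in enumerate(views):
    print(f"view {i}: content dims identical -> {np.allclose(v[:8], views[0][:8])}")
```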
arXiv Detail & Related papers (2020-12-22T16:01:55Z)
- Deep Partial Multi-View Learning [94.39367390062831]
We propose a novel framework termed Cross Partial Multi-View Networks (CPM-Nets).
We first provide a formal definition of completeness and versatility for multi-view representation.
We then theoretically prove the versatility of the learned latent representations.
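The core trick of partial multi-view learning can be sketched with linear decoders standing in for CPM-Nets' networks: each sample owns a latent code that must reconstruct whichever views it actually has, so samples with missing views still embed into the common latent space. Sizes, masking rate, and optimizer below are illustrative assumptions of ours:

```python
import numpy as np

# Minimal numpy sketch of the partial multi-view idea: learn a shared
# per-sample latent H and per-view decoders W by minimizing the masked
# reconstruction error, where the mask zeroes out views a sample lacks.

rng = np.random.default_rng(0)

n, d_latent, d_view = 20, 4, 6
X = [rng.normal(size=(n, d_view)) for _ in range(2)]   # two views
mask = rng.random((n, 2)) > 0.3                        # True = view observed

H = rng.normal(size=(n, d_latent)) * 0.1               # per-sample latents
W = [rng.normal(size=(d_latent, d_view)) * 0.1 for _ in range(2)]

lr = 0.05
for step in range(500):
    for v in range(2):
        err = (H @ W[v] - X[v]) * mask[:, v:v + 1]     # ignore missing views
        H -= lr * err @ W[v].T / n                     # update latents
        W[v] -= lr * H.T @ err / n                     # update decoder

loss = sum(np.mean(((H @ W[v] - X[v]) * mask[:, v:v + 1]) ** 2) for v in range(2))
print(f"masked reconstruction loss: {loss:.4f}")
```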
arXiv Detail & Related papers (2020-11-12T02:29:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.