VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and
Challenges
- URL: http://arxiv.org/abs/2212.13296v1
- Date: Mon, 26 Dec 2022 20:56:01 GMT
- Title: VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and
Challenges
- Authors: Rufai Yusuf Zakari, Jim Wilson Owusu, Hailin Wang, Ke Qin, Zaharaddeen
Karami Lawal, Yuezhou Dong
- Abstract summary: The integration of vision and language has sparked a lot of attention as a result of this.
The tasks have been created in such a way that they properly exemplify the concepts of deep learning.
- Score: 1.565870461096057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial Intelligence (AI) and its applications have sparked extraordinary
interest in recent years. This achievement can be ascribed in part to advances
in AI subfields including Machine Learning (ML), Computer Vision (CV), and
Natural Language Processing (NLP). Deep learning, a sub-field of machine
learning that employs artificial neural network concepts, has enabled the most
rapid growth in these domains. The integration of vision and language has
sparked a lot of attention as a result of this. The tasks have been created in
such a way that they properly exemplify the concepts of deep learning. In this
review paper, we provide a thorough and extensive review of state-of-the-art
approaches and key model design principles, and discuss existing datasets,
methods, their problem formulations, and evaluation measures for VQA and visual
reasoning tasks to understand vision and language representation learning. We
also present some potential future paths in this field of research, with the
hope that our study may generate new ideas and novel approaches to handle
existing difficulties and develop new applications.
Related papers
- Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives [10.16399860867284]
The emergence of Generative Artificial Intelligence (AI) and Large Language Models (LLMs) has marked a new era of Natural Language Processing (NLP).
This paper explores the current state of these cutting-edge technologies, demonstrating their remarkable advancements and wide-ranging applications.
arXiv Detail & Related papers (2024-07-20T18:48:35Z)
- Trends, Applications, and Challenges in Human Attention Modelling [65.61554471033844]
Human attention modelling has proven to be particularly useful for understanding the cognitive processes underlying visual exploration.
It provides support to artificial intelligence models that aim to solve problems in various domains, including image and video processing, vision-and-language applications, and language modelling.
arXiv Detail & Related papers (2024-02-28T19:35:30Z)
- Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing [51.524108608250074]
Black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in remote sensing.
We perform a systematic review to identify the key trends in the field and shed light on novel explainable AI approaches.
We also give a detailed outlook on the challenges and promising research directions.
arXiv Detail & Related papers (2024-02-21T13:19:58Z)
- Machine Unlearning: A Survey [56.79152190680552]
A special need has arisen where, due to privacy, usability, and/or the right to be forgotten, information about some specific samples needs to be removed from a model, called machine unlearning.
This emerging technology has drawn significant interest from both academics and industry due to its innovation and practicality.
No study has analyzed this complex topic or compared the feasibility of existing unlearning solutions in different kinds of scenarios.
The survey concludes by highlighting some of the outstanding issues with unlearning techniques, along with some feasible directions for new research opportunities.
arXiv Detail & Related papers (2023-06-06T10:18:36Z)
- Towards Data-and Knowledge-Driven Artificial Intelligence: A Survey on Neuro-Symbolic Computing [73.0977635031713]
Neural-symbolic computing (NeSy) has been an active research area of Artificial Intelligence (AI) for many years.
NeSy shows promise of reconciling the advantages of reasoning and interpretability of symbolic representation and robust learning in neural networks.
arXiv Detail & Related papers (2022-10-28T04:38:10Z)
- Deep Learning to See: Towards New Foundations of Computer Vision [88.69805848302266]
This book criticizes the supposed scientific progress in the field of computer vision.
It proposes the investigation of vision within the framework of information-based laws of nature.
arXiv Detail & Related papers (2022-06-30T15:20:36Z)
- Visual Knowledge Discovery with Artificial Intelligence: Challenges and Future Directions [5.754786889790288]
Integrated Visual Knowledge Discovery combines advances in Artificial Intelligence/Machine Learning (AI/ML) and visualization.
Chapters included are extended versions of the selected AI and Visual Analytics papers and related symposiums.
We aim to present challenges and future directions within the field of Visual Analytics, Visual Knowledge Discovery and AI/ML, and to discuss the role of visualization in visual AI/ML.
arXiv Detail & Related papers (2022-05-03T04:17:21Z)
- Vision-Language Intelligence: Tasks, Representation Learning, and Large Models [32.142076223602906]
This paper presents a comprehensive survey of vision-language intelligence from the perspective of time.
We summarize the development in this field into three time periods, namely task-specific methods, vision-language pre-training methods, and larger models empowered by large-scale weakly-labeled data.
arXiv Detail & Related papers (2022-03-03T18:54:59Z)
- Threat of Adversarial Attacks on Deep Learning in Computer Vision: Survey II [86.51135909513047]
Deep Learning is vulnerable to adversarial attacks that can manipulate its predictions.
This article reviews the contributions made by the computer vision community in adversarial attacks on deep learning.
It provides definitions of technical terminologies for non-experts in this domain.
arXiv Detail & Related papers (2021-08-01T08:54:47Z)
- Core Challenges in Embodied Vision-Language Planning [9.190245973578698]
We discuss Embodied Vision-Language Planning tasks, a family of prominent embodied navigation and manipulation problems.
We propose a taxonomy to unify these tasks and provide an analysis and comparison of the new and current algorithmic approaches.
We advocate for task construction that enables model generalizability and furthers real-world deployment.
arXiv Detail & Related papers (2021-06-26T05:18:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.