Seeing the Intangible: Survey of Image Classification into High-Level
and Abstract Categories
- URL: http://arxiv.org/abs/2308.10562v2
- Date: Thu, 29 Feb 2024 16:18:45 GMT
- Authors: Delfina Sol Martinez Pandiani and Valentina Presutti
- Abstract summary: The field of Computer Vision (CV) is increasingly shifting towards "high-level" visual sensemaking tasks.
This paper systematically reviews research on high-level visual understanding, focusing on Abstract Concepts (ACs) in automatic image classification.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The field of Computer Vision (CV) is increasingly shifting towards
"high-level" visual sensemaking tasks, yet the exact nature of these tasks
remains unclear and tacit. This survey paper addresses this ambiguity by
systematically reviewing research on high-level visual understanding, focusing
particularly on Abstract Concepts (ACs) in automatic image classification. Our
survey contributes in three main ways: Firstly, it clarifies the tacit
understanding of high-level semantics in CV through a multidisciplinary
analysis and categorization into distinct clusters, including commonsense,
emotional, aesthetic, and inductive interpretative semantics. Secondly, it
identifies and categorizes computer vision tasks associated with high-level
visual sensemaking, offering insights into the diverse research areas within
this domain. Lastly, it examines how abstract concepts such as values and
ideologies are handled in CV, revealing challenges and opportunities in
AC-based image classification. Notably, our survey of AC image classification
tasks highlights persistent challenges, such as the limited efficacy of massive
datasets and the importance of integrating supplementary information and
mid-level features. We emphasize the growing relevance of hybrid AI systems in
addressing the multifaceted nature of AC image classification tasks. Overall,
this survey enhances our understanding of high-level visual reasoning in CV and
lays the groundwork for future research endeavors.
Related papers
- Stitching Gaps: Fusing Situated Perceptual Knowledge with Vision Transformers for High-Level Image Classification
We leverage situated perceptual knowledge of cultural images to enhance performance and interpretability in AC image classification.
This resource captures situated perceptual semantics gleaned from over 14,000 cultural images labeled with ACs.
We demonstrate the synergy and complementarity between KGE embeddings' situated perceptual knowledge and deep visual model's sensory-perceptual understanding for AC image classification.
arXiv Detail & Related papers (2024-02-29T16:46:48Z)
- Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models
We propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models.
Our approach takes inspiration from human intelligence in which external knowledge is usually incorporated into recognizing novel categories of objects.
arXiv Detail & Related papers (2023-08-22T04:24:45Z)
- EnTri: Ensemble Learning with Tri-level Representations for Explainable Scene Recognition
Scene recognition based on deep learning has made significant progress, but there are still limitations in its performance.
We propose EnTri, a framework that employs ensemble learning using a hierarchy of visual features.
EnTri demonstrates strong recognition accuracy, performing competitively with state-of-the-art approaches.
arXiv Detail & Related papers (2023-07-23T22:11:23Z)
- Top-Down Visual Attention from Analysis by Synthesis
We consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision.
We propose the Analysis-by-Synthesis Vision Transformer (AbSViT), a top-down modulated ViT model that variationally approximates AbS and achieves controllable top-down attention.
arXiv Detail & Related papers (2023-03-23T05:17:05Z)
- ExpNet: A unified network for Expert-Level Classification
We propose Expert Network (ExpNet) to address the unique challenges of expert-level classification through a unified network.
In ExpNet, we hierarchically decouple the part and context features and individually process them using a novel attentive mechanism, called Gaze-Shift.
We conduct the experiments over three representative expert-level classification tasks: FGVC, disease classification, and artwork attributes classification.
arXiv Detail & Related papers (2022-11-29T12:20:25Z)
- A Survey on Evolutionary Computation for Computer Vision and Image Analysis: Past, Present, and Future Trends
This survey aims to provide a better understanding of evolutionary computer vision (ECV) by discussing the contributions of different approaches.
The applications, challenges, issues, and trends associated with this research field are also discussed and summarised.
arXiv Detail & Related papers (2022-09-14T03:35:25Z)
- Exploring CLIP for Assessing the Look and Feel of Images
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
- Region-level Active Learning for Cluttered Scenes
We introduce a new strategy that subsumes previous Image-level and Object-level approaches into a generalized, Region-level approach.
We show that this approach significantly decreases labeling effort and improves rare object search on realistic data with inherent class-imbalance and cluttered scenes.
arXiv Detail & Related papers (2021-08-20T14:02:38Z)
- Task-Independent Knowledge Makes for Transferable Representations for Generalized Zero-Shot Learning
Generalized Zero-Shot Learning (GZSL) targets recognizing new categories by learning transferable image representations.
We propose a novel Dual-Contrastive Embedding Network (DCEN) that simultaneously learns task-specific and task-independent knowledge.
arXiv Detail & Related papers (2021-04-05T10:05:48Z)
- Deep Learning for Scene Classification: A Survey
Scene classification is a longstanding, fundamental and challenging problem in computer vision.
The rise of large-scale datasets and the renaissance of deep learning techniques have brought remarkable progress in the field of scene representation and classification.
This paper provides a comprehensive survey of recent achievements in scene classification using deep learning.
arXiv Detail & Related papers (2021-01-26T03:06:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.