On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey
- URL: http://arxiv.org/abs/2408.04879v3
- Date: Tue, 26 Nov 2024 10:02:11 GMT
- Title: On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey
- Authors: Jingcai Guo, Zhijie Rao, Zhi Chen, Song Guo, Jingren Zhou, Dacheng Tao,
- Abstract summary: Zero-shot image recognition (ZSIR) aims to recognize and reason in unseen domains by learning generalized knowledge from limited data.
This paper thoroughly investigates recent advances in element-wise ZSIR and provides a basis for its future development.
- Score: 82.49623756124357
- License:
- Abstract: Zero-shot image recognition (ZSIR) aims to recognize and reason in unseen domains by learning generalized knowledge from limited data in the seen domain. The gist of ZSIR is constructing a well-aligned mapping between the input visual space and the target semantic space, which is a bottom-up paradigm inspired by the process by which humans observe the world. In recent years, ZSIR has witnessed significant progress on a broad spectrum, from theory to algorithm design, as well as widespread applications. However, to the best of our knowledge, there remains a lack of a systematic review of ZSIR from an element-wise perspective, i.e., learning fine-grained elements of data and their inferential associations. To fill the gap, this paper thoroughly investigates recent advances in element-wise ZSIR and provides a sound basis for its future development. Concretely, we first integrate three basic ZSIR tasks, i.e., object recognition, compositional recognition, and foundation model-based open-world recognition, into a unified element-wise paradigm and provide a detailed taxonomy and analysis of the main approaches. Next, we summarize the benchmarks, covering technical implementations, standardized datasets, and some more details as a library. Last, we sketch out related applications, discuss vital challenges, and suggest potential future directions.
Related papers
- SPHERE: Unveiling Spatial Blind Spots in Vision-Language Models Through Hierarchical Evaluation [7.659514491338669]
Current vision-language models may grasp basic spatial cues but struggle with the multi-dimensional spatial reasoning necessary for human-like understanding and real-world applications.
We develop SPHERE, a hierarchical evaluation framework supported by a new human-annotated dataset.
Benchmark evaluation of state-of-the-art models reveals significant deficiencies, especially in reasoning about distance and proximity.
These findings expose critical blind spots in existing models and underscore the need for more advanced spatial reasoning techniques.
arXiv Detail & Related papers (2024-12-17T09:10:55Z) - Open World Object Detection: A Survey [16.839310066730533]
Open world object detection (OWOD) is an emerging area of research that adapts this principle to explore new knowledge.
This paper offers a thorough review of the OWOD domain, covering essential aspects, including problem definitions, benchmark datasets, source codes, evaluation metrics, and a comparative study of existing methods.
The paper concludes by addressing the limitations and challenges faced by current OWOD algorithms and proposes directions for future research.
arXiv Detail & Related papers (2024-10-15T05:46:00Z) - Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets compressing and reconstructing data, and intelligence.
Recent trends demonstrate the potential homogeneity of these two fields.
We propose a novel problem of Coding for Intelligence from the category theory view.
arXiv Detail & Related papers (2024-07-01T07:05:44Z) - Less is More: Toward Zero-Shot Local Scene Graph Generation via
Foundation Models [16.08214739525615]
We present a new task called Local Scene Graph Generation.
It aims to abstract pertinent structural information with partial objects and their relationships in an image.
We introduce zEro-shot Local scEne GrAph geNeraTion (ELEGANT), a framework harnessing foundation models renowned for their powerful perception and commonsense reasoning.
arXiv Detail & Related papers (2023-10-02T17:19:04Z) - Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual
Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z) - A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented,
Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in Deep Semantic in the context of vision for autonomous vehicles.
Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z) - Knowledge Graph Augmented Network Towards Multiview Representation
Learning for Aspect-based Sentiment Analysis [96.53859361560505]
We propose a knowledge graph augmented network (KGAN) to incorporate external knowledge with explicitly syntactic and contextual information.
KGAN captures the sentiment feature representations from multiple perspectives, i.e., context-, syntax- and knowledge-based.
Experiments on three popular ABSA benchmarks demonstrate the effectiveness and robustness of our KGAN.
arXiv Detail & Related papers (2022-01-13T08:25:53Z) - Place recognition survey: An update on deep learning approaches [0.6352264764099531]
This paper surveys recent approaches and methods used in place recognition, particularly those based on deep learning.
The contributions of this work are twofold: surveying recent sensors such as 3D LiDARs and RADARs, applied in place recognition.
This survey proceeds by elaborating on the various DL-based works, presenting summaries for each framework.
arXiv Detail & Related papers (2021-06-19T09:17:15Z) - Deep Learning for Person Re-identification: A Survey and Outlook [233.36948173686602]
Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras.
By dissecting the involved components in developing a person Re-ID system, we categorize it into the closed-world and open-world settings.
arXiv Detail & Related papers (2020-01-13T12:49:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.