Related papers: On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey

URL: http://arxiv.org/abs/2408.04879v2
Date: Thu, 22 Aug 2024 09:04:29 GMT
Title: On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey
Authors: Jingcai Guo, Zhijie Rao, Zhi Chen, Song Guo, Jingren Zhou, Dacheng Tao
Abstract summary: Zero-shot image recognition (ZSIR) aims at empowering models to recognize and reason in unseen domains. This paper presents a broad review of recent advances in element-wise ZSIR. We first attempt to integrate the three basic ZSIR tasks of object recognition, compositional recognition, and foundation model-based open-world recognition into a unified element-wise perspective.
Score: 82.49623756124357
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Zero-shot image recognition (ZSIR) aims at empowering models to recognize and reason in unseen domains via learning generalized knowledge from limited data in the seen domain. The gist for ZSIR is to execute element-wise representation and reasoning from the input visual space to the target semantic space, which is a bottom-up modeling paradigm inspired by the process by which humans observe the world, i.e., capturing new concepts by learning and combining the basic components or shared characteristics. In recent years, element-wise learning techniques have seen significant progress in ZSIR as well as widespread application. However, to the best of our knowledge, there remains a lack of a systematic overview of this topic. To enrich the literature and provide a sound basis for its future development, this paper presents a broad review of recent advances in element-wise ZSIR. Concretely, we first attempt to integrate the three basic ZSIR tasks of object recognition, compositional recognition, and foundation model-based open-world recognition into a unified element-wise perspective and provide a detailed taxonomy and analysis of the main research approaches. Then, we collect and summarize some key information and benchmarks, such as detailed technical implementations and common datasets. Finally, we sketch out the wide range of its related applications, discuss vital challenges, and suggest potential future directions.

Related papers

Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models [10.1080193179562]
Current understanding models excel at recognizing "what" but fall short in high-level cognitive tasks like causal reasoning and future prediction. We propose a novel framework that fuses a powerful Vision Foundation Model for deep visual perception with a Large Language Model (LLM) serving as a knowledge-driven reasoning core.
arXiv Detail & Related papers (2025-07-08T09:43:17Z)
3D Skeleton-Based Action Recognition: A Review [60.0580120274659]
3D skeleton-based action recognition has become a prominent topic in the field of computer vision. Previous reviews have predominantly adopted a model-oriented perspective, often neglecting the fundamental steps involved in skeleton-based action recognition. This review aims to address these limitations by presenting a comprehensive, task-oriented framework for understanding skeleton-based action recognition.
arXiv Detail & Related papers (2025-06-01T09:04:12Z)
Place Recognition Meet Multiple Modalitie: A Comprehensive Review, Current Challenges and Future Directions [2.4775350526606355]
We review recent advancements in place recognition, emphasizing three methodological paradigms. CNN-based approaches, Transformer-based frameworks, and cross-modal strategies are discussed. We identify current research challenges and outline prospective directions, including domain adaptation, real-time performance, and lifelong learning, to inspire future advancements in this domain.
arXiv Detail & Related papers (2025-05-20T08:16:37Z)
MIRAGE: A Multi-modal Benchmark for Spatial Perception, Reasoning, and Intelligence [14.694404760882986]
MIRAGE is a benchmark designed to evaluate models' capabilities in Counting (object attribute recognition), Relation (spatial relational reasoning), and Counting with Relation. By targeting these foundational abilities, MIRAGE provides a pathway toward spatiotemporal reasoning in future research.
arXiv Detail & Related papers (2025-05-15T16:08:14Z)
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook [85.43403500874889]
Retrieval-augmented generation (RAG) has emerged as a pivotal technique in artificial intelligence (AI). This survey covers recent advancements in RAG for embodied AI, with a particular focus on applications in planning, task execution, multimodal perception, interaction, and specialized domains.
arXiv Detail & Related papers (2025-03-23T10:33:28Z)
Open World Object Detection: A Survey [16.839310066730533]
Open world object detection (OWOD) is an emerging area of research that adapts this principle to explore new knowledge. This paper offers a thorough review of the OWOD domain, covering essential aspects, including problem definitions, benchmark datasets, source codes, evaluation metrics, and a comparative study of existing methods. The paper concludes by addressing the limitations and challenges faced by current OWOD algorithms and proposes directions for future research.
arXiv Detail & Related papers (2024-10-15T05:46:00Z)
Discovering Conceptual Knowledge with Analytic Ontology Templates for Articulated Objects [42.9186628100765]
We aim to endow machine intelligence with an analogous capability through performing at the conceptual level. The AOT-driven approach yields benefits from three key perspectives.
arXiv Detail & Related papers (2024-09-18T04:53:38Z)
Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets compressing and reconstructing data, and intelligence. Recent trends demonstrate the potential homogeneity of these two fields. We propose a novel problem of Coding for Intelligence from the category theory view.
arXiv Detail & Related papers (2024-07-01T07:05:44Z)
On the Role of Entity and Event Level Conceptualization in Generalizable Reasoning: A Survey of Tasks, Methods, Applications, and Future Directions [46.63556358247516]
Entity- and event-level conceptualization plays a pivotal role in generalizable reasoning. There is currently a lack of a systematic overview that comprehensively examines existing works in the definition, execution, and application of conceptualization. We present the first comprehensive survey of 150+ papers, categorizing various definitions, resources, methods, and downstream applications related to conceptualization into a unified taxonomy.
arXiv Detail & Related papers (2024-06-16T10:32:41Z)
Augmented Commonsense Knowledge for Remote Object Grounding [67.30864498454805]
We propose an augmented commonsense knowledge model (ACK) to leverage commonsense information as a spatiotemporal knowledge graph for improving agent navigation. ACK consists of knowledge graph-aware cross-modal and concept aggregation modules to enhance visual representation and visual-textual data alignment. We add a new pipeline for the commonsense-based decision-making process which leads to more accurate local action prediction.
arXiv Detail & Related papers (2024-06-03T12:12:33Z)
Less is More: Toward Zero-Shot Local Scene Graph Generation via Foundation Models [16.08214739525615]
We present a new task called Local Scene Graph Generation. It aims to abstract pertinent structural information with partial objects and their relationships in an image. We introduce zEro-shot Local scEne GrAph geNeraTion (ELEGANT), a framework harnessing foundation models renowned for their powerful perception and commonsense reasoning.
arXiv Detail & Related papers (2023-10-02T17:19:04Z)
Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision. Existing literature addresses this challenge by employing local-based representation approaches. This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z)
A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in Deep Semantic Segmentation in the context of vision for autonomous vehicles. Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z)
Towards Data-and Knowledge-Driven Artificial Intelligence: A Survey on Neuro-Symbolic Computing [73.0977635031713]
Neural-symbolic computing (NeSy) has been an active research area of Artificial Intelligence (AI) for many years. NeSy shows promise of reconciling the advantages of reasoning and interpretability of symbolic representation and robust learning in neural networks.
arXiv Detail & Related papers (2022-10-28T04:38:10Z)
Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis [96.53859361560505]
We propose a knowledge graph augmented network (KGAN) to incorporate external knowledge with explicitly syntactic and contextual information. KGAN captures the sentiment feature representations from multiple perspectives, i.e., context-, syntax- and knowledge-based. Experiments on three popular ABSA benchmarks demonstrate the effectiveness and robustness of our KGAN.
arXiv Detail & Related papers (2022-01-13T08:25:53Z)
Place recognition survey: An update on deep learning approaches [0.6352264764099531]
This paper surveys recent approaches and methods used in place recognition, particularly those based on deep learning. The contributions of this work are twofold: surveying recent sensors such as 3D LiDARs and RADARs, applied in place recognition. This survey proceeds by elaborating on the various DL-based works, presenting summaries for each framework.
arXiv Detail & Related papers (2021-06-19T09:17:15Z)
Deep Gait Recognition: A Survey [15.47582611826366]
Gait recognition is an appealing biometric modality which aims to identify individuals based on the way they walk. Deep learning has reshaped the research landscape in this area since 2015 through the ability to automatically learn discriminative representations. We present a comprehensive overview of breakthroughs and recent developments in gait recognition with deep learning.
arXiv Detail & Related papers (2021-02-18T18:49:28Z)
Deep Learning for Person Re-identification: A Survey and Outlook [233.36948173686602]
Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras. By dissecting the involved components in developing a person Re-ID system, we categorize it into the closed-world and open-world settings.
arXiv Detail & Related papers (2020-01-13T12:49:22Z)
A Review on Intelligent Object Perception Methods Combining Knowledge-based Reasoning and Machine Learning [60.335974351919816]
Object perception is a fundamental sub-field of Computer Vision. Recent works seek ways to integrate knowledge engineering in order to expand the level of intelligence of the visual interpretation of objects.
arXiv Detail & Related papers (2019-12-26T13:26:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.