Deep Learning in Concealed Dense Prediction
- URL: http://arxiv.org/abs/2504.10979v1
- Date: Tue, 15 Apr 2025 08:44:42 GMT
- Title: Deep Learning in Concealed Dense Prediction
- Authors: Pancheng Zhao, Deng-Ping Fan, Shupeng Cheng, Salman Khan, Fahad Shahbaz Khan, David Clifton, Peng Xu, Jufeng Yang
- Abstract summary: We introduce and review a family of complex tasks, termed Concealed Dense Prediction (CDP), which has great value in agriculture, industry, etc. CDP's intrinsic trait is that the targets are concealed in their surroundings, thus fully perceiving them requires fine-grained representations, prior knowledge, auxiliary reasoning, etc.
- Score: 83.89736735583935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning is developing rapidly and handling common computer vision tasks well. It is time to pay attention to more complex vision tasks, as model size, knowledge, and reasoning capabilities continue to improve. In this paper, we introduce and review a family of complex tasks, termed Concealed Dense Prediction (CDP), which has great value in agriculture, industry, etc. CDP's intrinsic trait is that the targets are concealed in their surroundings, thus fully perceiving them requires fine-grained representations, prior knowledge, auxiliary reasoning, etc. The contributions of this review are three-fold: (i) We introduce the scope, characteristics, and challenges specific to CDP tasks and emphasize their essential differences from generic vision tasks. (ii) We develop a taxonomy based on concealment counteracting to summarize deep learning efforts in CDP through experiments on three tasks. We compare 25 state-of-the-art methods across 12 widely used concealed datasets. (iii) We discuss the potential applications of CDP in the large model era and summarize 6 potential research directions. We offer perspectives for the future development of CDP by constructing a large-scale multimodal instruction fine-tuning dataset, CvpINST, and a concealed visual perception agent, CvpAgent.
Related papers
- Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation [53.84282335629258]
We introduce a comprehensive fine-grained evaluation benchmark, i.e., FG-BMK, comprising 3.49 million questions and 3.32 million images.
Our evaluation systematically examines LVLMs from both human-oriented and machine-oriented perspectives.
We uncover key findings regarding the influence of training paradigms, modality alignment, perturbation susceptibility, and fine-grained category reasoning on task performance.
arXiv Detail & Related papers (2025-04-21T09:30:41Z)
- Can LLMs Assist Computer Education? an Empirical Case Study of DeepSeek [38.30073108450149]
This study employs both simulation questions and real-world inquiries concerning computer network security posed by Chinese network engineers.
The findings demonstrate that the model performs consistently, regardless of whether prompts include a role definition or not.
Although DeepSeek-V3 offers considerable practical value for network security education, challenges remain in its capability to process multimodal data.
arXiv Detail & Related papers (2025-04-01T04:58:16Z)
- End-to-end Graph Learning Approach for Cognitive Diagnosis of Student Tutorial [11.670969577565774]
This paper proposes an End-to-end Graph Neural Networks-based Cognitive Diagnosis (EGNN-CD) model.
EGNN-CD consists of three main parts: knowledge concept network (KCN), graph neural networks-based feature extraction (GNNFE), and cognitive ability prediction (CAP).
arXiv Detail & Related papers (2024-10-30T06:18:47Z)
- Deep Learning for Video Anomaly Detection: A Review [52.74513211976795]
Video anomaly detection (VAD) aims to discover behaviors or events deviating from the normality in videos.
In the era of deep learning, a great variety of deep learning based methods are constantly emerging for the VAD task.
This review covers the spectrum of five different categories, namely, semi-supervised, weakly supervised, fully supervised, unsupervised and open-set supervised VAD.
arXiv Detail & Related papers (2024-09-09T07:31:16Z)
- Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models [24.579822095003685]
We conduct an empirical study on representation learning for downstream Visual Question Answering (VQA).
We thoroughly investigate the benefits and trade-offs of OC models and alternative approaches.
We identify a promising path to leverage the strengths of both paradigms.
arXiv Detail & Related papers (2024-07-22T12:26:08Z)
- Effectiveness Assessment of Recent Large Vision-Language Models [78.69439393646554]
This paper endeavors to evaluate the competency of popular large vision-language models (LVLMs) in specialized and general tasks.
We employ six challenging tasks in three different application scenarios: natural, healthcare, and industrial.
We examine the performance of three recent open-source LVLMs, including MiniGPT-v2, LLaVA-1.5, and Shikra, on both visual recognition and localization in these tasks.
arXiv Detail & Related papers (2024-03-07T08:25:27Z)
- WHU-Synthetic: A Synthetic Perception Dataset for 3-D Multitask Model Research [9.945833036861892]
WHU-Synthetic is a large-scale 3D synthetic perception dataset designed for multi-task learning.
We implement several novel settings, making it possible to realize certain ideas that are difficult to achieve in real-world scenarios.
arXiv Detail & Related papers (2024-02-29T11:38:44Z)
- A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in Deep Semantic Segmentation in the context of vision for autonomous vehicles.
Our main objective is to provide a comprehensive discussion of the main methods, advantages, limitations, results, and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z)
- Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering [58.82325933356066]
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different settings affect performance for Visual Question Answering.
arXiv Detail & Related papers (2022-09-30T19:12:58Z)
- PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z)