ComPtr: Towards Diverse Bi-source Dense Prediction Tasks via A Simple
yet General Complementary Transformer
- URL: http://arxiv.org/abs/2307.12349v1
- Date: Sun, 23 Jul 2023 15:17:45 GMT
- Authors: Youwei Pang, Xiaoqi Zhao, Lihe Zhang, Huchuan Lu
- Abstract summary: We propose a novel ComPlementary transformer, ComPtr, for diverse bi-source dense prediction tasks.
ComPtr treats different inputs equally and builds an efficient dense interaction model in the form of sequence-to-sequence on top of the transformer.
- Score: 91.43066633305662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning (DL) has advanced the field of dense prediction, while
gradually dissolving the inherent barriers between different tasks. However,
most existing works focus on designing architectures and constructing visual
cues only for the specific task, which ignores the potential uniformity
introduced by the DL paradigm. In this paper, we attempt to construct a novel
ComPlementary transformer, ComPtr, for diverse bi-source dense prediction
tasks. Specifically, unlike existing methods that
over-specialize in a single task or a subset of tasks, ComPtr starts from the
more general concept of bi-source dense prediction. Because such tasks
fundamentally depend on information complementarity, we propose consistency
enhancement and difference awareness components, with which ComPtr can extract
and collect important visual semantic cues from different image sources for
diverse tasks. ComPtr treats different inputs equally and builds an efficient
dense interaction model in the form of sequence-to-sequence on top of the
transformer. This task-generic design provides a smooth foundation for
constructing a unified model that can simultaneously handle various kinds of
bi-source information. In extensive experiments across several representative
vision tasks, i.e., remote sensing change detection, RGB-T crowd counting,
RGB-D/T salient object detection, and RGB-D semantic segmentation, the proposed
method consistently obtains favorable performance. The code will be available
at https://github.com/lartpang/ComPtr.
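The abstract names two interaction components but not their internals. Below is a minimal PyTorch sketch of how consistency-enhancement and difference-awareness modules over two token sequences might be wired; the cross-attention and difference-projection formulations, module names, and sizes are assumptions inferred from the abstract, not the paper's verified design.

```python
# Hypothetical sketch of ComPtr-style bi-source token interaction (PyTorch).
# Both modules below are assumptions, not the authors' exact architecture.
import torch
import torch.nn as nn


class ConsistencyEnhancement(nn.Module):
    """Reinforces cues shared by both sources via cross-attention (assumed)."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Tokens of source A attend to source B; consistent semantics are enhanced.
        fused, _ = self.attn(query=a, key=b, value=b)
        return self.norm(a + fused)


class DifferenceAwareness(nn.Module):
    """Highlights source-specific cues from the token-wise difference (assumed)."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # The difference a - b emphasizes what source A contributes beyond B.
        return self.norm(a + self.proj(a - b))


class BiSourceBlock(nn.Module):
    """One symmetric interaction block: both inputs are treated equally."""

    def __init__(self, dim: int):
        super().__init__()
        self.ce = ConsistencyEnhancement(dim)
        self.da = DifferenceAwareness(dim)

    def forward(self, a, b):
        a2 = self.da(self.ce(a, b), b)
        b2 = self.da(self.ce(b, a), a)
        return a2, b2


# Usage on flattened patch tokens from two aligned images (e.g., RGB and depth).
tokens_rgb = torch.randn(2, 196, 256)  # (batch, sequence, channels)
tokens_dep = torch.randn(2, 196, 256)
a, b = BiSourceBlock(256)(tokens_rgb, tokens_dep)
print(a.shape, b.shape)  # torch.Size([2, 196, 256]) twice
```

The symmetric block mirrors the abstract's claim that ComPtr treats the two inputs equally rather than privileging one source over the other.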
Related papers
- A Multitask Deep Learning Model for Classification and Regression of Hyperspectral Images: Application to the large-scale dataset [44.94304541427113]
We propose a multitask deep learning model to perform multiple classification and regression tasks simultaneously on hyperspectral images.
We validated our approach on a large hyperspectral dataset called TAIGA.
A comprehensive qualitative and quantitative analysis of the results shows that the proposed method significantly outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-23T11:14:54Z)
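The entry above trains one network for classification and regression jointly. A generic shared-encoder, two-head sketch of that pattern in PyTorch; the encoder layout, layer sizes, and equal loss weighting are illustrative assumptions, not the paper's architecture.

```python
# Illustrative shared-encoder multitask model for hyperspectral pixels (PyTorch).
# Encoder design, sizes, and loss weighting are assumptions for demonstration.
import torch
import torch.nn as nn


class MultitaskHSINet(nn.Module):
    def __init__(self, n_bands: int, n_classes: int, n_regression_targets: int):
        super().__init__()
        # Shared spectral encoder over the band dimension.
        self.encoder = nn.Sequential(
            nn.Linear(n_bands, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
        )
        self.cls_head = nn.Linear(64, n_classes)             # e.g., land-cover class
        self.reg_head = nn.Linear(64, n_regression_targets)  # e.g., continuous variables

    def forward(self, x):
        z = self.encoder(x)
        return self.cls_head(z), self.reg_head(z)


model = MultitaskHSINet(n_bands=200, n_classes=10, n_regression_targets=3)
x = torch.randn(32, 200)  # a batch of hyperspectral pixel spectra
logits, values = model(x)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 10, (32,))) \
     + nn.MSELoss()(values, torch.randn(32, 3))  # naive equal weighting
loss.backward()
```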
- Comprehensive Generative Replay for Task-Incremental Segmentation with Concurrent Appearance and Semantic Forgetting [49.87694319431288]
Generalist segmentation models are increasingly favored for diverse tasks involving various objects from different image sources.
We propose a Comprehensive Generative Replay (CGR) framework that restores appearance and semantic knowledge by synthesizing image-mask pairs.
Experiments on incremental tasks (cardiac, fundus and prostate segmentation) show its clear advantage for alleviating concurrent appearance and semantic forgetting.
arXiv Detail & Related papers (2024-06-28T10:05:58Z)
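The CGR entry above counters forgetting by synthesizing image-mask pairs from earlier tasks. Here is a schematic of the generic generative-replay pattern it builds on, with toy stand-in modules; the dummy generator, the one-layer segmenter, and all sizes are assumptions, not the paper's CGR design.

```python
# Schematic generative replay for task-incremental segmentation (PyTorch).
# The generator/segmenter below are trivial stand-ins, not the paper's modules.
import torch
import torch.nn as nn


class DummyGenerator(nn.Module):
    """Stand-in for a frozen generator trained on previous tasks."""

    def sample(self, n, n_classes=4, size=32):
        images = torch.randn(n, 3, size, size)                # replayed appearance
        masks = torch.randint(0, n_classes, (n, size, size))  # replayed semantics
        return images, masks


segmenter = nn.Conv2d(3, 4, kernel_size=1)  # toy per-pixel classifier
old_generator = DummyGenerator()
optimizer = torch.optim.Adam(segmenter.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Current-task batch plus synthesized pairs from previous tasks.
images, masks = torch.randn(8, 3, 32, 32), torch.randint(0, 4, (8, 32, 32))
with torch.no_grad():
    fake_images, fake_masks = old_generator.sample(8)

# Training on the mixed batch keeps old appearance and semantics alive.
loss = criterion(segmenter(torch.cat([images, fake_images])),
                 torch.cat([masks, fake_masks]))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```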
- ULTRA-DP: Unifying Graph Pre-training with Multi-task Graph Dual Prompt [67.8934749027315]
We propose a unified framework for graph hybrid pre-training which injects the task identification and position identification into GNNs.
We also propose a novel pre-training paradigm based on a group of k-nearest neighbors.
arXiv Detail & Related papers (2023-10-23T12:11:13Z)
- RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z)
- A Dynamic Feature Interaction Framework for Multi-task Visual Perception [100.98434079696268]
We devise an efficient unified framework to solve multiple common perception tasks.
These tasks include instance segmentation, semantic segmentation, monocular 3D detection, and depth estimation.
Our proposed framework, termed D2BNet, demonstrates a unique approach to parameter-efficient predictions for multi-task perception.
arXiv Detail & Related papers (2023-06-08T09:24:46Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
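The XMSNet entry names an all-round attentive fusion (AF) step, but the summary gives no internals. As a stand-in, here is a common attention-gated fusion pattern for two modality feature maps; the gating design is an assumption, not XMSNet's actual AF module.

```python
# Generic attention-gated fusion of two modality feature maps (PyTorch).
# A common pattern standing in for XMSNet's AF module, whose exact design
# is not described in the summary above.
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predict a per-pixel gate from the concatenated modalities.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1), nn.Sigmoid()
        )

    def forward(self, rgb: torch.Tensor, aux: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([rgb, aux], dim=1))
        # Convex combination: the gate decides, per location, which modality
        # contributes more to the fused representation.
        return g * rgb + (1 - g) * aux


fused = GatedFusion(64)(torch.randn(1, 64, 56, 56), torch.randn(1, 64, 56, 56))
print(fused.shape)  # torch.Size([1, 64, 56, 56])
```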
This list is automatically generated from the titles and abstracts of the papers on this site.