Feature Decoupling-Recycling Network for Fast Interactive Segmentation
- URL: http://arxiv.org/abs/2308.03529v2
- Date: Tue, 8 Aug 2023 05:29:57 GMT
- Title: Feature Decoupling-Recycling Network for Fast Interactive Segmentation
- Authors: Huimin Zeng, Weinong Wang, Xin Tao, Zhiwei Xiong, Yu-Wing Tai, Wenjie
Pei
- Abstract summary: Recent interactive segmentation methods iteratively take the source image, user guidance, and the previously predicted mask as input.
We propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies.
- Score: 79.22497777645806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent interactive segmentation methods iteratively take the source
image, user guidance, and the previously predicted mask as input, without
considering the invariant nature of the source image. As a result, extracting
features from the
source image is repeated in each interaction, resulting in substantial
computational redundancy. In this work, we propose the Feature
Decoupling-Recycling Network (FDRN), which decouples the modeling components
based on their intrinsic discrepancies and then recycles components for each
user interaction. Thus, the efficiency of the whole interactive process can be
significantly improved. To be specific, we apply the Decoupling-Recycling
strategy from three perspectives to address three types of discrepancies,
respectively. First, our model decouples the learning of source image semantics
from the encoding of user guidance to process two types of input domains
separately. Second, FDRN decouples high-level and low-level features from
stratified semantic representations to enhance feature learning. Third, during
the encoding of user guidance, current user guidance is decoupled from
historical guidance to highlight the effect of current user guidance. We
conduct extensive experiments on 6 datasets from different domains and
modalities, which demonstrate the following merits of our model: 1) superior
efficiency to other methods, particularly advantageous in challenging
scenarios requiring long-term interactions (up to 4.25x faster), while
achieving favorable segmentation performance; 2) strong applicability to
various methods, serving as a universal enhancement technique; 3) good
cross-task generalizability, e.g., to medical image segmentation, and
robustness against misleading user guidance.
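The recycling idea described in the abstract, computing the invariant source-image features once and reusing them across user interactions, can be sketched as follows. The class, the two encoder stand-ins, and the fusion rule are illustrative assumptions, not FDRN's actual architecture or API.

```python
# Minimal sketch of the Decoupling-Recycling idea: the expensive image
# encoding runs once and is recycled across interactions, while only the
# cheap guidance encoding is recomputed per click. All names here are
# hypothetical stand-ins for the paper's networks.

class RecyclingSegmenter:
    def __init__(self):
        self.image_calls = 0          # counts heavy-encoder forward passes
        self._cached_features = None  # recycled image representation

    def _encode_image(self, image):
        # Stand-in for a heavy backbone forward pass.
        self.image_calls += 1
        return [pixel * 0.5 for pixel in image]

    def _encode_guidance(self, clicks):
        # Stand-in for the lightweight guidance encoder.
        return sum(x + y for x, y in clicks)

    def segment(self, image, clicks):
        # Recycle the image features: encode the source image only on the
        # first interaction, since it is invariant across rounds.
        if self._cached_features is None:
            self._cached_features = self._encode_image(image)
        guidance = self._encode_guidance(clicks)
        # Toy "fusion": threshold features shifted by the guidance signal.
        return [f + guidance > 1.0 for f in self._cached_features]


model = RecyclingSegmenter()
clicks = []
for click in [(0, 1), (1, 1), (2, 0)]:   # three interaction rounds
    clicks.append(click)
    mask = model.segment([0.2, 1.8, 3.0], clicks)
print(model.image_calls)  # prints 1: the backbone ran once across 3 rounds
```

With the image branch decoupled this way, per-interaction cost is dominated by the lightweight guidance encoder, which is the source of the long-interaction speedups the abstract reports.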
Related papers
- Reversible Decoupling Network for Single Image Reflection Removal [15.763420129991255]
High-level semantic clues tend to be compressed or discarded during layer-by-layer propagation.
We propose a novel architecture called Reversible Decoupling Network (RDNet)
RDNet employs a reversible encoder to secure valuable information while flexibly decoupling transmission- and reflection-relevant features during the forward pass.
arXiv Detail & Related papers (2024-10-10T15:58:27Z)
- EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration [63.112790050749695]
We introduce EAGER, a novel generative recommendation framework that seamlessly integrates both behavioral and semantic information.
We validate the effectiveness of EAGER on four public benchmarks, demonstrating its superior performance compared to existing methods.
arXiv Detail & Related papers (2024-06-20T06:21:56Z)
- Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation [13.802788788420175]
Correspondence matching plays a crucial role in numerous robotics applications.
This paper addresses the limitations of deep feature matching (DFM), a state-of-the-art (SoTA) plug-and-play correspondence matching approach.
Our proposed method achieves an overall performance in terms of mean matching accuracy of 0.68, 0.92, and 0.95 with respect to the tolerances of 1, 3, and 5 pixels, respectively.
arXiv Detail & Related papers (2024-03-08T15:32:18Z)
- Deep Common Feature Mining for Efficient Video Semantic Segmentation [29.054945307605816]
We present Deep Common Feature Mining (DCFM) for video semantic segmentation.
DCFM explicitly decomposes features into two complementary components.
We show that our method has a superior balance between accuracy and efficiency.
arXiv Detail & Related papers (2024-03-05T06:17:59Z)
- Improving One-class Recommendation with Multi-tasking on Various Preference Intensities [1.8416014644193064]
In one-class recommendation, recommendations must be made from users' implicit feedback alone.
We propose a multi-tasking framework taking various preference intensities of each signal from implicit feedback into consideration.
Our method performs better than state-of-the-art methods by a large margin on three large-scale real-world benchmark datasets.
arXiv Detail & Related papers (2024-01-18T18:59:55Z)
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on fine-grained object recognition.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- Disentangled Representation Learning for Text-Video Retrieval [51.861423831566626]
Cross-modality interaction is a critical component in Text-Video Retrieval (TVR).
We study the interaction paradigm in depth, where we find that its computation can be split into two terms.
We propose a disentangled framework to capture a sequential and hierarchical representation.
arXiv Detail & Related papers (2022-03-14T13:55:33Z)
- Knowledge-Enhanced Hierarchical Graph Transformer Network for Multi-Behavior Recommendation [56.12499090935242]
This work proposes a Knowledge-Enhanced Hierarchical Graph Transformer Network (KHGT) to investigate multi-typed interactive patterns between users and items in recommender systems.
KHGT is built upon a graph-structured neural architecture to capture type-specific behavior characteristics.
We show that KHGT consistently outperforms many state-of-the-art recommendation methods across various evaluation settings.
arXiv Detail & Related papers (2021-10-08T09:44:00Z)
- Two-Level Adversarial Visual-Semantic Coupling for Generalized Zero-shot Learning [21.89909688056478]
We propose a new two-level joint idea to augment the generative network with an inference network during training.
This provides strong cross-modal interaction for effective transfer of knowledge between visual and semantic domains.
We evaluate our approach on four benchmark datasets against several state-of-the-art methods and demonstrate its effectiveness.
arXiv Detail & Related papers (2020-07-15T15:34:09Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
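The two-module loop described above, where a localization stage refines proposals and hands them to a recognition stage at each step, can be sketched as follows. The `localize` and `recognize` functions and the fixed confidence threshold are hypothetical stand-ins, not the paper's networks.

```python
# Illustrative sketch of a multi-stage, coarse-to-fine cascade for
# human-object interaction (HOI) understanding. Each stage refines the
# proposal set before recognition runs on it.

def localize(proposals):
    # Stand-in refinement: drop proposals below a confidence threshold.
    return [p for p in proposals if p["score"] > 0.5]

def recognize(proposals):
    # Stand-in recognition: label each surviving human-object pair.
    return [{"pair": p["pair"], "action": "interact"} for p in proposals]

def cascade(proposals, stages=2):
    interactions = []
    for _ in range(stages):
        proposals = localize(proposals)      # progressively refine proposals
        interactions = recognize(proposals)  # recognize on the refined set
    return interactions


proposals = [
    {"pair": ("person", "cup"), "score": 0.9},
    {"pair": ("person", "dog"), "score": 0.3},
]
result = cascade(proposals)  # only the high-confidence pair survives
```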
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.