Related papers: A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling

A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling

URL: http://arxiv.org/abs/2407.02283v1
Date: Tue, 2 Jul 2024 14:12:21 GMT
Title: A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling
Authors: Minghao Zhou, Hong Wang, Yefeng Zheng, Deyu Meng,
Abstract summary: We propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives. We also develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective for alleviating mosaic artifacts. Our proposed ReSFU framework consistently achieves satisfactory performance on different segmentation applications.
Score: 54.05517338122698
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Feature upsampling is a fundamental and indispensable ingredient of almost all current network structures for image segmentation tasks. Recently, a popular similarity-based feature upsampling pipeline has been proposed, which utilizes a high-resolution feature as guidance to help upsample the low-resolution deep feature based on their local similarity. Albeit achieving promising performance, this pipeline has specific limitations: 1) HR query and LR key features are not well aligned; 2) the similarity between query-key features is computed based on the fixed inner product form; 3) neighbor selection is coarsely operated on LR features, resulting in mosaic artifacts. These shortcomings make the existing methods along this pipeline primarily applicable to hierarchical network architectures with iterative features as guidance and they are not readily extended to a broader range of structures, especially for a direct high-ratio upsampling. Against the issues, we meticulously optimize every methodological design. Specifically, we firstly propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives, and then construct a parameterized paired central difference convolution block for flexibly calculating the similarity between the well-aligned query-key features. Besides, we develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective for alleviating mosaic artifacts. Based on these careful designs, we systematically construct a refreshed similarity-based feature upsampling framework named ReSFU. Extensive experiments substantiate that our proposed ReSFU is finely applicable to various types of architectures in a direct high-ratio upsampling manner, and consistently achieves satisfactory performance on different segmentation applications, showing superior generality and ease of deployment.

Related papers

Comprehend, Divide, and Conquer: Feature Subspace Exploration via Multi-Agent Hierarchical Reinforcement Learning [10.317489871533565]
In this paper, we introduce HRLFS, a reinforcement learning-based subspace exploration strategy for complex datasets. We show that HRLFS improves the downstream machine learning performance with iterative feature subspace exploration. We also show that HRLFS accelerates total run time by reducing the number of agents involved.
arXiv Detail & Related papers (2025-04-24T08:16:36Z)
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models [27.379438040350188]
Feature upsampling offers a promising direction to address this challenge. We introduce a coordinate-based cross-attention transformer that integrates the high-resolution images with coordinates and low-resolution VFM features. Our approach effectively captures fine-grained details and adapts flexibly to various input and feature resolutions.
arXiv Detail & Related papers (2025-04-18T18:46:08Z)
ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
WTDUN: Wavelet Tree-Structured Sampling and Deep Unfolding Network for Image Compressed Sensing [51.94493817128006]
We propose a novel wavelet-domain deep unfolding framework named WTDUN, which operates directly on the multi-scale wavelet subbands. Our method utilizes the intrinsic sparsity and multi-scale structure of wavelet coefficients to achieve a tree-structured sampling and reconstruction.
arXiv Detail & Related papers (2024-11-25T12:31:03Z)
Unifying Feature and Cost Aggregation with Transformers for Semantic and Visual Correspondence [51.54175067684008]
This paper introduces a Transformer-based integrative feature and cost aggregation network designed for dense matching tasks. We first show that feature aggregation and cost aggregation exhibit distinct characteristics and reveal the potential for substantial benefits stemming from the judicious use of both aggregation processes. Our framework is evaluated on standard benchmarks for semantic matching, and also applied to geometric matching, where we show that our approach achieves significant improvements compared to existing methods.
arXiv Detail & Related papers (2024-03-17T07:02:55Z)
Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task. A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks. Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
BIMS-PU: Bi-Directional and Multi-Scale Point Cloud Upsampling [60.257912103351394]
We develop a new point cloud upsampling pipeline called BIMS-PU. We decompose the up/downsampling procedure into several up/downsampling sub-steps by breaking the target sampling factor into smaller factors. We show that our method achieves superior results to state-of-the-art approaches.
arXiv Detail & Related papers (2022-06-25T13:13:37Z)
Reuse your features: unifying retrieval and feature-metric alignment [3.845387441054033]
DRAN is the first network able to produce the features for the three steps of visual localization. It achieves competitive performance in terms of robustness and accuracy under challenging conditions in public benchmarks.
arXiv Detail & Related papers (2022-04-13T10:42:00Z)
CFNet: Learning Correlation Functions for One-Stage Panoptic Segmentation [46.252118473248316]
We propose to first predict semantic-level and instance-level correlations among different locations that are utilized to enhance the backbone features. We then feed the improved discriminative features into the corresponding segmentation heads, respectively. We achieve state-of-the-art performance on MS with $45.1$% PQ and ADE20k with $32.6$% PQ.
arXiv Detail & Related papers (2022-01-13T05:31:14Z)
Local Similarity Pattern and Cost Self-Reassembling for Deep Stereo Matching Networks [3.7384509727711923]
We introduce a pairwise feature for deep stereo matching networks, named LSP (Local Similarity Pattern) Through explicitly revealing the neighbor relationships, LSP contains rich structural information, which can be leveraged to aid for more discriminative feature description. Secondly, we design a dynamic self-reassembling refinement strategy and apply it to the cost distribution and the disparity map respectively.
arXiv Detail & Related papers (2021-12-02T06:52:54Z)
Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process. The proposed model exploits three lightweight plug-and-play modules, namely dense feature pyramid network (DenseFPN), spatial context pyramid ( SCP), and hierarchical region of interest extractor (HRoIE)
arXiv Detail & Related papers (2021-11-22T08:55:25Z)
M2IOSR: Maximal Mutual Information Open Set Recognition [47.1393314282815]
We propose a mutual information-based method with a streamlined architecture for open set recognition. The proposed method significantly improves the performance of baselines and achieves new state-of-the-art results on several benchmarks consistently.
arXiv Detail & Related papers (2021-08-05T05:08:12Z)
Fine-Grained Dynamic Head for Object Detection [68.70628757217939]
We propose a fine-grained dynamic head to conditionally select a pixel-level combination of FPN features from different scales for each instance. Experiments demonstrate the effectiveness and efficiency of the proposed method on several state-of-the-art detection benchmarks.
arXiv Detail & Related papers (2020-12-07T08:16:32Z)
Multi-Person Pose Estimation with Enhanced Feature Aggregation and Selection [33.15192824888279]
We propose a novel Enhanced Feature Aggregation and Selection network (EFASNet) for multi-person 2D human pose estimation. Our method can well handle crowded, cluttered and occluded scenes. Comprehensive experiments demonstrate that the proposed approach outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2020-03-20T08:33:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.