Related papers: UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders

URL: http://arxiv.org/abs/2601.17950v1
Date: Sun, 25 Jan 2026 18:59:45 GMT
Title: UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders
Authors: Matthew Walmer, Saksham Suri, Anirud Aggarwal, Abhinav Shrivastava
Abstract summary: UPLiFT is an architecture for Universal Pixel-dense Lightweight Feature Transforms. We show that our Local Attender allows UPLiFT to maintain stable features throughout upsampling. We also show that it achieves competitive performance with state-of-the-art Coupled Flow Matching models for VAE feature upsampling.
Score: 50.099672495919975
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The space of task-agnostic feature upsampling has emerged as a promising area of research to efficiently create denser features from pre-trained visual backbones. These methods act as a shortcut to achieve dense features for a fraction of the cost by learning to map low-resolution features to high-resolution versions. While early works in this space used iterative upsampling approaches, more recent works have switched to cross-attention-based methods, which risk falling into the same efficiency scaling problems of the backbones they are upsampling. In this work, we demonstrate that iterative upsampling methods can still compete with cross-attention-based methods; moreover, they can achieve state-of-the-art performance with lower inference costs. We propose UPLiFT, an architecture for Universal Pixel-dense Lightweight Feature Transforms. We also propose an efficient Local Attender operator to overcome the limitations of prior iterative feature upsampling methods. This operator uses an alternative attentional pooling formulation defined fully locally. We show that our Local Attender allows UPLiFT to maintain stable features throughout upsampling, enabling state-of-the-art performance with lower inference costs than existing pixel-dense feature upsamplers. In addition, we apply UPLiFT to generative downstream tasks and show that it achieves competitive performance with state-of-the-art Coupled Flow Matching models for VAE feature upsampling. Altogether, UPLiFT offers a versatile and efficient approach to creating denser features.
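The abstract describes an attentional pooling operator "defined fully locally" but does not give its formulation. As a rough illustration of the general idea only, the sketch below upsamples a low-resolution feature grid by letting each high-resolution pixel attend over a small window of nearby low-resolution cells. This is not the paper's Local Attender: the function names, the scalar guidance signal, the squared-difference similarity, and the `radius` and `temperature` parameters are all assumptions made for this example.

```python
import math

def average_pool(guide, scale):
    """Downsample a 2D grid by averaging non-overlapping scale x scale blocks."""
    h, w = len(guide) // scale, len(guide[0]) // scale
    return [[sum(guide[i * scale + di][j * scale + dj]
                 for di in range(scale) for dj in range(scale)) / (scale * scale)
             for j in range(w)] for i in range(h)]

def local_attention_upsample(feats, guide, scale=2, radius=1, temperature=0.1):
    """Hypothetical local attentional pooling (illustrative, not UPLiFT's operator).

    feats: H x W x C low-resolution features (nested lists).
    guide: (H*scale) x (W*scale) high-resolution scalar guidance, e.g. intensity.
    """
    h, w, c = len(feats), len(feats[0]), len(feats[0][0])
    keys = average_pool(guide, scale)  # low-resolution keys derived from the guide
    out = []
    for y in range(h * scale):
        row = []
        for x in range(w * scale):
            cy, cx = y // scale, x // scale  # low-res cell containing this pixel
            # In-bounds low-res cells within the local window around (cy, cx).
            window = [(i, j)
                      for i in range(max(0, cy - radius), min(h, cy + radius + 1))
                      for j in range(max(0, cx - radius), min(w, cx + radius + 1))]
            # Similarity between the high-res query and each local key.
            logits = [-(guide[y][x] - keys[i][j]) ** 2 / temperature
                      for i, j in window]
            m = max(logits)  # subtract the max for a numerically stable softmax
            weights = [math.exp(l - m) for l in logits]
            total = sum(weights)
            # Softmax-weighted average of the low-res features in the window.
            row.append([sum(wgt * feats[i][j][ch]
                            for wgt, (i, j) in zip(weights, window)) / total
                        for ch in range(c)])
        out.append(row)
    return out

# Toy input: a 2x2 single-channel feature map and a 4x4 guide with a vertical edge.
feats = [[[1.0], [0.0]], [[1.0], [0.0]]]
guide = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
up = local_attention_upsample(feats, guide)
```

Each output pixel looks at no more than (2 * radius + 1)^2 low-resolution cells, so the cost grows linearly with the number of output pixels. Global cross-attention, by contrast, compares every output pixel against every low-resolution token.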

Related papers

Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling [38.24831571443335]
Upsample Anything restores low-resolution features to high-resolution, pixel-wise outputs without any training. It runs in only $\approx 0.419\,\text{s}$ per 224x224 image and achieves state-of-the-art performance on semantic segmentation, depth estimation, and both depth and probability map upsampling.
arXiv Detail & Related papers (2025-11-20T12:27:53Z)
AnyUp: Universal Feature Upsampling [90.67845351280933]
We introduce AnyUp, a method for feature upsampling that can be applied to any vision feature at any resolution. Existing learning-based upsamplers for features like DINO or CLIP need to be re-trained for every feature extractor.
arXiv Detail & Related papers (2025-10-14T17:45:17Z)
JAFAR: Jack up Any Feature at Any Resolution [53.343826346140624]
JAFAR is a lightweight and flexible feature upsampler for Foundation Vision Encoders. It enhances the spatial resolution of visual features from any Foundation Vision Encoder to an arbitrary target resolution. It generalizes remarkably well to significantly higher output scales.
arXiv Detail & Related papers (2025-06-10T20:53:12Z)
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models [27.379438040350188]
Feature upsampling offers a promising direction to address this challenge. We introduce a coordinate-based cross-attention transformer that integrates the high-resolution images with coordinates and low-resolution VFM features. Our approach effectively captures fine-grained details and adapts flexibly to various input and feature resolutions.
arXiv Detail & Related papers (2025-04-18T18:46:08Z)
Curvature Informed Furthest Point Sampling [0.0]
We introduce a reinforcement learning-based sampling algorithm that enhances furthest point sampling (FPS). Our approach ranks points by combining FPS-derived soft ranks with curvature scores computed by a deep neural network. We provide comprehensive ablation studies, with both qualitative and quantitative insights into the effect of each feature on performance.
arXiv Detail & Related papers (2024-11-25T23:58:38Z)
A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling [54.05517338122698]
A popular similarity-based feature upsampling pipeline has been proposed, which utilizes a high-resolution feature as guidance. We propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives. We develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective for alleviating mosaic artifacts.
arXiv Detail & Related papers (2024-07-02T14:12:21Z)
BIMS-PU: Bi-Directional and Multi-Scale Point Cloud Upsampling [60.257912103351394]
We develop a new point cloud upsampling pipeline called BIMS-PU. We decompose the up/downsampling procedure into several up/downsampling sub-steps by breaking the target sampling factor into smaller factors. We show that our method achieves superior results to state-of-the-art approaches.
arXiv Detail & Related papers (2022-06-25T13:13:37Z)
Hybrid Pixel-Unshuffled Network for Lightweight Image Super-Resolution [64.54162195322246]
Convolutional neural networks (CNNs) have achieved great success on image super-resolution (SR). Most deep CNN-based SR models require massive computation to obtain high performance. We propose a novel Hybrid Pixel-Unshuffled Network (HPUN) by introducing an efficient and effective downsampling module into the SR task.
arXiv Detail & Related papers (2022-03-16T20:10:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.