Elastic-DETR: Making Image Resolution Learnable with Content-Specific Network Prediction
- URL: http://arxiv.org/abs/2412.06341v1
- Date: Mon, 09 Dec 2024 09:46:21 GMT
- Title: Elastic-DETR: Making Image Resolution Learnable with Content-Specific Network Prediction
- Authors: Daeun Seo, Hoeseok Yang, Sihyeong Park, Hyungshin Kim
- Abstract summary: We introduce a novel strategy for learnable resolution, called Elastic-DETR, enabling elastic utilization of multiple image resolutions.
Our network provides an adaptive scale factor based on the content of the image with a compact scale prediction module.
By leveraging the resolution's flexibility, we can demonstrate various models that exhibit varying trade-offs between accuracy and computational complexity.
- Score: 0.612477318852572
- Abstract: Multi-scale image resolution is a de facto standard approach in modern object detectors such as DETR. This technique allows the detector to acquire information at various scales from multiple image resolutions. However, the resolution is a manually selected hyperparameter informed by prior knowledge, which restricts its flexibility and necessitates human intervention. This work introduces a novel strategy for learnable resolution, called Elastic-DETR, enabling elastic utilization of multiple image resolutions. Our network provides an adaptive scale factor based on the content of the image with a compact scale prediction module (< 2 GFLOPs). The key aspect of our method lies in how to determine the resolution without prior knowledge. We present two loss functions derived from identified key components for resolution optimization: scale loss, which increases adaptiveness according to the image, and distribution loss, which determines the overall degree of scaling based on network performance. By leveraging the resolution's flexibility, we can demonstrate various models that exhibit varying trade-offs between accuracy and computational complexity. We empirically show that our scheme can unleash the potential of a wide spectrum of image resolutions without constraining flexibility. On MS COCO, our models achieve a maximum accuracy gain of 3.5%p or a 26% decrease in computation compared with MS-trained DN-DETR.
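The abstract does not say how the predicted scale factor is turned into an input resolution. As a rough illustration only, the sketch below assumes the factor multiplies the original height and width, is clamped to a fixed range, and is rounded up to a multiple of a stride. The function name `scaled_size` and the parameters `min_scale`, `max_scale`, and `divisor` are hypothetical and are not taken from the paper.

```python
import math


def scaled_size(height, width, scale, min_scale=0.5, max_scale=2.0, divisor=32):
    """Return (new_height, new_width) for a per-image scale factor.

    Assumption (not from the paper): the factor is clamped to
    [min_scale, max_scale] and each side is rounded up to a multiple
    of `divisor`, a common convention for strided backbones.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")

    # Clamp the predicted factor so extreme predictions stay bounded.
    clamped = max(min_scale, min(max_scale, scale))

    def round_up(value):
        # Round up to the nearest multiple of `divisor`, at least one stride.
        return max(divisor, math.ceil(value / divisor) * divisor)

    return round_up(height * clamped), round_up(width * clamped)


if __name__ == "__main__":
    print(scaled_size(480, 640, 1.0))  # (480, 640)
    print(scaled_size(480, 640, 0.5))  # (256, 320)
    print(scaled_size(480, 640, 3.0))  # (960, 1280), factor clamped to 2.0
```

In a real detector the factor would come from the scale prediction module rather than being passed in by hand; here it is a plain argument so the mapping from factor to resolution can be inspected in isolation.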
Related papers
- FiT: Flexible Vision Transformer for Diffusion Model [81.85667773832279]
We present a transformer architecture specifically designed for generating images with unrestricted resolutions and aspect ratios.
Unlike traditional methods that perceive images as static-resolution grids, FiT conceptualizes images as sequences of dynamically-sized tokens.
Comprehensive experiments demonstrate the exceptional performance of FiT across a broad range of resolutions.
arXiv Detail & Related papers (2024-02-19T18:59:07Z)
- DyRA: Portable Dynamic Resolution Adjustment Network for Existing Detectors [0.669087470775851]
This paper introduces DyRA, a dynamic resolution adjustment network providing an image-specific scale factor for existing detectors.
A loss function is devised to minimize the accuracy drop across the contrasting scaling objectives of different-sized objects.
arXiv Detail & Related papers (2023-11-28T07:52:41Z)
- ResFormer: Scaling ViTs with Multi-Resolution Training [100.01406895070693]
We introduce ResFormer, a framework for improved performance on a wide spectrum of, mostly unseen, testing resolutions.
In particular, ResFormer operates on replicated images of different resolutions and enforces a scale consistency loss to engage interactive information across different scales.
Moreover, we demonstrate that ResFormer is flexible and can be easily extended to semantic segmentation, object detection, and video action recognition.
arXiv Detail & Related papers (2022-12-01T18:57:20Z)
- Learning Resolution-Adaptive Representations for Cross-Resolution Person Re-Identification [49.57112924976762]
The cross-resolution person re-identification problem aims to match low-resolution (LR) query identity images against high-resolution (HR) gallery images.
It is a challenging and practical problem, since query images often suffer from resolution degradation caused by the differing capture conditions of real-world cameras.
This paper explores an alternative SR-free paradigm to directly compare HR and LR images via a dynamic metric, which is adaptive to the resolution of a query image.
arXiv Detail & Related papers (2022-07-09T03:49:51Z)
- Scale-arbitrary Invertible Image Downscaling [17.67415618760949]
We propose a scale-Arbitrary Invertible image Downscaling Network (AIDN) to downscale HR images with arbitrary scale factors.
Our AIDN achieves top performance for invertible downscaling with both arbitrary integer and non-integer scale factors.
arXiv Detail & Related papers (2022-01-29T12:27:52Z)
- Characterizing and Taming Resolution in Convolutional Neural Networks [4.412616624011115]
Image resolution has a significant effect on the accuracy and computational, storage, and bandwidth costs of computer vision model inference.
We study the accuracy and efficiency tradeoff via systematic and automated tuning of image resolution, image quality and convolutional neural network operators.
We propose a dynamic resolution mechanism that removes the need to statically choose a resolution ahead of time.
arXiv Detail & Related papers (2021-10-28T00:08:23Z)
- Resolution Switchable Networks for Runtime Efficient Image Recognition [46.09537029831355]
We propose a general method to train a single convolutional neural network which is capable of switching image resolutions at inference.
Networks trained with the proposed method are named Resolution Switchable Networks (RS-Nets).
arXiv Detail & Related papers (2020-07-19T02:12:59Z)
- Learning to Learn Parameterized Classification Networks for Scalable Input Images [76.44375136492827]
Convolutional Neural Networks (CNNs) do not exhibit predictable recognition behavior when the input resolution changes.
We employ meta learners to generate convolutional weights of main networks for various input scales.
We further utilize knowledge distillation on the fly over model predictions based on different input resolutions.
arXiv Detail & Related papers (2020-07-13T04:27:25Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
- Gated Fusion Network for Degraded Image Super Resolution [78.67168802945069]
We propose a dual-branch convolutional neural network to extract base features and recovered features separately.
By decomposing the feature extraction step into two task-independent streams, the dual-branch model can facilitate the training process.
arXiv Detail & Related papers (2020-03-02T13:28:32Z)