Multi-Scale Aligned Distillation for Low-Resolution Detection
- URL: http://arxiv.org/abs/2109.06875v1
- Date: Tue, 14 Sep 2021 12:53:35 GMT
- Title: Multi-Scale Aligned Distillation for Low-Resolution Detection
- Authors: Lu Qi, Jason Kuen, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei
Li, Jiaya Jia
- Abstract summary: This paper focuses on boosting the performance of low-resolution models by distilling knowledge from a high- or multi-resolution model.
On several instance-level detection tasks and datasets, the low-resolution models trained via our approach perform competitively with high-resolution models trained via conventional multi-scale training.
- Score: 68.96325141432078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In instance-level detection tasks (e.g., object detection), reducing input
resolution is an easy option to improve runtime efficiency. However, doing so
traditionally incurs a significant drop in detection performance. This paper focuses
on boosting the performance of low-resolution models by distilling knowledge
from a high- or multi-resolution model. We first identify the challenge of
applying knowledge distillation (KD) to teacher and student networks that act
on different input resolutions. To tackle it, we explore the idea of spatially
aligning feature maps between models of varying input resolutions by shifting
feature pyramid positions and introduce aligned multi-scale training to train a
multi-scale teacher that can distill its knowledge to a low-resolution student.
Further, we propose crossing feature-level fusion to dynamically fuse teacher's
multi-resolution features to guide the student better. On several
instance-level detection tasks and datasets, the low-resolution models trained
via our approach perform competitively with high-resolution models trained via
conventional multi-scale training, while outperforming the latter's
low-resolution models by 2.1% to 3.6% in terms of mAP. Our code is made
publicly available at https://github.com/dvlab-research/MSAD.
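The core alignment idea above can be illustrated with a minimal sketch. This is not the authors' released code (see the MSAD repository for that); the level names, stride-8-to-128 pyramid, and the simple MSE feature-mimicking loss are assumed typical FPN conventions. The point it shows: halving the input resolution shifts every feature map down one pyramid level, so student level P_l spatially matches teacher level P_{l+1} and the two can be compared directly.

```python
# Hedged sketch: spatial alignment of FPN levels between a full-resolution
# teacher and a half-resolution student. Level names and strides are
# assumed typical FPN values, not taken from the paper's code.
import numpy as np

LEVELS = ("P3", "P4", "P5", "P6", "P7")  # strides 8, 16, 32, 64, 128

def pyramid_shapes(h, w):
    """Spatial size of each FPN level for an h x w input."""
    return {name: (h // (8 * 2**i), w // (8 * 2**i))
            for i, name in enumerate(LEVELS)}

def aligned_pairs(teacher_shapes, student_shapes):
    """Pair student level P_l with teacher level P_{l+1} when their
    spatial sizes match (the 'shifted pyramid position' alignment)."""
    pairs = []
    for i, s_name in enumerate(LEVELS[:-1]):
        t_name = LEVELS[i + 1]
        if student_shapes[s_name] == teacher_shapes[t_name]:
            pairs.append((t_name, s_name))
    return pairs

def kd_loss(t_feat, s_feat):
    """Simple feature-mimicking distillation loss between aligned maps."""
    return float(np.mean((t_feat - s_feat) ** 2))

teacher = pyramid_shapes(1024, 1024)  # high-resolution input
student = pyramid_shapes(512, 512)    # half-resolution input
print(aligned_pairs(teacher, student))
# student P3 (64x64) aligns with teacher P4 (64x64), and so on up the pyramid
```

Because the aligned maps already agree in spatial size, no resizing is needed before computing the distillation loss; the paper's crossing feature-level fusion would additionally weight and combine several such teacher levels before distilling.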
Related papers
- TDDSR: Single-Step Diffusion with Two Discriminators for Super Resolution [28.174638880324014]
We propose TDDSR, an efficient single-step diffusion-based super-resolution method.
Our method, distilled from a pre-trained teacher model and based on a diffusion network, performs super-resolution in a single step.
Experimental results demonstrate its effectiveness across real-world and face-specific SR tasks.
arXiv Detail & Related papers (2024-10-10T07:12:46Z)
- Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation [31.970739018426645]
In practical applications of human pose estimation, low-resolution inputs frequently occur, and existing state-of-the-art models perform poorly with low-resolution images.
This work focuses on boosting the performance of low-resolution models by distilling knowledge from a high-resolution model.
arXiv Detail & Related papers (2024-05-19T04:57:17Z)
- Attend, Distill, Detect: Attention-aware Entropy Distillation for Anomaly Detection [4.0679780034913335]
Knowledge-distillation-based multi-class anomaly detection promises low latency with reasonably good performance, but with a significant drop compared to the one-class version.
We propose a DCAM (Distributed Convolutional Attention Module) which improves the distillation process between teacher and student networks.
arXiv Detail & Related papers (2024-05-10T13:25:39Z)
- Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which yields faster, more stable, and more accurate convergence.
Our model's network parameters are reduced to only 37% of theirs, and the average gap between our model's solutions and the expert solutions decreases from 6.8% to 1.3%.
arXiv Detail & Related papers (2022-10-31T09:46:26Z)
- Dynamic Contrastive Distillation for Image-Text Retrieval [90.05345397400144]
We present a novel plug-in dynamic contrastive distillation (DCD) framework to compress image-text retrieval models.
We successfully apply our proposed DCD strategy to two state-of-the-art vision-language pretrained models, i.e. ViLT and METER.
Experiments on MS-COCO and Flickr30K benchmarks show the effectiveness and efficiency of our DCD framework.
arXiv Detail & Related papers (2022-07-04T14:08:59Z)
- Pyramid Grafting Network for One-Stage High Resolution Saliency Detection [29.013012579688347]
We propose a one-stage framework called Pyramid Grafting Network (PGNet) to extract features from different resolution images independently.
An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable the CNN branch to combine broken detailed information more holistically.
We contribute a new Ultra-High-Resolution Saliency Detection dataset UHRSD, containing 5,920 images at 4K-8K resolutions.
arXiv Detail & Related papers (2022-04-11T12:22:21Z)
- Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework achieves significant performance gains over existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z)
- Distribution Alignment: A Unified Framework for Long-tail Visual Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weight method in the two-stage learning to balance the class prior.
Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework.
arXiv Detail & Related papers (2021-03-30T14:09:53Z)
- One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.