Instruct-IPT: All-in-One Image Processing Transformer via Weight Modulation
- URL: http://arxiv.org/abs/2407.00676v1
- Date: Sun, 30 Jun 2024 12:13:34 GMT
- Title: Instruct-IPT: All-in-One Image Processing Transformer via Weight Modulation
- Authors: Yuchuan Tian, Jianhong Han, Hanting Chen, Yuanyuan Xi, Guoyang Zhang, Jie Hu, Chao Xu, Yunhe Wang
- Abstract summary: We propose Instruct-IPT -- an All-in-One Image Processing Transformer that can effectively address a wide range of image restoration tasks.
We identify task-sensitive weights via a toy experiment and introduce task-specific biases on top of them.
We conduct rank analysis to find a good compression strategy and perform low-rank decomposition on the biases.
- Score: 25.253522756863727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the unaffordable size and intensive computation costs of low-level vision models, All-in-One models designed to address a handful of low-level vision tasks simultaneously have become popular. However, existing All-in-One models are limited in the range of tasks they cover and in performance. To overcome these limitations, we propose Instruct-IPT -- an All-in-One Image Processing Transformer that can effectively address many image restoration tasks with large inter-task gaps, such as denoising, deblurring, deraining, dehazing, and desnowing. Rather than using popular feature adaptation methods, we propose weight modulation, which adapts weights to specific tasks. First, we identify task-sensitive weights via a toy experiment and introduce task-specific biases on top of them. Second, we conduct rank analysis to find a good compression strategy and perform low-rank decomposition on the biases. Third, we propose synchronous training, which updates the task-general backbone model and the task-specific biases simultaneously. In this way, the model is instructed to learn both general and task-specific knowledge. Via this simple yet effective method, which instructs the IPT to act as task experts, Instruct-IPT can better coordinate between tasks with distinct characteristics at modest cost. Further, we propose to control Instruct-IPT with text instructions for a better user interface. We have conducted experiments on Instruct-IPT to demonstrate the effectiveness of our method on a range of tasks, and we have effectively extended our method to diffusion denoisers as well. The code is available at https://github.com/huawei-noah/Pretrained-IPT.
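As a rough illustration of the weight-modulation idea described above, the sketch below adds a low-rank, task-specific bias on top of a shared weight; all names and shapes here are hypothetical and not taken from the released code.

```python
import torch
import torch.nn as nn

class ModulatedLinear(nn.Module):
    """Shared linear weight plus a low-rank, task-specific additive bias.

    Hypothetical sketch of weight modulation: for task t, the effective
    weight is W + U_t @ V_t, where U_t and V_t are low-rank factors.
    """

    def __init__(self, in_features: int, out_features: int,
                 num_tasks: int, rank: int = 4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        # One pair of low-rank factors per task; U starts at zero so the
        # task-specific delta is initially the identity (no modulation).
        self.U = nn.Parameter(torch.zeros(num_tasks, out_features, rank))
        self.V = nn.Parameter(torch.randn(num_tasks, rank, in_features) * 0.02)

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Task-specific weight = shared weight + low-rank bias.
        delta = self.U[task_id] @ self.V[task_id]
        return x @ (self.weight + delta).t()

layer = ModulatedLinear(64, 64, num_tasks=5)
x = torch.randn(8, 64)
y_denoise = layer(x, task_id=0)  # e.g., denoising
y_derain = layer(x, task_id=2)   # e.g., deraining
```

Because the shared weight and the task factors receive gradients in the same backward pass, a single training loop over mixed-task batches would realize the synchronous training the abstract describes.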
Related papers
- Task-Adapter: Task-specific Adaptation of Image Models for Few-shot Action Recognition [34.88916568947695]
We propose a simple but effective task-specific adaptation method (Task-Adapter) for few-shot action recognition.
By introducing the proposed Task-Adapter into the last several layers of the backbone, we mitigate the overfitting problem caused by full fine-tuning.
Experimental results consistently demonstrate the effectiveness of our proposed Task-Adapter on four standard few-shot action recognition datasets.
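A minimal sketch of the generic adapter pattern this summary refers to: a small trainable bottleneck with a residual connection, added to the last blocks of a frozen backbone. The module below is illustrative, not the paper's actual Task-Adapter.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter added to a frozen backbone block (illustrative)."""

    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        # Zero-init the up-projection so the adapter starts as an identity map.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(x)))
```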
arXiv Detail & Related papers (2024-08-01T03:06:56Z)
- Merging Multi-Task Models via Weight-Ensembling Mixture of Experts [64.94129594112557]
Merging Transformer-based models trained on different tasks yields a single unified model that can execute all the tasks concurrently.
Previous methods, exemplified by task arithmetic, have been proven to be both effective and scalable.
We propose to merge most of the parameters while upscaling the Transformer layers to a weight-ensembling mixture of experts (MoE) module.
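The sketch below is one plausible reading of a weight-ensembling MoE layer: the base weight could be the merged (averaged) weight and each delta a task vector (the difference between a fine-tuned model and the pre-trained weights, as in task arithmetic), with an input-conditioned router mixing them. Names and routing details are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class WeightEnsemblingLinear(nn.Module):
    """Illustrative weight-ensembling MoE layer: a router mixes
    task-specific weight deltas on top of a merged base weight."""

    def __init__(self, base: torch.Tensor, deltas: list[torch.Tensor]):
        super().__init__()
        # base: e.g. average of task models' weights;
        # deltas[i]: e.g. W_task_i - W_pretrained (a "task vector").
        self.base = nn.Parameter(base)
        self.deltas = nn.Parameter(torch.stack(deltas))      # (T, out, in)
        self.router = nn.Linear(base.shape[1], len(deltas))  # input-conditioned gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gate from the mean input token, then mix the task vectors.
        gates = torch.softmax(self.router(x.mean(dim=0, keepdim=True)), dim=-1)
        w = self.base + torch.einsum("bt,toi->oi", gates, self.deltas)
        return x @ w.t()
```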
arXiv Detail & Related papers (2024-02-01T08:58:57Z)
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all the multi-task data for training.
At the task level, we aim to find the optimal task order that minimizes the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
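A minimal sketch of the instance-level step, assuming a user-supplied difficulty function (here, string length as a stand-in difficulty measure):

```python
from typing import Callable, List, Sequence

def curriculum_batches(instances: Sequence, difficulty: Callable[[object], float],
                       batch_size: int) -> List[List]:
    """Sort a task's instances from easy to difficult, then chunk them
    into mini-batches so training sees easier batches first (illustrative)."""
    ordered = sorted(instances, key=difficulty)
    return [list(ordered[i:i + batch_size])
            for i in range(0, len(ordered), batch_size)]

# Example: shorter sentences treated as "easier".
batches = curriculum_batches(["a b", "a", "a b c d", "a b c"],
                             difficulty=len, batch_size=2)
# -> [['a', 'a b'], ['a b c', 'a b c d']]
```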
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- On the Effectiveness of LayerNorm Tuning for Continual Learning in Vision Transformers [47.77328392236625]
State-of-the-art rehearsal-free continual learning methods exploit the peculiarities of Vision Transformers to learn task-specific prompts.
We introduce a two-stage training procedure, where we first optimize the task-specific parameters and then train the classifier using the same selection procedure as at inference time.
Our method achieves results that are either superior or on par with the state of the art while being computationally cheaper.
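Stage one could look like the sketch below: unfreeze only the LayerNorm affine parameters of an otherwise frozen backbone. This is illustrative; the paper's exact parameter selection is not reproduced here.

```python
import torch.nn as nn

def select_layernorm_params(model: nn.Module):
    """Stage 1 (illustrative): unfreeze only LayerNorm affine parameters,
    keeping the rest of the ViT backbone frozen; returns the trainable
    parameters to hand to an optimizer."""
    for p in model.parameters():
        p.requires_grad = False
    ln_params = []
    for m in model.modules():
        if isinstance(m, nn.LayerNorm):
            for p in m.parameters():
                p.requires_grad = True
                ln_params.append(p)
    return ln_params

# Stage 2 would then train the classifier head on features produced with
# the same task-parameter selection procedure that is used at inference.
```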
arXiv Detail & Related papers (2023-08-18T15:11:16Z)
- Adaptive Weight Assignment Scheme For Multi-task Learning [0.0]
Deep learning models are now used regularly in a wide range of applications.
Under multi-task learning settings, multiple tasks can be trained on a single model.
Training a model in this setting requires summing the loss values from the different tasks.
In this paper we propose a simple weight assignment scheme that improves the performance of the model.
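The paper's exact scheme is not spelled out in this summary; as a generic illustration of learnable loss weighting, the sketch below uses the common homoscedastic-uncertainty formulation, which may differ from the proposed method.

```python
import torch
import torch.nn as nn

class WeightedMultiTaskLoss(nn.Module):
    """Combine per-task losses with learnable weights (illustrative):
    loss_i * exp(-s_i) + s_i, where s_i is the log-variance of task i."""

    def __init__(self, num_tasks: int):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, losses: torch.Tensor) -> torch.Tensor:
        return (losses * torch.exp(-self.log_vars) + self.log_vars).sum()

criterion = WeightedMultiTaskLoss(num_tasks=3)
total = criterion(torch.stack([torch.tensor(0.5),
                               torch.tensor(1.2),
                               torch.tensor(0.8)]))
```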
arXiv Detail & Related papers (2023-03-10T08:06:08Z)
- Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks [36.34331439747556]
We propose Polyhistor and Polyhistor-Lite to share information across different tasks with a few trainable parameters.
Specifically, Polyhistor achieves competitive accuracy compared to the state-of-the-art while only using 10% of their trainable parameters.
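One plausible sketch of hypernetwork-style parameter sharing in this spirit: a tiny task embedding is decoded into low-rank adapter weights, so the decoder is shared across tasks and only the embeddings are task-specific. All names and shapes below are assumptions, not Polyhistor's actual design.

```python
import torch
import torch.nn as nn

class HyperAdapter(nn.Module):
    """Illustrative hypernetwork adapter: task embeddings -> low-rank
    adapter weights; the decoder is shared by all tasks."""

    def __init__(self, num_tasks: int, dim: int, rank: int = 8, emb: int = 16):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, emb)
        self.to_down = nn.Linear(emb, dim * rank)
        self.to_up = nn.Linear(emb, rank * dim)
        self.dim, self.rank = dim, rank

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        e = self.task_emb(torch.tensor(task_id))
        down = self.to_down(e).view(self.rank, self.dim)  # (rank, dim)
        up = self.to_up(e).view(self.dim, self.rank)      # (dim, rank)
        # Residual bottleneck built from generated weights.
        return x + torch.relu(x @ down.t()) @ up.t()
```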
arXiv Detail & Related papers (2022-10-07T00:25:02Z)
- Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performance is sub-optimal or even lags far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
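A skeleton of the three-phase recipe, assuming a hypothetical `training_step` hook on the model; the paper's actual schedules and losses are not reproduced here.

```python
from itertools import cycle

import torch
import torch.nn as nn

def pretrain_adapt_finetune(model: nn.Module, pretrain_loader, adapt_loader,
                            finetune_loaders: dict, steps=(1000, 500, 500)):
    """Illustrative pretrain-adapt-finetune loop (names are placeholders)."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    def run(loader, n_steps):
        it = cycle(loader)
        for _ in range(n_steps):
            loss = model.training_step(next(it))  # hypothetical hook
            opt.zero_grad()
            loss.backward()
            opt.step()

    run(pretrain_loader, steps[0])      # 1) self-supervised pre-training
    run(adapt_loader, steps[1])         # 2) adapt jointly on multi-task data
    for task, loader in finetune_loaders.items():
        run(loader, steps[2])           # 3) fine-tune per task
```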
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
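A rough sketch of TAPS-style layer selection, assuming a relaxed (sigmoid) gate with a sparsity penalty that keeps the task-specific subset small; the paper's exact gating mechanism is not reproduced here.

```python
import torch
import torch.nn as nn

class TaskAdaptiveLinear(nn.Module):
    """Illustrative gated layer: a learnable gate decides how much a layer
    deviates from the shared (frozen) weight via a task-specific delta."""

    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base  # shared, frozen weight
        for p in self.base.parameters():
            p.requires_grad = False
        self.delta = nn.Parameter(torch.zeros_like(base.weight))
        self.gate = nn.Parameter(torch.tensor(0.1))  # relaxed binary gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.base.weight + torch.sigmoid(self.gate) * self.delta
        return nn.functional.linear(x, w, self.base.bias)

# Sparsity term to add to the task loss, e.g.:
# loss = task_loss + lam * sum(torch.sigmoid(m.gate) for m in taps_modules)
```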
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- Towards a Unified Foundation Model: Jointly Pre-Training Transformers on Unpaired Images and Text [93.11954811297652]
We design a unified transformer consisting of modality-specific tokenizers, a shared transformer encoder, and task-specific output heads.
We employ the separately-trained BERT and ViT models as teachers and apply knowledge distillation to provide additional, accurate supervision signals.
Experiments show that the resultant unified foundation transformer works surprisingly well on both the vision-only and text-only tasks.
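The distillation signal itself is standard; a minimal sketch of the soft-label knowledge-distillation loss that such teacher supervision (here, from separately trained BERT and ViT models) typically uses:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 as is conventional."""
    t = temperature
    s = F.log_softmax(student_logits / t, dim=-1)
    p = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(s, p, reduction="batchmean") * (t * t)
```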
arXiv Detail & Related papers (2021-12-14T00:20:55Z)
- Pre-Trained Image Processing Transformer [95.93031793337613]
We develop a new pre-trained model, namely the image processing transformer (IPT).
We utilize the well-known ImageNet benchmark to generate a large number of corrupted image pairs.
The IPT model is trained on these images with multiple heads and multiple tails.
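A minimal sketch of the multi-head / multi-tail layout this describes: per-task heads encode the corrupted image, a shared transformer body processes the features, and per-task tails reconstruct the output. Sizes and names below are placeholders, not the released IPT code.

```python
import torch
import torch.nn as nn

class MultiHeadTailModel(nn.Module):
    """Illustrative multi-head / multi-tail restoration model."""

    def __init__(self, num_tasks: int, dim: int = 64):
        super().__init__()
        self.heads = nn.ModuleList(nn.Conv2d(3, dim, 3, padding=1)
                                   for _ in range(num_tasks))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=2)  # shared body
        self.tails = nn.ModuleList(nn.Conv2d(dim, 3, 3, padding=1)
                                   for _ in range(num_tasks))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        f = self.heads[task_id](x)             # (B, C, H, W)
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)  # (B, H*W, C)
        tokens = self.body(tokens)
        f = tokens.transpose(1, 2).view(b, c, h, w)
        return self.tails[task_id](f)
```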
arXiv Detail & Related papers (2020-12-01T09:42:46Z)