Device Tuning for Multi-Task Large Model
- URL: http://arxiv.org/abs/2302.10820v1
- Date: Tue, 21 Feb 2023 16:55:48 GMT
- Title: Device Tuning for Multi-Task Large Model
- Authors: Penghao Jiang, Xuanchen Hou, Yinsi Zhou
- Abstract summary: We propose Device Tuning for the efficient multi-task model: a massively multitask framework that spans the cloud and the device.
Specifically, we design a Device Tuning architecture for a multi-task model that benefits both cloud modelling and device modelling.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised pre-training approaches have achieved great success in many
fields such as Computer Vision (CV), Natural Language Processing (NLP) and so
on. However, compared to typical deep learning models, pre-training or even
fine-tuning the state-of-the-art self-attention models is extremely expensive,
as they require far more computational and memory resources. This severely
limits their applications and success in a variety of domains, especially
multi-task learning. To improve efficiency, we propose Device Tuning for the
efficient multi-task model: a massively multitask framework across the cloud
and device, designed to encourage the learning of representations that
generalize better to many different tasks. Specifically, we design a Device
Tuning architecture for a multi-task model that benefits both cloud modelling
and device modelling and reduces communication between the device and the
cloud through representation compression. Experimental results demonstrate
the effectiveness of the proposed method.
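The abstract's core idea, reducing device-cloud traffic via representation compression, can be sketched as follows. The paper does not publish its compression module, so the dimensions, the random projections `W_down`/`W_up`, and the function names here are all illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the paper does not specify them.
D_FULL = 768       # width of the on-device representation
D_COMPRESSED = 64  # width of the representation actually uploaded

# Fixed random projections stand in for the learned compression module.
W_down = rng.standard_normal((D_FULL, D_COMPRESSED)) / np.sqrt(D_FULL)
W_up = rng.standard_normal((D_COMPRESSED, D_FULL)) / np.sqrt(D_COMPRESSED)

def device_compress(h):
    """Device side: shrink the representation before sending it to the cloud."""
    return h @ W_down

def cloud_expand(z):
    """Cloud side: project the compact representation back up for the cloud model."""
    return z @ W_up

h = rng.standard_normal((4, D_FULL))  # a batch of device representations
z = device_compress(h)                # what actually crosses the network

bytes_full = h.nbytes
bytes_sent = z.nbytes
print(bytes_sent / bytes_full)        # 64/768: one twelfth of the traffic
```

Under these toy sizes the upload shrinks by 12x; in practice the compression module and ratio would be learned and tuned jointly with the cloud and device models.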
Related papers
- EmbedLLM: Learning Compact Representations of Large Language Models [28.49433308281983]
We propose EmbedLLM, a framework designed to learn compact vector representations of Large Language Models.
We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness.
Empirical results show that EmbedLLM outperforms prior methods in model routing both in accuracy and latency.
arXiv Detail & Related papers (2024-10-03T05:43:24Z)
- Generative Multimodal Models are In-Context Learners [60.50927925426832]
We introduce Emu2, a generative multimodal model with 37 billion parameters, trained on large-scale multimodal sequences.
Emu2 exhibits strong multimodal in-context learning abilities, even emerging to solve tasks that require on-the-fly reasoning.
arXiv Detail & Related papers (2023-12-20T18:59:58Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models, augmenting Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
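The eP-ALM recipe summarized above (freeze nearly everything, train one linear projection and one prepended token) can be sketched as below. This is a toy stand-in, not the authors' code: the backbone is reduced to a frozen embedding table, and all sizes and names are assumptions. In the real setting the frozen pretrained backbones hold billions of parameters, which is what makes the trainable fraction fall under 1%:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; eP-ALM uses real pretrained perception and language backbones.
D_VISION, D_TEXT, SEQ_LEN = 512, 768, 16

# Frozen backbone parameters (stand-in for the pretrained LM; never updated).
frozen_lm_emb = rng.standard_normal((1000, D_TEXT)) * 0.02

# The only trainable pieces, following the recipe in the summary:
proj = rng.standard_normal((D_VISION, D_TEXT)) * 0.02  # one linear projection
soft_token = np.zeros((1, D_TEXT))                     # one trainable prepended token

def build_input(visual_feat, token_ids):
    """Prepend the trainable token and the projected visual feature to the text embeddings."""
    text_emb = frozen_lm_emb[token_ids]   # frozen lookup, no gradient flows here
    vis_emb = visual_feat @ proj          # trained projection into the LM's space
    return np.concatenate([soft_token, vis_emb[None, :], text_emb], axis=0)

seq = build_input(rng.standard_normal(D_VISION), np.arange(SEQ_LEN))
print(seq.shape)  # (1 soft token + 1 visual token + SEQ_LEN text tokens, D_TEXT)
```

Only `proj` and `soft_token` would receive gradient updates; everything else stays frozen.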
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- Modular Networks Prevent Catastrophic Interference in Model-Based Multi-Task Reinforcement Learning [0.8883733362171032]
We study whether model-based multi-task reinforcement learning benefits from shared dynamics models in the same way that model-free methods benefit from shared policy networks.
Using a single dynamics model, we see clear evidence of task confusion and reduced performance.
As a remedy, enforcing an internal structure for the learned dynamics model by training isolated sub-networks for each task notably improves performance.
arXiv Detail & Related papers (2021-11-15T12:31:31Z)
- Scalable and Efficient MoE Training for Multitask Multilingual Models [55.987536562357086]
We develop a system capable of scaling MoE models efficiently to trillions of parameters.
We also present new training methods to improve MoE sample efficiency and leverage expert pruning strategy to improve time efficiency.
A model trained with 10 billion parameters on 50 languages can achieve state-of-the-art performance in Machine Translation (MT) and multilingual natural language generation tasks.
arXiv Detail & Related papers (2021-09-22T00:57:46Z)
- Device-Cloud Collaborative Learning for Recommendation [50.01289274123047]
We propose a novel MetaPatch learning approach on the device side to efficiently achieve "thousands of people with thousands of models" given a centralized cloud model.
With billions of updated personalized device models, we propose a "model-over-models" distillation algorithm, namely MoMoDistill, to update the centralized cloud model.
arXiv Detail & Related papers (2021-04-14T05:06:59Z)
- Reinforced Multi-Teacher Selection for Knowledge Distillation [54.72886763796232]
Knowledge distillation is a popular method for model compression.
Current methods assign each teacher model a fixed weight for the entire distillation process, and most allocate an equal weight to every teacher.
In this paper, we observe that, due to the varying complexity of training examples and the differences in student model capability, learning differentially from teacher models can produce better-performing distilled students.
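The contrast between fixed equal teacher weights and example-dependent weights can be sketched as below. The paper learns its teacher selection with reinforcement learning; here a simple confidence (negative-entropy) weighting stands in for that learned policy, and all logit values are made up for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy logits from three teacher models on one training example (hypothetical).
teacher_logits = np.array([
    [2.0, 0.5, 0.1],   # teacher A: fairly confident
    [1.0, 1.0, 1.0],   # teacher B: uninformative on this example
    [2.5, 0.2, 0.0],   # teacher C: very confident
])
probs = softmax(teacher_logits)

# Fixed equal weighting, as in most existing methods.
equal_target = probs.mean(axis=0)

# Example-dependent weighting: confident (low-entropy) teachers get more weight.
# This heuristic is a stand-in for the RL-learned selection in the paper.
entropy = -(probs * np.log(probs)).sum(axis=-1)
weights = softmax(-entropy)
weighted_target = weights @ probs

print("equal   :", equal_target.round(3))
print("weighted:", weighted_target.round(3))
```

The weighted target leans toward the confident teachers and discounts the uninformative one, which is the behaviour a per-example selection mechanism is meant to provide.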
arXiv Detail & Related papers (2020-12-11T08:56:39Z)
- NeurAll: Towards a Unified Visual Perception Model for Automated Driving [8.49826472556323]
We propose a joint multi-task network design for learning several tasks simultaneously.
The main bottleneck in automated driving systems is the limited processing power available on deployment hardware.
arXiv Detail & Related papers (2019-02-10T12:45:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.