Impossible Triangle: What's Next for Pre-trained Language Models?
- URL: http://arxiv.org/abs/2204.06130v1
- Date: Wed, 13 Apr 2022 01:28:18 GMT
- Title: Impossible Triangle: What's Next for Pre-trained Language Models?
- Authors: Chenguang Zhu, Michael Zeng
- Abstract summary: We argue that all existing PLM models lack one or more properties from the Impossible Triangle.
We then offer insights into future research directions of PLMs to achieve the Impossible Triangle.
- Score: 53.99691912972306
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent development of large-scale pre-trained language models (PLM) have
significantly improved the capability of models in various NLP tasks, in terms
of performance after task-specific fine-tuning and zero-shot / few-shot
learning. However, many of such models come with a dauntingly huge size that
few institutions can afford to pre-train, fine-tune or even deploy, while
moderate-sized models usually lack strong generalized few-shot learning
capabilities. In this paper, we first elaborate the current obstacles of using
PLM models in terms of the Impossible Triangle: 1) moderate model size, 2)
state-of-the-art few-shot learning capability, and 3) state-of-the-art
fine-tuning capability. We argue that all existing PLM models lack one or more
properties from the Impossible Triangle. To remedy these missing properties of
PLMs, various techniques have been proposed, such as knowledge distillation,
data augmentation and prompt learning, which inevitably brings additional work
to the application of PLMs in real scenarios. We then offer insights into
future research directions of PLMs to achieve the Impossible Triangle, and
break down the task into several key phases.
Related papers
- LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
arXiv Detail & Related papers (2024-01-04T18:53:01Z) - Make a Donut: Hierarchical EMD-Space Planning for Zero-Shot Deformable Manipulation with Tools [14.069149456110676]
We introduce a demonstration-free hierarchical planning approach capable of tackling intricate long-horizon tasks.
We employ large language models (LLMs) to articulate a high-level, stage-by-stage plan corresponding to a specified task.
We further substantiate our approach with experimental trials on real-world robotic platforms.
arXiv Detail & Related papers (2023-11-05T22:43:29Z) - Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning [52.29522018586365]
We study structured pruning as an effective means to develop smaller LLMs from pre-trained, larger models.
Our approach employs two key techniques: (1) targeted structured pruning, which prunes a larger model to a specified target shape by removing layers, heads, and intermediate and hidden dimensions in an end-to-end manner, and (2) dynamic batch loading, which dynamically updates the composition of sampled data in each training batch based on varying losses across different domains.
arXiv Detail & Related papers (2023-10-10T15:13:30Z) - Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for
Large Language Models [125.91897197446379]
We find that MoE models benefit more from instruction tuning than dense models.
Our most powerful model, FLAN-MOE-32B, surpasses the performance of FLAN-PALM-62B on four benchmark tasks.
arXiv Detail & Related papers (2023-05-24T04:22:26Z) - eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z) - Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey [66.18478838828231]
Multi-modal pre-trained big models have drawn more and more attention in recent years.
This paper introduces the background of multi-modal pre-training by reviewing the conventional deep, pre-training works in natural language process, computer vision, and speech.
Then, we introduce the task definition, key challenges, and advantages of multi-modal pre-training models (MM-PTMs), and discuss the MM-PTMs with a focus on data, objectives, network, and knowledge enhanced pre-training.
arXiv Detail & Related papers (2023-02-20T15:34:03Z) - WeLM: A Well-Read Pre-trained Language Model for Chinese [37.68378062625651]
We present WeLM: a well-read pre-trained language model for Chinese.
We show that WeLM is equipped with broad knowledge on various domains and languages.
arXiv Detail & Related papers (2022-09-21T14:05:30Z) - Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese [33.83704598544326]
Mengzi stands for a family of discriminative, generative, domain-specific, and multimodal pre-trained model variants.
Compared with public Chinese PLMs, Mengzi is simple but more powerful.
Our lightweight model has achieved new state-of-the-art results on the widely-used CLUE benchmark.
arXiv Detail & Related papers (2021-10-13T13:14:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.