Transfer Learning for Sequence Generation: from Single-source to
Multi-source
- URL: http://arxiv.org/abs/2105.14809v1
- Date: Mon, 31 May 2021 09:12:38 GMT
- Title: Transfer Learning for Sequence Generation: from Single-source to
Multi-source
- Authors: Xuancheng Huang, Jingfang Xu, Maosong Sun, and Yang Liu
- Abstract summary: We propose a two-stage finetuning method to alleviate the pretrain-finetune discrepancy and introduce a novel MSG model with a fine encoder to learn better representations in MSG tasks.
Our approach achieves new state-of-the-art results on the WMT17 APE task and multi-source translation task using the WMT14 test set.
- Score: 50.34044254589968
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-source sequence generation (MSG) is an important class of
sequence generation task that takes multiple sources as input; examples
include automatic post-editing, multi-source translation, and multi-document
summarization. As MSG tasks
suffer from the data scarcity problem and recent pretrained models have been
proven to be effective for low-resource downstream tasks, transferring
pretrained sequence-to-sequence models to MSG tasks is essential. Although
concatenating multiple sources into a single long sequence and directly
finetuning a pretrained model on it is a simple way to transfer pretrained
models to MSG tasks, we conjecture that direct finetuning leads to
catastrophic forgetting and that relying solely on pretrained self-attention
layers to capture cross-source information is insufficient.
Therefore, we propose a two-stage finetuning method to alleviate the
pretrain-finetune discrepancy and introduce a novel MSG model with a fine
encoder to learn better representations in MSG tasks. Experiments show that our
approach achieves new state-of-the-art results on the WMT17 APE task and
multi-source translation task using the WMT14 test set. When adapted to
document-level translation, our framework outperforms strong baselines
significantly.
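To make the baseline concrete, here is a minimal sketch, not the authors' code, of the single-sequence concatenation setup the abstract describes; the `<SEP>` separator token and the post-editing example are illustrative assumptions, and the proposed two-stage schedule is outlined only in comments.

```python
from typing import List

SEP = " <SEP> "  # hypothetical separator token; the abstract does not specify one

def concat_sources(sources: List[str]) -> str:
    """Flatten a multi-source example into the single long sequence a
    standard pretrained seq2seq model expects as input."""
    return SEP.join(sources)

# Automatic post-editing (APE) example: the original sentence plus a
# machine-translation draft form the two sources.
example = concat_sources([
    "Das ist ein Test.",  # source-language sentence
    "This is an test.",   # MT output to be post-edited
])
print(example)  # Das ist ein Test. <SEP> This is an test.

# The two-stage finetuning idea from the abstract, schematically:
#   Stage 1: finetune the pretrained seq2seq model on related single-source
#            data to narrow the pretrain-finetune discrepancy.
#   Stage 2: finetune on true multi-source inputs (as built above), with the
#            paper's additional "fine encoder" learning cross-source
#            representations beyond what pretrained self-attention captures.
```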
Related papers
- Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large
Language Models [46.92994945808424]
Catastrophic forgetting emerges as a critical challenge when fine-tuning multi-modal large language models (MLLMs).
This paper presents a comprehensive analysis of catastrophic forgetting in MLLMs and introduces a post-training adjustment method called Model Tailor.
arXiv Detail & Related papers (2024-02-19T11:02:05Z)
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning [28.12788291168137]
We present a multi-task fine-tuning framework, MFTCoder, that enables simultaneous and parallel fine-tuning on multiple tasks.
Experiments demonstrate that this multi-task fine-tuning approach outperforms both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble of tasks.
arXiv Detail & Related papers (2023-11-04T02:22:40Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces a technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging achieves an 11% improvement in performance (a minimal sketch of coefficient-weighted merging follows).
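As an illustration only, the sketch below shows coefficient-weighted merging in the task-arithmetic style that AdaMerging builds on; the toy state dicts and fixed coefficients are assumptions, since AdaMerging learns the coefficients (task-wise or layer-wise) rather than fixing them by hand.

```python
import torch

def merge(pretrained: dict, finetuned: list, coeffs: torch.Tensor) -> dict:
    """pretrained / finetuned: name -> tensor state dicts.
    coeffs: one scalar per task (task-wise merging; a layer-wise variant
    would learn one coefficient per task per layer)."""
    merged = {}
    for name, w0 in pretrained.items():
        # Task arithmetic: each task contributes a "task vector" (delta
        # between its finetuned weights and the pretrained weights).
        task_vectors = [ft[name] - w0 for ft in finetuned]
        merged[name] = w0 + sum(c * tv for c, tv in zip(coeffs, task_vectors))
    return merged

# Toy usage with two "tasks" and a single weight matrix.
w0 = {"layer.weight": torch.zeros(2, 2)}
ft = [{"layer.weight": torch.eye(2)}, {"layer.weight": torch.ones(2, 2)}]
coeffs = torch.tensor([0.3, 0.3])  # AdaMerging learns these without the
                                   # original training data
print(merge(w0, ft, coeffs)["layer.weight"])
```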
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- Foundation Model is Efficient Multimodal Multitask Model Selector [47.017463595702274]
A brute-force approach is to finetune all models on all target datasets, which incurs high computational cost.
We propose an efficient multi-task model selector (EMMS) to transform diverse label formats into a unified noisy label embedding.
EMMS is fast, effective, and generic enough to assess the transferability of pre-trained models, making it the first model selection method in the multi-task scenario.
arXiv Detail & Related papers (2023-08-11T17:54:44Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers (see the sketch below).
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
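The following is a minimal sketch, under stated assumptions, of the layer-selection idea in the TAPS summary: only a small, task-specific subset of layers is tuned while the rest stay shared and frozen. The fixed subset here is an assumption; TAPS learns which layers to adapt.

```python
import torch

# Toy base model: three modules, of which only a chosen subset is tuned.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 2),
)

task_specific = {0}  # layer indices adapted for this task (assumed given;
                     # TAPS learns this subset jointly with the weights)
for idx, module in enumerate(model):
    for p in module.parameters():
        p.requires_grad = idx in task_specific  # freeze all shared layers

# Only the selected layer contributes task-specific parameters.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"task-specific parameters: {trainable} of {total}")
```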
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients (see the sketch after this entry).
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z)
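To illustrate the distinction drawn in the MAMF summary, here is a minimal sketch, under toy assumptions, of first-order multitask fine-tuning: batches from different tasks are mixed into one stream of ordinary gradient steps, with none of MAML's bi-level inner/outer optimization. The model and data are placeholders, not the paper's setup.

```python
import random
import torch

model = torch.nn.Linear(4, 2)         # placeholder for a vision-language model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

def sample_batch(task_id: int):
    """Toy per-task data; a real setup would draw task-specific batches."""
    g = torch.Generator().manual_seed(task_id)  # tasks differ deterministically
    x = torch.randn(8, 4, generator=g)
    y = torch.randint(0, 2, (8,), generator=g)
    return x, y

for step in range(20):
    x, y = sample_batch(random.choice([0, 1, 2]))  # mix tasks in one stream
    opt.zero_grad()
    loss_fn(model(x), y).backward()  # plain first-order gradients:
    opt.step()                       # no MAML-style inner/outer loops
```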
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.