Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining
- URL: http://arxiv.org/abs/2003.13003v2
- Date: Wed, 16 Sep 2020 15:00:14 GMT
- Title: Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining
- Authors: Chengyu Wang, Minghui Qiu, Jun Huang, Xiaofeng He
- Abstract summary: We propose an effective learning procedure named Meta Fine-Tuning (MFT).
MFT serves as a meta-learner to solve a group of similar NLP tasks for neural language models.
We implement MFT upon BERT to solve several multi-domain text mining tasks.
- Score: 37.2106265998237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained neural language models bring significant improvement for various
NLP tasks, by fine-tuning the models on task-specific training sets. During
fine-tuning, the parameters are initialized from pre-trained models directly,
which ignores how the learning process of similar NLP tasks in different
domains is correlated and mutually reinforced. In this paper, we propose an
effective learning procedure named Meta Fine-Tuning (MFT), which serves as a
meta-learner to solve a group of similar NLP tasks for neural language models.
Instead of simply multi-task training over all the datasets, MFT only learns
from typical instances of various domains to acquire highly transferable
knowledge. It further encourages the language model to encode domain-invariant
representations by optimizing a series of novel domain corruption loss
functions. After MFT, the model can be fine-tuned for each domain with better
parameter initializations and higher generalization ability. We implement MFT
upon BERT to solve several multi-domain text mining tasks. Experimental results
confirm the effectiveness of MFT and its usefulness for few-shot learning.
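The abstract describes two mechanisms: training only on typical instances drawn from several domains, and adding domain corruption losses so that BERT encodes domain-invariant representations. Below is a minimal sketch of one MFT-style training step, assuming PyTorch and HuggingFace transformers; the typicality filter and the uniform-target domain-confusion term are simplified stand-ins for the paper's actual instance selection and domain corruption losses, not its exact method.

```python
# Sketch of an MFT-style step on top of BERT (assumptions: PyTorch + HuggingFace
# transformers; the typicality mask and domain-confusion loss are illustrative
# stand-ins for the paper's typical-instance selection and domain corruption losses).
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertModel, BertTokenizerFast

NUM_LABELS = 2    # task labels shared across domains (e.g., sentiment polarity)
NUM_DOMAINS = 3   # e.g., books / dvd / electronics reviews

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
task_head = nn.Linear(encoder.config.hidden_size, NUM_LABELS)
domain_head = nn.Linear(encoder.config.hidden_size, NUM_DOMAINS)

params = list(encoder.parameters()) + list(task_head.parameters()) + list(domain_head.parameters())
optimizer = torch.optim.AdamW(params, lr=2e-5)

def mft_step(texts, task_labels, typicality, lam=0.1, threshold=0.5):
    """One meta fine-tuning step over a mixed-domain batch.

    `typicality` holds per-instance scores in [0, 1]; only instances above
    `threshold` contribute to the task loss (a stand-in for learning from
    "typical instances"). Domain invariance is encouraged by pushing the
    domain classifier toward a uniform prediction on the [CLS] vector.
    """
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    cls = encoder(**batch).last_hidden_state[:, 0]   # [CLS] representations

    keep = typicality >= threshold                   # assumes >= 1 typical instance per batch
    task_loss = F.cross_entropy(task_head(cls[keep]), task_labels[keep])

    domain_logits = domain_head(cls)
    uniform = torch.full_like(domain_logits, 1.0 / NUM_DOMAINS)
    confusion_loss = F.kl_div(F.log_softmax(domain_logits, dim=-1), uniform,
                              reduction="batchmean")

    loss = task_loss + lam * confusion_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After such steps over mixed-domain batches, the encoder would be fine-tuned separately on each domain's training set, as the abstract describes.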
Related papers
- Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation [59.41178047749177]
We focus on multi-domain Neural Machine Translation, with the goal of developing efficient models which can handle data from various domains seen during training and are robust to domains unseen during training.
We hypothesize that Sparse Mixture-of-Experts (SMoE) models are a good fit for this task, as they enable efficient model scaling.
We conduct a series of experiments aimed at validating the utility of SMoE for the multi-domain scenario, and find that straightforward width scaling of the Transformer is a simpler and, in practice, surprisingly more efficient approach that reaches the same performance level as SMoE.
arXiv Detail & Related papers (2024-07-01T09:45:22Z)
- MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning [1.4396109429521227]
Adapting models pre-trained on large-scale datasets to a variety of downstream tasks is a common strategy in deep learning.
Parameter-efficient fine-tuning methods have emerged as a promising way to adapt pre-trained models to different tasks while training only a minimal number of parameters.
We introduce MTLoRA, a novel framework for parameter-efficient training of Multi-Task Learning models.
arXiv Detail & Related papers (2024-03-29T17:43:58Z)
- Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation [19.28500206536013]
Federated learning (FL) is a promising approach for solving multilingual tasks.
We propose a meta-learning-based adaptive parameter selection methodology, MetaSend, that improves the communication efficiency of model transmissions.
We demonstrate that MetaSend obtains substantial improvements over baselines in translation quality in the presence of a limited communication budget.
arXiv Detail & Related papers (2024-01-15T04:04:26Z)
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a versatile model, i.e., the Unified Model Learning for NMT (UMLNMT), that works with data from different tasks.
UMLNMT yields substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z)
- A Systematic Analysis of Vocabulary and BPE Settings for Optimal Fine-tuning of NMT: A Case Study of In-domain Translation [0.0]
The choice of vocabulary and subword (SW) tokenization has a significant impact on both training and fine-tuning an NMT model.
In this work we compare different strategies for SW tokenization and vocabulary generation, with the ultimate goal of uncovering an optimal setting for fine-tuning a domain-specific model.
arXiv Detail & Related papers (2023-03-01T18:26:47Z)
- Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta-learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- WARP: Word-level Adversarial ReProgramming [13.08689221166729]
In many applications it is preferable to tune much smaller sets of parameters, so that the majority of parameters can be shared across multiple tasks.
We present an alternative approach based on adversarial reprogramming, which extends earlier work on automatic prompt generation.
We show that this approach outperforms other methods with a similar number of trainable parameters on SST-2 and MNLI datasets.
arXiv Detail & Related papers (2021-01-01T00:41:03Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps; a generic first-order sketch of such a step follows this list.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
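Two of the entries above relate meta-learning to plain multi-task training through first-order updates: MAMF discards MAML's bi-level optimization, and the last entry links multi-task pre-training to a sequence of meta-train steps. The sketch below shows a generic first-order meta-train step in the style of Reptile, assuming PyTorch; it is a hypothetical illustration, not the exact procedure of either paper.

```python
# Generic first-order meta-train step (Reptile-style); illustrative only, not the
# exact procedure of MAMF or of "Pre-training Text Representations as Meta Learning".
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def first_order_meta_step(model, task_batches, inner_lr=1e-3, meta_lr=0.1, inner_steps=3):
    """Adapt a copy of `model` to one task with plain SGD, then move the shared
    weights toward the adapted weights; no second-order gradients are needed."""
    adapted = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)

    for _ in range(inner_steps):
        for x, y in task_batches:                      # a few mini-batches from one task
            inner_opt.zero_grad()
            F.cross_entropy(adapted(x), y).backward()  # first-order gradients only
            inner_opt.step()

    with torch.no_grad():                              # Reptile-style meta update
        for p_shared, p_adapted in zip(model.parameters(), adapted.parameters()):
            p_shared.add_(meta_lr * (p_adapted - p_shared))

# Toy usage: iterating this step over many sampled tasks plays the role of
# multi-task pre-training while keeping only first-order gradients.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
toy_task = [(torch.randn(8, 32), torch.randint(0, 2, (8,)))]
first_order_meta_step(model, toy_task)
```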
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.