Improving Multi-task Generalization Ability for Neural Text Matching via
Prompt Learning
- URL: http://arxiv.org/abs/2204.02725v1
- Date: Wed, 6 Apr 2022 11:01:08 GMT
- Title: Improving Multi-task Generalization Ability for Neural Text Matching via
Prompt Learning
- Authors: Shicheng Xu, Liang Pang, Huawei Shen, Xueqi Cheng
- Abstract summary: Recent state-of-the-art neural text matching models, e.g. pre-trained language models (PLMs), struggle to generalize to different tasks.
We adopt a specialization-generalization training strategy and refer to it as Match-Prompt.
In the specialization stage, descriptions of different matching tasks are mapped to only a few prompt tokens.
In the generalization stage, the text matching model learns the essential matching signals by being trained on diverse matching tasks.
- Score: 54.66399120084227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text matching is a fundamental technique in both information retrieval and
natural language processing. Text matching tasks share the same paradigm that
determines the relationship between two given texts. Evidently, the
relationships vary from task to task, e.g. relevance in document retrieval,
semantic alignment in paraphrase identification, and answerability judgment in
question answering. However, the essential signals for text matching remain in
a finite scope, i.e. exact matching, semantic matching, and inference matching.
Recent state-of-the-art neural text matching models, e.g. pre-trained language
models (PLMs), struggle to generalize to different tasks. This is because
end-to-end supervised learning on a task-specific dataset makes the model
overemphasize the data sample bias and task-specific signals rather than the
essential matching signals, which harms the model's generalization to
different tasks. To overcome this problem, we adopt a
specialization-generalization training strategy and refer to it as
Match-Prompt. In the specialization stage, descriptions of different matching
tasks are mapped to only a few prompt tokens. In the generalization stage, the
text matching model learns the essential matching signals by being trained on
diverse matching tasks. Highly diverse matching tasks prevent the model from
fitting the data sample bias of any specific task, so the model can focus on
learning the essential matching signals. Meanwhile, the prompt tokens obtained
in the first stage are added to the corresponding tasks to help the model
distinguish different task-specific matching signals. Experimental results on
eighteen public datasets show that Match-Prompt significantly improves the
multi-task generalization capability of PLMs in text matching and yields
better in-domain multi-task, out-of-domain multi-task, and new-task adaptation
performance than task-specific models.
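A minimal sketch of the idea, assuming a PyTorch-style setup: each matching task gets a small set of learnable prompt-token embeddings (standing in for the specialization stage, where task descriptions are distilled into a few prompt tokens), and one shared matching model is then trained on a mixture of tasks with the corresponding prompt prepended (the generalization stage). The tiny Transformer encoder, dimensions, and task names below are placeholders for illustration, not the authors' actual architecture or code.

```python
import torch
import torch.nn as nn

# Hypothetical task set; the paper covers many more matching tasks/datasets.
TASKS = ["document_retrieval", "paraphrase_identification", "qa_matching"]
PROMPT_LEN = 5        # "only a few prompt tokens" per task
HIDDEN = 128          # stand-in for the PLM hidden size
VOCAB = 30522         # stand-in vocabulary size

class MultiTaskMatcher(nn.Module):
    """Shared matcher with per-task prompt tokens (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)     # stand-in for a PLM
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=4, batch_first=True),
            num_layers=2,
        )
        # Specialization stage: each task description is distilled into a few
        # learnable prompt-token embeddings.
        self.task_prompts = nn.ParameterDict({
            t: nn.Parameter(torch.randn(PROMPT_LEN, HIDDEN) * 0.02) for t in TASKS
        })
        self.classifier = nn.Linear(HIDDEN, 2)       # match / no match

    def forward(self, token_ids, task):
        x = self.embed(token_ids)                                      # (B, L, H)
        prompt = self.task_prompts[task].unsqueeze(0).expand(x.size(0), -1, -1)
        x = torch.cat([prompt, x], dim=1)                              # prepend task prompt
        h = self.encoder(x)
        return self.classifier(h[:, 0])                                # predict from first position

# Generalization stage: train one shared model on batches mixed across tasks,
# so it must rely on the essential matching signals rather than task-specific bias.
model = MultiTaskMatcher()
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()

for task in TASKS:                                   # toy round-robin over tasks
    token_ids = torch.randint(0, VOCAB, (8, 64))     # fake tokenized text pairs
    labels = torch.randint(0, 2, (8,))
    loss = loss_fn(model(token_ids, task), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```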
Related papers
- Distribution Matching for Multi-Task Learning of Classification Tasks: a
Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can be successful with classification tasks with little or non-overlapping annotations.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- TransPrompt v2: A Transferable Prompting Framework for Cross-task Text Classification [37.824031151922604]
We propose TransPrompt v2, a novel transferable prompting framework for few-shot learning across similar or distant text classification tasks.
For learning across similar tasks, we employ a multi-task meta-knowledge acquisition (MMA) procedure to train a meta-learner.
For learning across distant tasks, we inject the task type descriptions into the prompt, and capture the intra-type and inter-type prompt embeddings.
arXiv Detail & Related papers (2023-08-29T04:16:57Z)
- Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5 [50.574918785575655]
We compare sequential fine-tuning with a model for multi-task learning in the context of boosting performance on two tasks.
Our results show that while sequential multi-task learning can be tuned to be good at the first of two target tasks, it performs less well on the second and additionally struggles with overfitting.
arXiv Detail & Related papers (2022-10-31T13:26:08Z)
- Unsupervised Mismatch Localization in Cross-Modal Sequential Data [5.932046800902776]
We develop an unsupervised learning algorithm that can infer the relationship between content-mismatched cross-modal data.
We propose a hierarchical Bayesian deep learning model, named mismatch localization variational autoencoder (ML-VAE), that decomposes the generative process of the speech into hierarchically structured latent variables.
Our experimental results show that ML-VAE successfully locates the mismatch between text and speech, without the need for human annotations.
arXiv Detail & Related papers (2022-05-05T14:23:27Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- Exploring Relational Context for Multi-Task Dense Prediction [76.86090370115]
We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads.
We explore various attention-based contexts, such as global and local, in the multi-task setting.
We propose an Adaptive Task-Relational Context module, which samples the pool of all available contexts for each task pair.
arXiv Detail & Related papers (2021-04-28T16:45:56Z)
- Learning to Match Jobs with Resumes from Sparse Interaction Data using Multi-View Co-Teaching Network [83.64416937454801]
Job-resume interaction data is sparse and noisy, which hurts the performance of job-resume matching algorithms.
We propose a novel multi-view co-teaching network that learns from sparse interaction data for job-resume matching.
Our model is able to outperform state-of-the-art methods for job-resume matching.
arXiv Detail & Related papers (2020-09-25T03:09:54Z)
- Low Resource Multi-Task Sequence Tagging -- Revisiting Dynamic Conditional Random Fields [67.51177964010967]
We compare different models for low resource multi-task sequence tagging that leverage dependencies between label sequences for different tasks.
We find that explicit modeling of inter-dependencies between task predictions outperforms single-task as well as standard multi-task models.
arXiv Detail & Related papers (2020-05-01T07:11:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.