Adaptable Text Matching via Meta-Weight Regulator
- URL: http://arxiv.org/abs/2204.12668v1
- Date: Wed, 27 Apr 2022 02:28:40 GMT
- Title: Adaptable Text Matching via Meta-Weight Regulator
- Authors: Bo Zhang, Chen Zhang, Fang Ma, Dawei Song
- Abstract summary: Meta-Weight Regulator (MWR) is a meta-learning approach that learns to assign weights to the source examples based on their relevance to the target loss.
MWR first trains the model on uniformly weighted source examples, and measures the efficacy of the model on the target examples via a loss function.
As MWR is model-agnostic, it can be applied to any backbone neural model.
- Score: 14.619068650513917
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Neural text matching models have been used in a range of applications such as
question answering and natural language inference, and have yielded good
performance. However, these neural models have limited adaptability, and their
performance declines when they encounter test examples from a different
dataset or even a different task. Adaptability is particularly
important in the few-shot setting: in many cases, there is only a limited
amount of labeled data available for a target dataset or task, while we may
have access to a richly labeled source dataset or task. However, adapting a
model trained on the abundant source data to a few-shot target dataset or task
is challenging. To tackle this challenge, we propose a Meta-Weight Regulator
(MWR), which is a meta-learning approach that learns to assign weights to the
source examples based on their relevance to the target loss. Specifically, MWR
first trains the model on uniformly weighted source examples and measures
the efficacy of the model on the target examples via a loss function. Through
iterative (meta) gradient descent, high-order gradients are propagated back to
the source examples. These gradients are then used to update the weights of the
source examples in a way that reflects their relevance to the target
performance. As MWR is model-agnostic, it can be applied to any backbone neural
model. Extensive experiments are conducted with various backbone text matching
models, on four widely used datasets and two tasks. The results demonstrate
that our proposed approach significantly outperforms a number of existing
adaptation methods and effectively improves the cross-dataset and cross-task
adaptability of the neural text matching models in the few-shot setting.
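Concretely, the re-weighting step described in the abstract resembles a learning-to-reweight meta-gradient: perform a differentiable one-step (virtual) update of the model under per-example source weights, evaluate the loss on the target examples, and differentiate that target loss back through the virtual update to obtain a gradient on the weights. The sketch below illustrates this idea in PyTorch (>= 2.0, for `torch.func.functional_call`); the function name `meta_weight_step`, the `inner_lr` hyper-parameter, and the zero-initialized weights `eps` are illustrative assumptions, not the authors' released implementation of MWR.

```python
# Minimal sketch of a meta-weighting step in the spirit of MWR.
# Assumes a classification-style matching model; names are illustrative.
import torch
from torch.func import functional_call  # PyTorch >= 2.0


def meta_weight_step(model, src_x, src_y, tgt_x, tgt_y, inner_lr=1e-3):
    """Estimate per-example source weights from their effect on the target loss."""
    criterion = torch.nn.CrossEntropyLoss(reduction="none")
    params = dict(model.named_parameters())

    # Per-example weights; gradients w.r.t. them measure each source
    # example's influence on the target loss.
    eps = torch.zeros(src_x.size(0), device=src_x.device, requires_grad=True)

    # Weighted source loss via a functional forward pass.
    src_logits = functional_call(model, params, (src_x,))
    src_loss = (eps * criterion(src_logits, src_y)).sum()

    # Differentiable one-step (virtual) update of the model parameters.
    grads = torch.autograd.grad(
        src_loss, list(params.values()), create_graph=True, allow_unused=True
    )
    updated = {
        k: p if g is None else p - inner_lr * g
        for (k, p), g in zip(params.items(), grads)
    }

    # Target loss under the virtually updated model.
    tgt_logits = functional_call(model, updated, (tgt_x,))
    tgt_loss = criterion(tgt_logits, tgt_y).mean()

    # High-order gradient of the target loss w.r.t. the example weights.
    eps_grad = torch.autograd.grad(tgt_loss, eps)[0]

    # Upweight source examples whose increased weight lowers the target loss.
    weights = torch.clamp(-eps_grad, min=0.0)
    if weights.sum() > 0:
        weights = weights / weights.sum()
    return weights
```

In a full training loop, the returned weights would multiply the per-example source losses in the actual parameter update, and the estimation would be repeated at each iteration.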
Related papers
- Scalable Influence and Fact Tracing for Large Language Model Pretraining [14.598556308631018]
Training data attribution (TDA) methods aim to attribute model outputs back to specific training examples.
This paper refines existing gradient-based methods to work effectively at scale.
arXiv Detail & Related papers (2024-10-22T20:39:21Z)
- Target-Aware Language Modeling via Granular Data Sampling [25.957424920194914]
Language model pretraining generally targets a broad range of use cases and incorporates data from diverse sources.
A cost-effective and straightforward approach is sampling with low-dimensional data features.
We show that pretrained models perform on par with the full RefinedWeb data and outperform randomly selected samples for model sizes ranging from 125M to 1.5B.
arXiv Detail & Related papers (2024-09-23T04:52:17Z)
- Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts [20.202031878825153]
We propose a novel dynamic data mixture for MoE instruction tuning.
Inspired by MoE's token routing preference, we build dataset-level representations and then capture the subtle differences among datasets.
Results on two MoE models demonstrate the effectiveness of our approach on both downstream knowledge & reasoning tasks and open-ended queries.
arXiv Detail & Related papers (2024-06-17T06:47:03Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Synthetic Data Generation in Low-Resource Settings via Fine-Tuning of Large Language Models [15.991777903345575]
Large language models can generalize to novel downstream tasks with relatively few labeled examples.
Alternatively, smaller models can solve specific tasks if fine-tuned with enough labeled examples.
We study synthetic data generation of fine-tuning training data via fine-tuned teacher LLMs to improve the downstream performance of much smaller models.
arXiv Detail & Related papers (2023-10-02T11:49:05Z)
- Informative Data Mining for One-Shot Cross-Domain Semantic Segmentation [84.82153655786183]
We propose a novel framework called Informative Data Mining (IDM) to enable efficient one-shot domain adaptation for semantic segmentation.
IDM provides an uncertainty-based selection criterion to identify the most informative samples, which facilitates quick adaptation and reduces redundant training.
Our approach outperforms existing methods and achieves a new state-of-the-art one-shot performance of 56.7%/55.4% on the GTA5/SYNTHIA to Cityscapes adaptation tasks.
arXiv Detail & Related papers (2023-09-25T15:56:01Z)
- CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
- Adaptive Consistency Regularization for Semi-Supervised Transfer Learning [31.66745229673066]
We consider semi-supervised learning and transfer learning jointly, leading to a more practical and competitive paradigm.
To better exploit the value of both pre-trained weights and unlabeled target examples, we introduce adaptive consistency regularization.
Our proposed adaptive consistency regularization outperforms state-of-the-art semi-supervised learning techniques such as Pseudo Label, Mean Teacher, and MixMatch.
arXiv Detail & Related papers (2021-03-03T05:46:39Z)
- Meta-learning One-class Classifiers with Eigenvalue Solvers for Supervised Anomaly Detection [55.888835686183995]
We propose a neural network-based meta-learning method for supervised anomaly detection.
We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods.
arXiv Detail & Related papers (2021-03-01T01:43:04Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
- Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments in two public datasets and obtain significant improvement in both datasets.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)