Revisiting Distance Metric Learning for Few-Shot Natural Language
Classification
- URL: http://arxiv.org/abs/2211.15202v1
- Date: Mon, 28 Nov 2022 10:19:31 GMT
- Title: Revisiting Distance Metric Learning for Few-Shot Natural Language
Classification
- Authors: Witold Sosnowski, Anna Wróblewska, Karolina Seweryn, Piotr Gawrysiak
- Abstract summary: Under few-shot learning settings, proxy-based DML losses in particular can positively affect the fine-tuning and inference of a supervised language model.
Models tuned with a combination of CCE and ProxyAnchor Loss have, on average, the best performance and outperform models with only CCE by about 3.27 percentage points.
- Score: 1.0323063834827415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distance Metric Learning (DML) has attracted much attention in image
processing in recent years. This paper analyzes its impact on supervised
fine-tuning language models for Natural Language Processing (NLP)
classification tasks under few-shot learning settings. We investigated several
DML loss functions in training RoBERTa language models on known SentEval
Transfer Tasks datasets. We also analyzed the possibility of using proxy-based
DML losses during model inference.
Our systematic experiments have shown that under few-shot learning settings,
proxy-based DML losses in particular can positively affect the fine-tuning and
inference of a supervised language model. Models tuned with a combination of
CCE (categorical cross-entropy loss) and ProxyAnchor Loss have, on average, the
best performance and outperform models with only CCE by about 3.27 percentage
points -- up to 10.38 percentage points depending on the training dataset.
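The abstract does not spell out how the two objectives are combined during fine-tuning. Below is a minimal sketch, assuming a RoBERTa-large encoder from Hugging Face Transformers, a linear classification head on the <s> token embedding, an unweighted sum of the two losses, and the ProxyAnchorLoss implementation from the pytorch-metric-learning package; these are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of a combined CCE + ProxyAnchor fine-tuning objective.
# The unweighted sum of the two losses, the linear classification head, the use
# of the <s> (CLS) token embedding, and the ProxyAnchor hyperparameters are
# assumptions for illustration -- the paper's exact configuration may differ.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer
from pytorch_metric_learning import losses  # provides ProxyAnchorLoss

NUM_CLASSES = 2   # e.g. a binary SentEval transfer task such as MR
EMBED_DIM = 1024  # hidden size of roberta-large

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
encoder = AutoModel.from_pretrained("roberta-large")
classifier = nn.Linear(EMBED_DIM, NUM_CLASSES)

cce = nn.CrossEntropyLoss()
proxy_anchor = losses.ProxyAnchorLoss(
    num_classes=NUM_CLASSES, embedding_size=EMBED_DIM, margin=0.1, alpha=32
)

# ProxyAnchorLoss holds learnable class proxies, so its parameters are trained too.
params = (list(encoder.parameters()) + list(classifier.parameters())
          + list(proxy_anchor.parameters()))
optimizer = torch.optim.AdamW(params, lr=1e-5)

def training_step(texts, labels):
    """One step on a small labelled batch, as in a few-shot fine-tuning run."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state  # (batch, seq_len, EMBED_DIM)
    embeddings = hidden[:, 0, :]                 # <s> token embedding
    logits = classifier(embeddings)

    # Combined objective: categorical cross-entropy plus the proxy-based DML loss.
    loss = cce(logits, labels) + proxy_anchor(embeddings, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

training_step(["a gripping, well-acted drama", "tedious and forgettable"],
              torch.tensor([1, 0]))
```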
Related papers
- Scaling Laws for Predicting Downstream Performance in LLMs [75.28559015477137]
This work focuses on the pre-training loss as a more efficient metric for performance estimation.
We extend the power law analytical function to predict domain-specific pre-training loss based on FLOPs across data sources.
We employ a two-layer neural network to model the non-linear relationship between multiple domain-specific losses and downstream performance (a minimal sketch of this two-stage setup appears after this list).
arXiv Detail & Related papers (2024-10-11T04:57:48Z) - Leveraging Task-Specific Knowledge from LLM for Semi-Supervised 3D Medical Image Segmentation [9.778201925906913]
We introduce LLM-SegNet, which exploits a large language model (LLM) to integrate task-specific knowledge into our co-training framework.
Experiments on publicly available Left Atrium, Pancreas-CT, and Brats-19 datasets demonstrate the superior performance of LLM-SegNet compared to the state-of-the-art.
arXiv Detail & Related papers (2024-07-06T14:23:16Z) - Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud
Semantic Segmentation via Decoupling Optimization [64.36097398869774]
Semi-supervised learning (SSL) has been an active research topic for large-scale 3D scene understanding.
The existing SSL-based methods suffer from severe training bias due to class imbalance and long-tail distributions of the point cloud data.
We introduce a new decoupling optimization framework, which disentangles feature representation learning and the classifier in an alternating optimization manner to shift the biased decision boundary effectively.
arXiv Detail & Related papers (2024-01-13T04:16:40Z) - FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with
Pre-trained Vision-Language Models [62.663113296987085]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data.
We introduce two novel components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC).
Considering the imbalance in existing 3D datasets, we also propose new evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model.
arXiv Detail & Related papers (2023-12-28T14:52:07Z) - Tokenizer Choice For LLM Training: Negligible or Crucial? [30.33170936148845]
We study the influence of tokenizer choice on Large Language Models (LLMs) downstream performance by training 24 mono- and multilingual LLMs.
We find that the tokenizer choice can significantly impact the model's downstream performance and training costs.
We show that multilingual tokenizers trained on the five most frequent European languages require vocabulary size increases by a factor of three in comparison to English.
arXiv Detail & Related papers (2023-10-12T22:44:19Z) - Evaluating the Capability of Large-scale Language Models on Chinese
Grammatical Error Correction Task [10.597024796304016]
Large-scale language models (LLMs) have shown remarkable capability in a variety of Natural Language Processing (NLP) tasks.
This report explores how large language models perform on Chinese grammatical error correction tasks.
arXiv Detail & Related papers (2023-07-08T13:10:59Z) - To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We also examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z) - Probing Out-of-Distribution Robustness of Language Models with
Parameter-Efficient Transfer Learning [17.110208720745064]
In this study, we explore how out-of-distribution (OOD) detection ability changes as the size of the pretrained language model (PLM) grows or the transfer methods are altered.
We evaluate various PETL techniques, including fine-tuning, Adapter, LoRA, and prefix-tuning, on three different intention classification tasks.
arXiv Detail & Related papers (2023-01-27T11:27:40Z) - Distance Metric Learning Loss Functions in Few-Shot Scenarios of
Supervised Language Models Fine-Tuning [1.0323063834827415]
DML loss functions can increase the performance of RoBERTa-large models on downstream classification tasks in few-shot scenarios.
Models fine-tuned with the use of SoftTriple loss can achieve better results than models with a standard categorical cross-entropy loss function.
arXiv Detail & Related papers (2022-11-28T10:05:58Z) - On the Compositional Generalization Gap of In-Context Learning [73.09193595292233]
We look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of such models in semantic parsing tasks with in-context learning.
We evaluate four model families, OPT, BLOOM, CodeGen, and Codex, on three semantic parsing datasets.
arXiv Detail & Related papers (2022-11-15T19:56:37Z) - A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by a large margin in few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
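A minimal sketch of the two-stage prediction described in the "Scaling Laws for Predicting Downstream Performance in LLMs" entry above, assuming hand-set power-law coefficients and an untrained two-layer network; the domains, coefficients, and layer sizes are hypothetical placeholders rather than values from that paper.

```python
# Illustrative sketch of the two-stage setup from "Scaling Laws for Predicting
# Downstream Performance in LLMs": a power law maps training FLOPs to each
# domain-specific pre-training loss, and a two-layer network maps the vector of
# domain losses to downstream performance. All coefficients, domains, and layer
# sizes here are hypothetical placeholders, not values from that paper.
import torch
from torch import nn

def domain_loss(flops: float, a: float, b: float, irreducible: float) -> float:
    """Power-law form L(C) = a * C**(-b) + E for one data domain."""
    return a * flops ** (-b) + irreducible

# Stage 1: per-domain power-law coefficients (in practice these would be fitted
# on loss/FLOPs observations from smaller training runs across data sources).
coefficients = {
    "web":  (2.0e5, 0.30, 2.5),
    "code": (1.5e5, 0.32, 1.6),
}
target_flops = 1e22
domain_losses = torch.tensor(
    [domain_loss(target_flops, *coefficients[d]) for d in coefficients],
    dtype=torch.float32,
)

# Stage 2: a two-layer network maps the domain-loss vector to a downstream
# score. In practice it would be trained on (domain losses, benchmark score)
# pairs collected from smaller models; here it is only initialised.
predictor = nn.Sequential(
    nn.Linear(len(coefficients), 16),
    nn.ReLU(),
    nn.Linear(16, 1),
    nn.Sigmoid(),  # downstream performance expressed as a score in [0, 1]
)
print(predictor(domain_losses).item())
```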