Distance Metric Learning Loss Functions in Few-Shot Scenarios of
Supervised Language Models Fine-Tuning
- URL: http://arxiv.org/abs/2211.15195v1
- Date: Mon, 28 Nov 2022 10:05:58 GMT
- Title: Distance Metric Learning Loss Functions in Few-Shot Scenarios of
Supervised Language Models Fine-Tuning
- Authors: Witold Sosnowski, Karolina Seweryn, Anna Wróblewska, Piotr Gawrysiak
- Abstract summary: A DML loss function can increase performance on downstream classification tasks of RoBERTa-large models in few-shot scenarios.
Models fine-tuned with the SoftTriple loss can achieve better results than models trained with a standard categorical cross-entropy loss function.
- Score: 1.0323063834827415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents an analysis of the influence of Distance Metric
Learning (DML) loss functions on the supervised fine-tuning of language models
for classification tasks. We experimented with well-known datasets from the
SentEval Transfer Tasks.
Our experiments show that applying a DML loss function can increase performance
on downstream classification tasks of RoBERTa-large models in few-shot
scenarios. Models fine-tuned with the SoftTriple loss can achieve better
results than models trained with a standard categorical cross-entropy loss
function, by about 2.89 percentage points on average and by 0.04 to 13.48
percentage points depending on the training dataset. Additionally, we conducted
a comprehensive analysis with explainability techniques to assess the models'
reliability and explain their results.
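As a rough illustration of the setup described in the abstract, the following is a minimal, hedged sketch of fine-tuning RoBERTa-large with a SoftTriple loss on the pooled sentence embedding. It assumes the Hugging Face transformers and pytorch-metric-learning packages; the pooling choice, number of classes, learning rate, and optimizer are illustrative assumptions, not the authors' exact configuration.

```python
# A minimal sketch (not the authors' exact code) of fine-tuning RoBERTa-large
# with a SoftTriple DML loss applied to the pooled sentence embedding.
import torch
from transformers import AutoModel, AutoTokenizer
from pytorch_metric_learning import losses  # provides SoftTripleLoss

NUM_CLASSES = 2   # assumed binary task (e.g. a SentEval transfer task)
EMB_SIZE = 1024   # RoBERTa-large hidden size

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
encoder = AutoModel.from_pretrained("roberta-large")

# SoftTriple keeps several learnable centers per class; these parameters must
# be optimized jointly with the encoder weights.
softtriple = losses.SoftTripleLoss(num_classes=NUM_CLASSES, embedding_size=EMB_SIZE)
optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(softtriple.parameters()), lr=1e-5
)

def training_step(texts, labels):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state[:, 0]  # <s>-token (CLS-like) embedding
    loss = softtriple(hidden, labels)                  # DML loss in the embedding space
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# Hypothetical few-shot batch
print(training_step(["great movie", "terrible plot"], torch.tensor([1, 0])))
```

At inference, classification under a pure DML objective is typically done by comparing the sentence embedding against the learned class centers; the SoftTriple term can also be combined with a standard CCE head, as in the sketch after the related-papers list.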
Related papers
- A Versatile Influence Function for Data Attribution with Non-Decomposable Loss [3.1615846013409925]
We propose a Versatile Influence Function (VIF) that can be straightforwardly applied to machine learning models trained with any non-decomposable loss.
VIF represents a significant advancement in data attribution, enabling efficient influence-function-based attribution across a wide range of machine learning paradigms.
arXiv Detail & Related papers (2024-12-02T09:59:01Z)
- Dissecting Representation Misalignment in Contrastive Learning via Influence Function [15.28417468377201]
We introduce the Extended Influence Function for Contrastive Loss (ECIF), an influence function crafted for contrastive loss.
ECIF considers both positive and negative samples and provides a closed-form approximation of contrastive learning models.
Building upon ECIF, we develop a series of algorithms for data evaluation, misalignment detection, and misprediction trace-back tasks.
arXiv Detail & Related papers (2024-11-18T15:45:41Z)
- Impact of Missing Values in Machine Learning: A Comprehensive Analysis [0.0]
This paper aims to examine the nuanced impact of missing values on machine learning (ML) models.
Our analysis focuses on the challenges posed by missing values, including biased inferences, reduced predictive power, and increased computational burdens.
The study employs case studies and real-world examples to illustrate the practical implications of addressing missing values.
arXiv Detail & Related papers (2024-10-10T18:31:44Z)
- Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA).
Our method significantly outperforms existing approaches, achieving an average AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z)
- Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z)
- Scaling Relationship on Learning Mathematical Reasoning with Large Language Models [75.29595679428105]
We investigate how the pre-training loss, supervised data amount, and augmented data amount influence the reasoning performance of a supervised LLM.
We find that rejection samples from multiple models push LLaMA-7B to an accuracy of 49.3% on GSM8K, which significantly outperforms the supervised fine-tuning (SFT) accuracy of 35.9%.
arXiv Detail & Related papers (2023-08-03T15:34:01Z)
- A Comprehensive Evaluation and Analysis Study for Chinese Spelling Check [53.152011258252315]
We show that making reasonable use of phonetic and graphic information is effective for Chinese Spelling Check.
Models are sensitive to the error distribution of the test set, which reflects the shortcomings of the models.
The commonly used benchmark, SIGHAN, cannot reliably evaluate models' performance.
arXiv Detail & Related papers (2023-07-25T17:02:38Z)
- Revisiting Distance Metric Learning for Few-Shot Natural Language Classification [1.0323063834827415]
Under few-shot learning settings, proxy-based DML losses in particular can positively affect the fine-tuning and inference of a supervised language model.
Models tuned with a combination of CCE and ProxyAnchor loss have, on average, the best performance and outperform models with only CCE by about 3.27 percentage points (a sketch of such a combined objective appears after this list).
arXiv Detail & Related papers (2022-11-28T10:19:31Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve (DEIM)" for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
- RoBLEURT Submission for the WMT2021 Metrics Task [72.26898579202076]
We present our submission to the Shared Metrics Task: RoBLEURT.
Our model reaches state-of-the-art correlations with the WMT 2020 human annotations on 8 out of 10 to-English language pairs.
arXiv Detail & Related papers (2022-04-28T08:49:40Z)
- Deep Learning Models for Knowledge Tracing: Review and Empirical Evaluation [2.423547527175807]
We review and evaluate a body of deep learning knowledge tracing (DLKT) models with openly available and widely-used data sets.
The evaluated DLKT models were reimplemented to assess the replicability of previously reported results.
arXiv Detail & Related papers (2021-12-30T14:19:27Z)
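Relating to the "Revisiting Distance Metric Learning for Few-Shot Natural Language Classification" entry above, here is a minimal sketch of a joint CCE + ProxyAnchor objective. It assumes the pytorch-metric-learning ProxyAnchorLoss and an equal 1:1 weighting of the two terms; both are illustrative assumptions rather than that paper's reported setup.

```python
# Hedged sketch of a joint objective: categorical cross-entropy on a linear
# classification head plus a ProxyAnchor loss on the same sentence embedding.
# The 1:1 weighting of the two terms is an assumption for illustration.
import torch
from torch import nn
from pytorch_metric_learning import losses

NUM_CLASSES, EMB_SIZE = 2, 1024   # assumed binary task, RoBERTa-large hidden size

classifier = nn.Linear(EMB_SIZE, NUM_CLASSES)  # CCE head
proxy_anchor = losses.ProxyAnchorLoss(
    num_classes=NUM_CLASSES, embedding_size=EMB_SIZE, margin=0.1, alpha=32
)
cce = nn.CrossEntropyLoss()

def combined_loss(embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """embeddings: (batch, EMB_SIZE) pooled encoder outputs; labels: (batch,)."""
    ce_term = cce(classifier(embeddings), labels)   # standard CCE term
    dml_term = proxy_anchor(embeddings, labels)     # proxy-based DML term
    return ce_term + dml_term                       # assumed equal weighting

# Usage with random embeddings standing in for pooled encoder outputs
emb = torch.randn(4, EMB_SIZE)
lab = torch.tensor([0, 1, 1, 0])
print(combined_loss(emb, lab))
```

In practice, the learnable proxies inside ProxyAnchorLoss and the classifier head would be added to the optimizer alongside the encoder parameters.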
This list is automatically generated from the titles and abstracts of the papers on this site.