Rethinking Loss Functions for Fact Verification
- URL: http://arxiv.org/abs/2403.08174v1
- Date: Wed, 13 Mar 2024 01:56:32 GMT
- Title: Rethinking Loss Functions for Fact Verification
- Authors: Yuta Mukobara, Yutaro Shigeto, Masashi Shimbo
- Abstract summary: We develop two task-specific objectives tailored to FEVER.
Experimental results confirm that the proposed objective functions outperform the standard cross-entropy.
- Score: 1.2983290324156112
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We explore loss functions for fact verification in the FEVER shared task.
While the cross-entropy loss is a standard objective for training verdict
predictors, it fails to capture the heterogeneity among the FEVER verdict
classes. In this paper, we develop two task-specific objectives tailored to
FEVER. Experimental results confirm that the proposed objective functions
outperform the standard cross-entropy. Performance is further improved when
these objectives are combined with simple class weighting, which effectively
overcomes the imbalance in the training data. The source code is available at
https://github.com/yuta-mukobara/RLF-KGAT
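As a rough illustration of the class-weighting idea described in the abstract (not the paper's task-specific objectives themselves), a minimal PyTorch sketch might look like the following; the class order and weight values are hypothetical placeholders, and in practice the weights would be derived from the class frequencies in the FEVER training data.

```python
import torch
import torch.nn as nn

# FEVER verdict classes: SUPPORTS, REFUTES, NOT ENOUGH INFO.
# Illustrative placeholder weights, not values from the paper;
# they would normally come from inverse class frequencies.
class_weights = torch.tensor([1.0, 1.5, 2.0])

# PyTorch's cross-entropy accepts per-class weights directly,
# which counteracts the imbalance among verdict classes.
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(4, 3)            # batch of 4 claims, 3 verdict classes
labels = torch.tensor([0, 2, 1, 2])   # gold verdict labels
loss = criterion(logits, labels)
print(loss.item())
```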
Related papers
- Next Generation Loss Function for Image Classification [0.0]
We experimentally challenge the well-known loss functions, including cross entropy (CE) loss, by utilizing the genetic programming (GP) approach.
One function, denoted as Next Generation Loss (NGL), clearly stood out, showing the same or better performance on all tested datasets.
arXiv Detail & Related papers (2024-04-19T15:26:36Z) - Class Anchor Margin Loss for Content-Based Image Retrieval [97.81742911657497]
We propose a novel repeller-attractor loss that falls in the metric learning paradigm, yet directly optimizes the L2 metric without the need to generate pairs.
We evaluate the proposed objective in the context of few-shot and full-set training on the CBIR task, by using both convolutional and transformer architectures.
arXiv Detail & Related papers (2023-06-01T12:53:10Z) - Decoupled Kullback-Leibler Divergence Loss [75.31157286595517]
Kullback-Leibler (KL) Divergence loss is shown to be equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss.
We introduce global information into DKL for intra-class consistency regularization.
The proposed approach achieves new state-of-the-art performance on both tasks, demonstrating the substantial practical merits.
arXiv Detail & Related papers (2023-05-23T11:17:45Z) - SuSana Distancia is all you need: Enforcing class separability in metric
learning via two novel distance-based loss functions for few-shot image
classification [0.9236074230806579]
We propose two loss functions which consider the importance of the embedding vectors by looking at the intra-class and inter-class distances among the few available data points.
Our results show a significant improvement in accuracy on the miniImageNet benchmark compared to other metric-based few-shot learning methods, by a margin of 2%.
arXiv Detail & Related papers (2023-05-15T23:12:09Z) - Contrastive Classification and Representation Learning with
Probabilistic Interpretation [5.979778557940212]
Cross entropy loss has served as the main objective function for classification-based tasks.
We propose a new version of supervised contrastive training that jointly learns the parameters of the classifier and the backbone of the network.
arXiv Detail & Related papers (2022-11-07T15:57:24Z) - Bridging the Gap Between Target Networks and Functional Regularization [61.051716530459586]
We propose an explicit Functional Regularization that is a convex regularizer in function space and can easily be tuned.
We analyze the convergence of our method theoretically and empirically demonstrate that replacing Target Networks with the more theoretically grounded Functional Regularization approach leads to better sample efficiency and performance improvements.
arXiv Detail & Related papers (2022-10-21T22:27:07Z) - On Training Targets and Activation Functions for Deep Representation
Learning in Text-Dependent Speaker Verification [18.19207291891767]
Key considerations include training targets, activation functions, and loss functions.
We study a range of loss functions when speaker identity is used as the training target.
We experimentally show that GELU is able to reduce the error rates of TD-SV significantly compared to sigmoid.
arXiv Detail & Related papers (2022-01-17T14:32:51Z) - Mixing between the Cross Entropy and the Expectation Loss Terms [89.30385901335323]
Cross entropy loss tends to focus on hard-to-classify samples during training.
We show that adding the expectation loss to the optimization goal helps the network achieve better accuracy.
Our experiments show that the new training protocol improves performance across a diverse set of classification domains.
arXiv Detail & Related papers (2021-09-12T23:14:06Z) - Learning Stable Classifiers by Transferring Unstable Features [59.06169363181417]
We study transfer learning in the presence of spurious correlations.
We experimentally demonstrate that directly transferring the stable feature extractor learned on the source task may not eliminate these biases for the target task.
We hypothesize that the unstable features in the source task and those in the target task are directly related.
arXiv Detail & Related papers (2021-06-15T02:41:12Z) - Optimized Loss Functions for Object detection: A Case Study on Nighttime
Vehicle Detection [0.0]
In this paper, we optimize the two loss functions for classification and localization simultaneously.
Compared to existing studies, in which the correlation is only applied to improve localization accuracy for positive samples, this paper utilizes the correlation to mine truly hard negative samples.
A novel localization loss named MIoU is proposed by incorporating the Mahalanobis distance between the predicted box and the target box, which eliminates the gradient inconsistency problem in the DIoU loss.
arXiv Detail & Related papers (2020-11-11T03:00:49Z) - Deep F-measure Maximization for End-to-End Speech Understanding [52.36496114728355]
We propose a differentiable approximation to the F-measure and train the network with this objective using standard backpropagation.
We perform experiments on two standard fairness datasets, Adult and Communities and Crime, and also on speech-to-intent detection on the ATIS dataset and speech-to-image concept classification on the Speech-COCO dataset.
In all four of these tasks, F-measure maximization results in improved micro-F1 scores, with absolute improvements of up to 8%, compared to models trained with the cross-entropy loss function.
arXiv Detail & Related papers (2020-08-08T03:02:27Z)
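For reference, a common way to make the F-measure differentiable, in the spirit of the Deep F-measure Maximization entry above, is to replace hard counts with softmax probabilities. The sketch below is a generic soft micro-F1 loss under that assumption and is not taken from that paper's code.

```python
import torch

def soft_f1_loss(logits: torch.Tensor, targets: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Differentiable (soft) micro-F1 loss for multi-class classification.

    logits:  (batch, num_classes) raw scores
    targets: (batch,) integer class labels
    """
    probs = torch.softmax(logits, dim=-1)
    onehot = torch.nn.functional.one_hot(targets, num_classes=probs.size(-1)).float()

    # Soft true positives, false positives, and false negatives, summed over the batch.
    tp = (probs * onehot).sum()
    fp = (probs * (1.0 - onehot)).sum()
    fn = ((1.0 - probs) * onehot).sum()

    soft_f1 = 2.0 * tp / (2.0 * tp + fp + fn + eps)
    return 1.0 - soft_f1  # minimizing 1 - F1 maximizes the soft F-measure
```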
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.