BCE vs. CE in Deep Feature Learning
- URL: http://arxiv.org/abs/2505.05813v1
- Date: Fri, 09 May 2025 06:18:31 GMT
- Title: BCE vs. CE in Deep Feature Learning
- Authors: Qiufu Li, Huibin Xiao, Linlin Shen
- Abstract summary: We compare binary CE (BCE) and cross-entropy (CE) in deep feature learning. BCE can also maximize the intra-class compactness and inter-class distinctiveness when reaching its minimum. BCE measures the absolute values of decision scores and adjusts the positive/negative decision scores across all samples to uniformly high/low levels.
- Score: 33.24161955363104
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When training classification models, it is expected that the learned features are compact within classes and well separated across classes. As the dominant loss function for training classification models, CE, when minimized, maximizes this compactness and distinctiveness, i.e., it reaches neural collapse (NC). Recent works show that binary CE (BCE) also performs well in multi-class tasks. In this paper, we compare BCE and CE in deep feature learning. For the first time, we prove that BCE can also maximize the intra-class compactness and inter-class distinctiveness when reaching its minimum, i.e., it also leads to NC. We point out that CE measures the relative values of decision scores during model training, implicitly enhancing the feature properties by classifying samples one by one. In contrast, BCE measures the absolute values of decision scores and adjusts the positive/negative decision scores across all samples to uniformly high/low levels. Meanwhile, the classifier biases in BCE place a substantial constraint on the decision scores, explicitly enhancing the feature properties during training. The experimental results align with the above analysis and show that BCE can improve classification and lead to better compactness and distinctiveness among sample features. The code will be released.
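As a concrete illustration of the relative-vs-absolute distinction drawn above, here is a minimal PyTorch sketch (our own illustration, not the authors' released code) that evaluates both losses on the same decision scores; note that CE is invariant to shifting all of a sample's scores by a constant, while BCE is not.

```python
# A minimal sketch (not the authors' released code) contrasting how CE and BCE
# consume the same decision scores (logits) for a K-class problem.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, num_classes = 4, 10
logits = torch.randn(batch, num_classes)          # decision scores
labels = torch.randint(0, num_classes, (batch,))  # ground-truth class indices

# CE: softmax over each sample's K scores, so only the *relative* margin between
# the positive score and that sample's other scores matters.
ce_loss = F.cross_entropy(logits, labels)

# BCE: each score goes through an independent sigmoid against a 0/1 target, so
# the *absolute* level of every positive/negative score matters, pushing scores
# toward uniformly high/low levels across all samples.
targets = F.one_hot(labels, num_classes).float()
bce_loss = F.binary_cross_entropy_with_logits(logits, targets)

print(f"CE:  {ce_loss.item():.4f}   BCE: {bce_loss.item():.4f}")

# Shifting all of a sample's scores by a constant leaves CE unchanged (relative)
# but changes BCE (absolute) -- the distinction discussed in the abstract.
shifted = logits + 3.0
print(F.cross_entropy(shifted, labels).item())                       # == CE above
print(F.binary_cross_entropy_with_logits(shifted, targets).item())   # != BCE above
```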
Related papers
- Self-Classification Enhancement and Correction for Weakly Supervised Object Detection [113.51483527300496]
Weakly supervised object detection (WSOD) has attracted much attention due to its low labeling cost. In this work, we introduce a novel WSOD framework to ameliorate these two issues. For one thing, we propose a self-classification enhancement module that integrates intra-class binary classification (ICBC) to bridge the gap between the two distinct MCC tasks. For another, we propose a self-classification correction algorithm for inference, which combines the results of both MCC tasks to effectively reduce mis-classified predictions.
arXiv Detail & Related papers (2025-05-22T06:45:58Z)
- Class Distance Weighted Cross Entropy Loss for Classification of Disease Severity [2.7574609288882312]
We propose a novel loss function, Class Distance Weighted Cross-Entropy (CDW-CE). It penalizes misclassifications more severely when the predicted and actual classes are farther apart. Our results show that CDW-CE consistently improves performance in ordinal image classification tasks.
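As a rough sketch of the class-distance idea described above (our own illustration; the exact CDW-CE definition is given in the cited paper), one plausible form weights each per-class penalty by the distance between that class index and the true ordinal label:

```python
# Hypothetical distance-weighted CE for ordinal labels; the exact CDW-CE
# formulation is defined in the cited paper. Here each per-class penalty is
# scaled by how far that class index lies from the true ordinal label.
import torch
import torch.nn.functional as F

def distance_weighted_ce(logits: torch.Tensor, targets: torch.Tensor, alpha: float = 1.0):
    """logits: (N, K) scores; targets: (N,) ordinal class indices in [0, K)."""
    num_classes = logits.size(1)
    probs = F.softmax(logits, dim=1)                                   # (N, K)
    class_idx = torch.arange(num_classes, device=logits.device)
    dist = (class_idx.unsqueeze(0) - targets.unsqueeze(1)).abs().float() ** alpha
    # Penalize probability mass placed on classes far from the true one
    # (the true class has distance 0 and contributes nothing).
    return (-torch.log1p(-probs.clamp(max=1 - 1e-6)) * dist).sum(dim=1).mean()

scores = torch.randn(8, 5)
labels = torch.randint(0, 5, (8,))
print(distance_weighted_ce(scores, labels).item())
```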
arXiv Detail & Related papers (2024-12-02T08:06:14Z)
- A Theoretical Analysis of Recommendation Loss Functions under Negative Sampling [13.180345241212423]
Loss functions like Categorical Cross Entropy (CCE), Binary Cross Entropy (BCE), and Bayesian Personalized Ranking (BPR) are commonly used in Recommender Systems (RSs) to differentiate positive items (those interacted with by users) from negative items. We show that CCE offers the tightest lower bound on ranking metrics like Normalized Discounted Cumulative Gain (NDCG) and Mean Reciprocal Rank (MRR). Under negative sampling, we reveal that BPR and CCE are equivalent when a single negative sample is drawn, and that all three losses converge to the same global minimum.
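The single-negative equivalence mentioned above is easy to verify numerically; the following sketch (our own check, not the paper's code) shows that a softmax over one positive and one sampled negative score reduces to the BPR logistic loss -log sigmoid(s_pos - s_neg):

```python
# Numerical check (our own, not the paper's code): sampled CCE with a single
# negative equals the BPR loss,
# -log(e^{s_p} / (e^{s_p} + e^{s_n})) == -log sigmoid(s_p - s_n).
import torch
import torch.nn.functional as F

s_pos = torch.randn(6)   # scores of positive (interacted) items
s_neg = torch.randn(6)   # scores of one sampled negative item per positive

cce_one_neg = -F.log_softmax(torch.stack([s_pos, s_neg], dim=1), dim=1)[:, 0]
bpr = -F.logsigmoid(s_pos - s_neg)

print(torch.allclose(cce_one_neg, bpr))  # True, up to floating-point error
```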
arXiv Detail & Related papers (2024-11-12T13:06:16Z)
- Rediscovering BCE Loss for Uniform Classification [35.66000285310775]
This paper introduces the concept of uniform classification, which employs a unified threshold to classify all samples.
We propose the uniform classification accuracy as a metric to measure the model's performance in uniform classification.
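As a hedged reading of such a metric (our own sketch; the paper's exact definition may differ), a sample could count as correctly classified under uniform classification only if its positive decision score exceeds a single global threshold and all of its negative scores fall below it:

```python
# Hypothetical "uniform classification accuracy": a single shared threshold t
# must separate the positive from the negative decision scores of every sample;
# we report the best accuracy over candidate thresholds. This is our reading of
# the idea, not necessarily the cited paper's exact metric.
import torch

def uniform_classification_accuracy(scores: torch.Tensor, labels: torch.Tensor) -> float:
    """scores: (N, K) decision scores; labels: (N,) class indices."""
    pos = scores.gather(1, labels.view(-1, 1)).squeeze(1)                         # (N,)
    neg_max = scores.scatter(1, labels.view(-1, 1), float("-inf")).max(dim=1).values
    best = 0.0
    for t in scores.flatten().tolist():                                           # candidate thresholds
        best = max(best, ((pos > t) & (neg_max < t)).float().mean().item())
    return best

scores = torch.randn(16, 5)
labels = torch.randint(0, 5, (16,))
print(uniform_classification_accuracy(scores, labels))
```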
arXiv Detail & Related papers (2024-03-12T03:44:40Z)
- Understanding the Detrimental Class-level Effects of Data Augmentation [63.1733767714073]
Achieving optimal average accuracy comes at the cost of significantly hurting individual class accuracy by as much as 20% on ImageNet.
We present a framework for understanding how DA interacts with class-level learning dynamics.
We show that simple class-conditional augmentation strategies improve performance on the negatively affected classes.
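One way to realize a class-conditional augmentation strategy of this kind (a hypothetical sketch with illustrative probabilities, not the paper's exact policy) is to gate the strong augmentation by a per-class probability:

```python
# Hypothetical class-conditional augmentation policy: strong augmentation is
# applied with a per-class probability, so classes hurt by data augmentation
# can be exempted. The probabilities below are illustrative assumptions.
import random
from PIL import Image
import torchvision.transforms as T

strong = T.Compose([T.RandomResizedCrop(224), T.RandomHorizontalFlip()])
plain = T.Compose([T.Resize(256), T.CenterCrop(224)])

aug_prob = {0: 1.0, 1: 1.0, 2: 0.0}  # e.g. class 2 is negatively affected by DA

def transform(img: Image.Image, label: int) -> Image.Image:
    p = aug_prob.get(label, 1.0)
    return strong(img) if random.random() < p else plain(img)
```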
arXiv Detail & Related papers (2023-12-07T18:37:43Z)
- Center Contrastive Loss for Metric Learning [8.433000039153407]
We propose a novel metric learning function called Center Contrastive Loss.
It maintains a class-wise center bank and compares the category centers with the query data points using a contrastive loss.
The proposed loss combines the advantages of both contrastive and classification methods.
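A minimal sketch of the center-bank idea as described above (our own reading; the momentum and temperature values are assumptions, not the paper's reference implementation):

```python
# Hypothetical class-center bank with a contrastive objective: each query
# embedding is pulled toward its own class center and pushed away from the
# other centers. EMA rate and temperature are our assumptions.
import torch
import torch.nn.functional as F

class CenterContrast:
    def __init__(self, num_classes: int, dim: int, momentum: float = 0.9, tau: float = 0.1):
        self.centers = torch.zeros(num_classes, dim)
        self.momentum, self.tau = momentum, tau

    @torch.no_grad()
    def update(self, feats: torch.Tensor, labels: torch.Tensor):
        # EMA update of the center bank with the per-class batch means.
        for c in labels.unique():
            batch_mean = feats[labels == c].mean(dim=0)
            self.centers[c] = self.momentum * self.centers[c] + (1 - self.momentum) * batch_mean

    def loss(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Contrastive loss over cosine similarities between queries and centers.
        sims = F.normalize(feats, dim=1) @ F.normalize(self.centers, dim=1).t() / self.tau
        return F.cross_entropy(sims, labels)

bank = CenterContrast(num_classes=5, dim=16)
feats = torch.randn(8, 16, requires_grad=True)
labels = torch.randint(0, 5, (8,))
bank.update(feats, labels)
print(bank.loss(feats, labels).item())
```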
arXiv Detail & Related papers (2023-08-01T11:22:51Z)
- CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but further adaptation of CLIP to downstream tasks undesirably degrades out-of-distribution (OOD) performance.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
- You Only Need End-to-End Training for Long-Tailed Recognition [8.789819609485225]
Cross-entropy loss tends to produce highly correlated features on imbalanced data.
We propose two novel modules, Block-based Relatively Balanced Batch Sampler (B3RS) and Batch Embedded Training (BET).
Experimental results on the long-tailed classification benchmarks, CIFAR-LT and ImageNet-LT, demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2021-12-11T11:44:09Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least-squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for BCE (here, the bias-constrained estimator rather than binary cross-entropy) is in applications where multiple estimates of the same unknown are averaged for improved performance.
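A hedged sketch of the bias-constraint idea (our own illustration; the cited paper's exact formulation may differ): train an estimator with an MSE objective plus a penalty on the empirical bias, estimated by averaging errors over repeated noisy measurements of the same unknown:

```python
# Hypothetical bias-constrained training: an MSE objective plus a penalty on
# the empirical bias (mean error) measured over several noisy observations of
# the same unknown parameter. All sizes and weights are illustrative.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
lam = 1.0  # weight of the bias penalty (assumed hyperparameter)

for step in range(200):
    theta = torch.rand(64, 1) * 4 - 2                          # unknown parameters
    obs = theta.unsqueeze(1) + 0.3 * torch.randn(64, 8, 1)     # 8 noisy measurements each
    est = net(obs.view(-1, 1)).view(64, 8, 1)                  # estimate per measurement

    mse = ((est - theta.unsqueeze(1)) ** 2).mean()
    # Empirical bias: average error across the repeated measurements per parameter.
    bias = (est - theta.unsqueeze(1)).mean(dim=1)
    loss = mse + lam * (bias ** 2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()

print("final loss:", loss.item())
```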
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Orthogonal Projection Loss [59.61277381836491]
We develop a novel loss function termed Orthogonal Projection Loss (OPL).
OPL directly enforces inter-class separation alongside intra-class clustering in the feature space.
OPL offers unique advantages as it does not require careful negative mining and is not sensitive to the batch size.
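A minimal batch-wise sketch in the spirit of the description above (our own illustration, not necessarily the paper's exact form): pull the mean intra-class cosine similarity toward 1 and push the mean absolute inter-class cosine similarity toward 0, i.e., toward orthogonality.

```python
# Hypothetical orthogonality-style loss computed within a batch: intra-class
# feature pairs are pulled toward cosine similarity 1, while inter-class pairs
# are pushed toward cosine similarity 0 (orthogonality).
import torch
import torch.nn.functional as F

def orthogonal_projection_style_loss(feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    f = F.normalize(feats, dim=1)
    sim = f @ f.t()                                            # pairwise cosine similarities
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    eye = torch.eye(len(labels), device=feats.device)
    intra = (sim * (same - eye)).sum() / (same - eye).sum().clamp(min=1)
    inter = (sim.abs() * (1 - same)).sum() / (1 - same).sum().clamp(min=1)
    return (1 - intra) + inter

feats = torch.randn(16, 64, requires_grad=True)
labels = torch.randint(0, 4, (16,))
print(orthogonal_projection_style_loss(feats, labels).item())
```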
arXiv Detail & Related papers (2021-03-25T17:58:00Z)
- Generalized Zero-Shot Learning Via Over-Complete Distribution [79.5140590952889]
We propose to generate an Over-Complete Distribution (OCD) using a Conditional Variational Autoencoder (CVAE) for both seen and unseen classes.
The effectiveness of the framework is evaluated using both Zero-Shot Learning and Generalized Zero-Shot Learning protocols.
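As a rough sketch of the generation component (a hypothetical conditional VAE over feature vectors; the architecture and sizes are our assumptions, not the paper's):

```python
# Hypothetical conditional VAE that generates class-conditioned feature
# vectors, one ingredient of an over-complete distribution. All architectural
# choices here are illustrative assumptions.
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, feat_dim=64, attr_dim=16, z_dim=8):
        super().__init__()
        self.z_dim = z_dim
        self.enc = nn.Sequential(nn.Linear(feat_dim + attr_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * z_dim))      # mean and log-variance
        self.dec = nn.Sequential(nn.Linear(z_dim + attr_dim, 128), nn.ReLU(),
                                 nn.Linear(128, feat_dim))

    def forward(self, x, attr):
        mu, logvar = self.enc(torch.cat([x, attr], dim=1)).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()     # reparameterization
        return self.dec(torch.cat([z, attr], dim=1)), mu, logvar

    def sample(self, attr):
        # Generate features for (possibly unseen) classes from their attributes.
        z = torch.randn(attr.size(0), self.z_dim)
        return self.dec(torch.cat([z, attr], dim=1))

model = CVAE()
feats, attrs = torch.randn(4, 64), torch.randn(4, 16)
recon, mu, logvar = model(feats, attrs)
print(recon.shape, model.sample(attrs).shape)
```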
arXiv Detail & Related papers (2020-04-01T19:05:28Z)