Does the Data Processing Inequality Reflect Practice? On the Utility of Low-Level Tasks
- URL: http://arxiv.org/abs/2512.21315v1
- Date: Wed, 24 Dec 2025 18:21:01 GMT
- Title: Does the Data Processing Inequality Reflect Practice? On the Utility of Low-Level Tasks
- Authors: Roy Turgeman, Tom Tirer,
- Abstract summary: We show that for any finite number of training samples, there exists a pre-classification processing that improves the classification accuracy.<n>We also explore the effect of class separation, training set size, and class balance on the relative gain from this procedure.
- Score: 15.03974529275767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The data processing inequality is an information-theoretic principle stating that the information content of a signal cannot be increased by processing the observations. In particular, it suggests that there is no benefit in enhancing the signal or encoding it before addressing a classification problem. This assertion can be proven to be true for the case of the optimal Bayes classifier. However, in practice, it is common to perform "low-level" tasks before "high-level" downstream tasks despite the overwhelming capabilities of modern deep neural networks. In this paper, we aim to understand when and why low-level processing can be beneficial for classification. We present a comprehensive theoretical study of a binary classification setup, where we consider a classifier that is tightly connected to the optimal Bayes classifier and converges to it as the number of training samples increases. We prove that for any finite number of training samples, there exists a pre-classification processing that improves the classification accuracy. We also explore the effect of class separation, training set size, and class balance on the relative gain from this procedure. We support our theory with an empirical investigation of the theoretical setup. Finally, we conduct an empirical study where we investigate the effect of denoising and encoding on the performance of practical deep classifiers on benchmark datasets. Specifically, we vary the size and class distribution of the training set, and the noise level, and demonstrate trends that are consistent with our theoretical results.
Related papers
- Impact of Noisy Supervision in Foundation Model Learning [91.56591923244943]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.<n>We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Deep Imbalanced Regression via Hierarchical Classification Adjustment [50.19438850112964]
Regression tasks in computer vision are often formulated into classification by quantizing the target space into classes.
The majority of training samples lie in a head range of target values, while a minority of samples span a usually larger tail range.
We propose to construct hierarchical classifiers for solving imbalanced regression tasks.
Our novel hierarchical classification adjustment (HCA) for imbalanced regression shows superior results on three diverse tasks.
arXiv Detail & Related papers (2023-10-26T04:54:39Z) - Understanding and Mitigating the Label Noise in Pre-training on
Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z) - Audio Contrastive-based Fine-tuning: Decoupling Representation Learning and Classification [26.82307246813389]
We propose a disentangled two-stage framework that separates representation refinement from downstream evaluation.<n>First, we employ a "contrastive-tuning" stage to explicitly improve the geometric structure of the model's embedding space.<n>We then introduce a dual-probe evaluation protocol to assess the quality of these refined representations from a geometric perspective.
arXiv Detail & Related papers (2023-09-21T08:59:13Z) - An Upper Bound for the Distribution Overlap Index and Its Applications [22.92968284023414]
This paper proposes an easy-to-compute upper bound for the overlap index between two probability distributions.<n>The proposed bound shows its value in one-class classification and domain shift analysis.<n>Our work shows significant promise toward broadening the applications of overlap-based metrics.
arXiv Detail & Related papers (2022-12-16T20:02:03Z) - Mutual Information Learned Classifiers: an Information-theoretic
Viewpoint of Training Deep Learning Classification Systems [9.660129425150926]
Cross entropy loss can easily lead us to find models which demonstrate severe overfitting behavior.
In this paper, we prove that the existing cross entropy loss minimization for training DNN classifiers essentially learns the conditional entropy of the underlying data distribution.
We propose a mutual information learning framework where we train DNN classifiers via learning the mutual information between the label and input.
arXiv Detail & Related papers (2022-10-03T15:09:19Z) - Improving Self-supervised Learning for Out-of-distribution Task via
Auxiliary Classifier [6.61825491400122]
We observe a strong relationship between rotation prediction (self-supervised) accuracy and semantic classification accuracy on OOD tasks.
We introduce an additional auxiliary classification head in our multi-task network along with semantic classification and rotation prediction head.
Our proposed learning method is framed into bi-level optimisation problem where the upper-level is trained to update the parameters for semantic classification and rotation prediction head.
arXiv Detail & Related papers (2022-09-07T02:00:01Z) - Do We Really Need a Learnable Classifier at the End of Deep Neural
Network? [118.18554882199676]
We study the potential of learning a neural network for classification with the classifier randomly as an ETF and fixed during training.
Our experimental results show that our method is able to achieve similar performances on image classification for balanced datasets.
arXiv Detail & Related papers (2022-03-17T04:34:28Z) - You Only Need End-to-End Training for Long-Tailed Recognition [8.789819609485225]
Cross-entropy loss tends to produce highly correlated features on imbalanced data.
We propose two novel modules, Block-based Relatively Balanced Batch Sampler (B3RS) and Batch Embedded Training (BET)
Experimental results on the long-tailed classification benchmarks, CIFAR-LT and ImageNet-LT, demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2021-12-11T11:44:09Z) - Prototypical Classifier for Robust Class-Imbalanced Learning [64.96088324684683]
We propose textitPrototypical, which does not require fitting additional parameters given the embedding network.
Prototypical produces balanced and comparable predictions for all classes even though the training set is class-imbalanced.
We test our method on CIFAR-10LT, CIFAR-100LT and Webvision datasets, observing that Prototypical obtains substaintial improvements compared with state of the arts.
arXiv Detail & Related papers (2021-10-22T01:55:01Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning
with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated ssian mixture model.
Experimental results demonstrate that CCVR state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.