Studying the Interplay between Information Loss and Operation Loss in
Representations for Classification
- URL: http://arxiv.org/abs/2112.15238v1
- Date: Thu, 30 Dec 2021 23:17:05 GMT
- Title: Studying the Interplay between Information Loss and Operation Loss in
Representations for Classification
- Authors: Jorge F. Silva, Felipe Tobar, Mario Vicuña and Felipe Cordova
- Abstract summary: Information-theoretic measures have been widely adopted in the design of features for learning and decision problems.
We show that it is possible to adopt an alternative notion of informational sufficiency to achieve operational sufficiency in learning.
- Score: 15.369895042965261
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Information-theoretic measures have been widely adopted in the design of
features for learning and decision problems. Inspired by this, we look at the
relationship between i) a weak form of information loss in the Shannon sense
and ii) the operation loss in the minimum probability of error (MPE) sense when
considering a family of lossy continuous representations (features) of a
continuous observation. We present several results that shed light on this
interplay. Our first result offers a lower bound on a weak form of information
loss as a function of its respective operation loss when adopting a discrete
lossy representation (quantization) instead of the original raw observation.
From this, our main result shows that a specific form of vanishing information
loss (a weak notion of asymptotic informational sufficiency) implies a
vanishing MPE loss (or asymptotic operational sufficiency) when considering a
general family of lossy continuous representations. Our theoretical findings
support the observation that the selection of feature representations that
attempt to capture informational sufficiency is appropriate for learning, but
this selection is a rather conservative design principle if the intended goal
is achieving MPE in classification. Supporting this last point, and under some
structural conditions, we show that it is possible to adopt an alternative
notion of informational sufficiency (strictly weaker than pure sufficiency in
the mutual information sense) to achieve operational sufficiency in learning.
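As a reading aid, the sketch below spells out the two quantities the abstract contrasts, using standard definitions of mutual information and Bayes (MPE) error. The paper's specific weak notion of information loss and its structural conditions on the representation family are not reproduced here, so the exact statement of the main result may differ from this rough paraphrase.

```latex
% Sketch of the quantities contrasted in the abstract (standard definitions
% assumed; the paper's weak notion of information loss may be defined differently).
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
Let $X$ be a continuous observation, $Y$ a finite class label, and
$U_n=\eta_n(X)$ a lossy representation from a family $\{\eta_n\}_{n\ge 1}$.

Information loss (non-negative by the data-processing inequality):
\[
  \Delta I(\eta_n) = I(X;Y) - I(U_n;Y) \ \ge\ 0 .
\]

Operation (MPE) loss, with Bayes error
$\ell(Z)=1-\mathbb{E}\bigl[\max_{y}\Pr(Y=y\mid Z)\bigr]$:
\[
  \Delta \ell(\eta_n) = \ell(U_n) - \ell(X) \ \ge\ 0 .
\]

The main result says, roughly, that a vanishing (weak) information loss
implies a vanishing operation loss:
\[
  \Delta I(\eta_n) \longrightarrow 0
  \quad\Longrightarrow\quad
  \Delta \ell(\eta_n) \longrightarrow 0
  \qquad (n\to\infty).
\]
\end{document}
```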
Related papers
- Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head [38.898038672237746]
We introduce a logit-level loss function as a supplement to the widely used probability-level loss function.
We find that naively combining the newly introduced logit-level loss with the existing probability-level loss leads to performance degradation.
We propose a novel method called dual-head knowledge distillation, which partitions the linear classifier into two classification heads responsible for different losses.
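A minimal sketch of the dual-head idea described above, assuming a standard PyTorch knowledge-distillation setup; the class and function names, the KL term for the probability-level loss, and the MSE term for the logit-level loss are illustrative assumptions rather than the paper's exact recipe.

```python
# Hypothetical dual-head student: one head trained with a probability-level KD
# loss (softened KL to the teacher), the other with a logit-level loss (MSE on
# raw logits). Names and loss choices are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualHeadStudent(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                              # shared feature extractor
        self.head_prob = nn.Linear(feat_dim, num_classes)     # probability-level head
        self.head_logit = nn.Linear(feat_dim, num_classes)    # logit-level head

    def forward(self, x):
        z = self.backbone(x)
        return self.head_prob(z), self.head_logit(z)

def dual_head_kd_loss(logits_prob, logits_logit, teacher_logits, targets,
                      T: float = 4.0, alpha: float = 0.5):
    """Cross-entropy + softened KL on one head, logit matching on the other."""
    ce = F.cross_entropy(logits_prob, targets)
    kd = F.kl_div(F.log_softmax(logits_prob / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * T * T
    logit_match = F.mse_loss(logits_logit, teacher_logits)
    return ce + alpha * kd + (1.0 - alpha) * logit_match
```

The point the sketch illustrates is that each head receives gradients from only one of the two loss families, so the conflicting signals noted above do not collide in a single classifier.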
arXiv Detail & Related papers (2024-11-13T12:33:04Z)
- Learning Latent Graph Structures and their Uncertainty [63.95971478893842]
Graph Neural Networks (GNNs) use relational information as an inductive bias to enhance the model's accuracy.
As task-relevant relations might be unknown, graph structure learning approaches have been proposed to learn them while solving the downstream prediction task.
arXiv Detail & Related papers (2024-05-30T10:49:22Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Unveiling the Potential of Probabilistic Embeddings in Self-Supervised Learning [4.124934010794795]
Self-supervised learning has played a pivotal role in advancing machine learning by allowing models to acquire meaningful representations from unlabeled data.
We investigate the impact of probabilistic modeling on the information bottleneck, shedding light on a trade-off between compression and preservation of information in both representation and loss space.
Our findings suggest that introducing an additional bottleneck in the loss space can significantly enhance the ability to detect out-of-distribution examples.
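A generic PyTorch sketch of the probabilistic-embedding-with-bottleneck idea the summary refers to: a Gaussian posterior per input whose KL term supplies the compression side of the trade-off. The module and function names are hypothetical; this is not the paper's specific architecture, nor its additional bottleneck in the loss space.

```python
# Variational-information-bottleneck-style sketch: a probabilistic embedding
# (per-input Gaussian) regularized by a KL term that acts as the compression
# term of the bottleneck. Illustrative only, not the paper's method.
import torch
import torch.nn as nn

class ProbabilisticEncoder(nn.Module):
    def __init__(self, in_dim: int, z_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return z, mu, logvar

def ib_regularizer(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, I) ): penalizes information kept in the embedding."""
    return 0.5 * torch.mean(torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar, dim=1))

# Total objective (beta trades compression against preservation of information):
#   loss = task_loss(z) + beta * ib_regularizer(mu, logvar)
```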
arXiv Detail & Related papers (2023-10-27T12:01:16Z)
- SINCERE: Supervised Information Noise-Contrastive Estimation REvisited [5.004880836963827]
Previous work suggests a supervised contrastive (SupCon) loss to extend InfoNCE to learn from available class labels.
We propose the Supervised InfoNCE REvisited (SINCERE) loss as a theoretically-justified supervised extension of InfoNCE.
Experiments show that SINCERE leads to better separation of embeddings from different classes and improves transfer learning classification accuracy.
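For reference, a PyTorch sketch of the standard supervised contrastive (SupCon) objective that SINCERE revisits: embeddings sharing a class label are pulled together against all other samples in the batch. SINCERE's specific correction to this loss is in the paper and is not reproduced here.

```python
# Vanilla SupCon loss: for each anchor, maximize the average log-probability of
# its same-label positives under a softmax over all other batch samples.
import torch
import torch.nn.functional as F

def supcon_loss(embeddings: torch.Tensor, labels: torch.Tensor,
                temperature: float = 0.1) -> torch.Tensor:
    z = F.normalize(embeddings, dim=1)                     # (N, d) unit-norm embeddings
    sim = z @ z.t() / temperature                          # pairwise similarities
    n = z.size(0)
    not_self = ~torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & not_self
    sim = sim.masked_fill(~not_self, float("-inf"))        # drop self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0)    # keep positive pairs only
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0                                 # anchors with at least one positive
    loss = -pos_log_prob.sum(dim=1)[valid] / pos_counts[valid]
    return loss.mean()

# Example: 8 random embeddings from 2 classes.
z = torch.randn(8, 16)
y = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1])
print(supcon_loss(z, y))
```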
arXiv Detail & Related papers (2023-09-25T16:40:56Z)
- A survey and taxonomy of loss functions in machine learning [51.35995529962554]
We present a comprehensive overview of the most widely used loss functions across key applications, including regression, classification, generative modeling, ranking, and energy-based modeling.
We introduce 43 distinct loss functions, structured within an intuitive taxonomy that clarifies their theoretical foundations, properties, and optimal application contexts.
arXiv Detail & Related papers (2023-01-13T14:38:24Z)
- On Codomain Separability and Label Inference from (Noisy) Loss Functions [11.780563744330038]
We introduce the notion of codomain separability to study the necessary and sufficient conditions under which label inference is possible from any (noisy) loss function values.
We show that for many commonly used loss functions, including multiclass cross-entropy with common activation functions and some Bregman divergence-based losses, it is possible to design label inference attacks for arbitrary noise levels.
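A toy illustration of the kind of label inference the summary describes, under the assumption that the attacker knows the model's predicted probabilities for an example and observes its (possibly noisy) cross-entropy loss value; it is not the paper's codomain-separability construction.

```python
# Since per-example cross-entropy loss is -log p[y], the hidden label can be
# recovered as the class whose implied loss is closest to the observed value.
import numpy as np

def infer_label_from_loss(probs: np.ndarray, observed_loss: float) -> int:
    """Pick the class whose implied cross-entropy loss best matches the observation."""
    candidate_losses = -np.log(probs)           # loss value implied by each candidate label
    return int(np.argmin(np.abs(candidate_losses - observed_loss)))

# Example: prediction vector and a slightly noisy loss for the hidden true label.
probs = np.array([0.70, 0.20, 0.10])
true_label = 2
observed = -np.log(probs[true_label]) + 1e-3    # noisy observation of the loss
assert infer_label_from_loss(probs, observed) == true_label
```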
arXiv Detail & Related papers (2021-07-07T05:29:53Z)
- Leveraging Unlabeled Data for Entity-Relation Extraction through Probabilistic Constraint Satisfaction [54.06292969184476]
We study the problem of entity-relation extraction in the presence of symbolic domain knowledge.
Our approach employs semantic loss which captures the precise meaning of a logical sentence.
With a focus on low-data regimes, we show that semantic loss outperforms the baselines by a wide margin.
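For context, a toy computation of the semantic loss of Xu et al. (2018), which the summary refers to: the negative log of the probability mass the model places on assignments satisfying a logical constraint. The exactly-one constraint below is only an illustrative example, not the task-specific constraints used for entity-relation extraction in the paper.

```python
# Semantic loss: -log of the total probability of assignments that satisfy a
# logical constraint, with variables treated as independent Bernoullis.
import itertools
import math

def semantic_loss(probs, satisfies) -> float:
    """probs: per-variable Bernoulli probabilities; satisfies: predicate over 0/1 assignments."""
    total = 0.0
    for assignment in itertools.product([0, 1], repeat=len(probs)):
        if satisfies(assignment):
            weight = 1.0
            for p, x in zip(probs, assignment):
                weight *= p if x else (1.0 - p)
            total += weight
    return -math.log(total)

# Exactly-one-of-three constraint; mass concentrated on a valid assignment gives low loss.
exactly_one = lambda a: sum(a) == 1
print(semantic_loss([0.8, 0.1, 0.1], exactly_one))   # about 0.38
print(semantic_loss([0.5, 0.5, 0.5], exactly_one))   # higher: mass spread over invalid assignments
```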
arXiv Detail & Related papers (2021-03-20T00:16:29Z)
- Disambiguation of weak supervision with exponential convergence rates [88.99819200562784]
In weakly supervised learning, data are annotated with incomplete yet discriminative information.
In this paper, we focus on partial labelling, an instance of weak supervision where, from a given input, we are given a set of potential targets.
We propose an empirical disambiguation algorithm to recover full supervision from weak supervision.
arXiv Detail & Related papers (2021-02-04T18:14:32Z)
- Adversarial Training Reduces Information and Improves Transferability [81.59364510580738]
Recent results show that features of adversarially trained networks for classification, in addition to being robust, enable desirable properties such as invertibility.
We show that adversarial training can improve the linear transferability of representations to new tasks, which gives rise to a new trade-off between transferability and accuracy on the source task.
arXiv Detail & Related papers (2020-07-22T08:30:16Z)
- Semantic Loss Application to Entity Relation Recognition [0.0]
This paper compares two general approaches to entity relation recognition.
The main contribution of this paper is an end-to-end neural model for joint entity relation extraction.
arXiv Detail & Related papers (2020-06-07T03:12:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.