Understanding Encoder-Decoder Structures in Machine Learning Using Information Measures
- URL: http://arxiv.org/abs/2405.20452v1
- Date: Thu, 30 May 2024 19:58:01 GMT
- Title: Understanding Encoder-Decoder Structures in Machine Learning Using Information Measures
- Authors: Jorge F. Silva, Victor Faraggi, Camilo Ramirez, Alvaro Egana, Eduardo Pavez
- Abstract summary: We present new results to model and understand the role of encoder-decoder design in machine learning (ML).
We use two main information concepts, information sufficiency (IS) and mutual information loss (MIL), to represent predictive structures in machine learning.
- Score: 10.066310107046084
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present new results to model and understand the role of encoder-decoder design in machine learning (ML) from an information-theoretic angle. We use two main information concepts, information sufficiency (IS) and mutual information loss (MIL), to represent predictive structures in machine learning. Our first main result provides a functional expression that characterizes the class of probabilistic models consistent with an IS encoder-decoder latent predictive structure. This result formally justifies the encoder-decoder forward stages many modern ML architectures adopt to learn latent (compressed) representations for classification. To illustrate IS as a realistic and relevant model assumption, we revisit some known ML concepts and present some interesting new examples: invariant, robust, sparse, and digital models. Furthermore, our IS characterization allows us to tackle the fundamental question of how much performance (predictive expressiveness) could be lost, using the cross-entropy risk, when a given encoder-decoder architecture is adopted in a learning setting. Here, our second main result shows that a mutual information loss quantifies the lack of expressiveness attributed to the choice of a (biased) encoder-decoder ML design. Finally, we address the problem of universal cross-entropy learning with an encoder-decoder design, for which necessary and sufficient conditions are established to meet this requirement. In all these results, Shannon's information measures offer new interpretations and explanations for representation learning.
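As a rough numerical illustration of these two concepts (not code from the paper), the following sketch computes the mutual information loss MIL = I(X;Y) - I(U;Y) for a deterministic encoder U = eta(X) on a small discrete joint distribution; the distribution p_xy, the encoder eta, and all variable names are illustrative assumptions.

```python
import numpy as np

def mutual_information(p_joint):
    """I(A;B) in nats for a joint pmf given as a 2-D array."""
    p_joint = p_joint / p_joint.sum()
    p_a = p_joint.sum(axis=1, keepdims=True)   # marginal of the row variable
    p_b = p_joint.sum(axis=0, keepdims=True)   # marginal of the column variable
    mask = p_joint > 0
    return float((p_joint[mask] * np.log(p_joint[mask] / (p_a @ p_b)[mask])).sum())

# Toy joint model p(x, y) over X in {0, 1, 2, 3} and Y in {0, 1}.
p_xy = np.array([[0.20, 0.05],
                 [0.05, 0.20],
                 [0.20, 0.05],
                 [0.05, 0.20]])

# Deterministic encoder U = eta(X): a 2-cell quantizer merging x=0 with x=2
# and x=1 with x=3 (cells that share the same posterior p(y|x)).
eta = np.array([0, 1, 0, 1])
p_uy = np.zeros((2, 2))
for x, u in enumerate(eta):
    p_uy[u] += p_xy[x]

i_xy = mutual_information(p_xy)
i_uy = mutual_information(p_uy)
print(f"I(X;Y) = {i_xy:.4f} nats")
print(f"I(U;Y) = {i_uy:.4f} nats")
print(f"MIL    = {i_xy - i_uy:.4f} nats  (0 certifies information sufficiency)")
```

Because eta merges only symbols with identical posteriors p(y|x), the printed MIL is zero, i.e., this encoder is information sufficient; merging x=0 with x=1 instead would yield a strictly positive MIL.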
Related papers
- PAGE: Parametric Generative Explainer for Graph Neural Network [16.350208494261913]
PAGE is capable of providing faithful explanations for any graph neural network without necessitating prior knowledge or internal details.
We introduce an additional discriminator to capture the causality between latent causal features and the model's output.
Compared to existing methods, PAGE operates at the sample scale rather than at the level of individual nodes or edges.
arXiv Detail & Related papers (2024-08-26T06:39:49Z)
- Leveraging Knowledge Graphs for Interpretable Feature Generation [0.0]
KRAFT is an automated feature engineering (AutoFE) framework that leverages a knowledge graph to guide the generation of interpretable features.
Our hybrid AI approach combines a neural generator, which transforms raw features through a series of transformations, with a knowledge-based reasoner, which evaluates the interpretability of the generated features.
The generator is trained through Deep Reinforcement Learning (DRL) to maximize the prediction accuracy and the interpretability of the generated features.
arXiv Detail & Related papers (2024-06-01T19:51:29Z)
- Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition [51.66383337087724]
The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR.
Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models.
We propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and Bidirectional Asynchronous Training (BAT) structure.
arXiv Detail & Related papers (2023-12-31T09:24:21Z)
- Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
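A hedged sketch of how such a mechanism could look, based only on our reading of the abstract (this is not the authors' LRFormer code; the module name, pooling choice, and sizes are assumptions): pool features to a fixed grid, run standard self-attention there, and upsample back.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowResSelfAttention(nn.Module):
    """Pool to a fixed grid, attend there, upsample back (assumed design)."""
    def __init__(self, dim, heads=4, low_res=(16, 16)):
        super().__init__()
        self.low_res = low_res
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                  # x: (B, C, H, W)
        b, c, h, w = x.shape
        lo = F.adaptive_avg_pool2d(x, self.low_res)        # fixed-cost grid
        tokens = lo.flatten(2).transpose(1, 2)             # (B, 16*16, C)
        out, _ = self.attn(tokens, tokens, tokens)         # global context
        out = out.transpose(1, 2).reshape(b, c, *self.low_res)
        up = F.interpolate(out, size=(h, w), mode="bilinear",
                           align_corners=False)
        return x + up                                      # residual fusion

feat = torch.randn(2, 64, 128, 96)                         # any input size
print(LowResSelfAttention(64)(feat).shape)                 # (2, 64, 128, 96)
```

Pooling to a fixed 16x16 grid keeps the attention cost constant in the input resolution, which is the property the abstract emphasizes.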
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
- Dynamic Encoding and Decoding of Information for Split Learning in Mobile-Edge Computing: Leveraging Information Bottleneck Theory [1.1151919978983582]
Split learning is a privacy-preserving distributed learning paradigm in which an ML model is split into two parts (i.e., an encoder and a decoder).
In mobile-edge computing, network functions can be trained via split learning where an encoder resides in a user equipment (UE) and a decoder resides in the edge network.
We present a new framework and training mechanism to enable a dynamic balancing of the transmission resource consumption with the informativeness of the shared latent representations.
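A minimal sketch of this balancing idea, assuming a variational information-bottleneck style objective (the classes UEEncoder/EdgeDecoder, the Gaussian latent, and the weight beta are our assumptions, not the paper's framework):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UEEncoder(nn.Module):          # runs on the user equipment
    def __init__(self, d_in=32, d_z=8):
        super().__init__()
        self.mu = nn.Linear(d_in, d_z)
        self.logvar = nn.Linear(d_in, d_z)

    def forward(self, x):
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparam trick
        return z, mu, logvar

class EdgeDecoder(nn.Module):        # runs in the edge network
    def __init__(self, d_z=8, n_cls=10):
        super().__init__()
        self.head = nn.Linear(d_z, n_cls)

    def forward(self, z):
        return self.head(z)

def ib_loss(logits, y, mu, logvar, beta=1e-2):
    ce = F.cross_entropy(logits, y)                    # informativeness term
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(1).mean()  # rate
    return ce + beta * kl                              # beta sets the balance

x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
enc, dec = UEEncoder(), EdgeDecoder()
z, mu, logvar = enc(x)
print(ib_loss(dec(z), y, mu, logvar))
```

Raising beta prices the transmitted latent more aggressively, trading task accuracy for lower transmission resource consumption.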
arXiv Detail & Related papers (2023-09-06T07:04:37Z)
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
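A minimal sketch of one way to quantize a latent space per dimension with small learned scalar codebooks and a straight-through gradient (an interpretation of the abstract; LatentQuantizer and all sizes are assumed names):

```python
import torch
import torch.nn as nn

class LatentQuantizer(nn.Module):
    def __init__(self, d_z=8, codes_per_dim=10):
        super().__init__()
        # One small scalar codebook per latent dimension.
        self.codebooks = nn.Parameter(torch.randn(d_z, codes_per_dim))

    def forward(self, z):                               # z: (B, d_z)
        dist = (z.unsqueeze(-1) - self.codebooks).abs() # (B, d_z, K)
        idx = dist.argmin(-1)                           # nearest code per dim
        z_q = torch.gather(self.codebooks.expand(z.size(0), -1, -1),
                           2, idx.unsqueeze(-1)).squeeze(-1)
        return z + (z_q - z).detach()                   # straight-through grad

zq = LatentQuantizer()(torch.randn(4, 8))
print(zq.shape)  # torch.Size([4, 8])
```

Restricting each dimension to a few reusable scalar codes is one concrete way to impose the "organized latent space" bias the abstract describes.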
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- Toward a Geometrical Understanding of Self-supervised Contrastive Learning [55.83778629498769]
Self-supervised learning (SSL) is one of the premier techniques to create data representations that are actionable for transfer learning in the absence of human annotations.
Mainstream SSL techniques rely on a specific deep neural network architecture with two cascaded neural networks: the encoder and the projector.
In this paper, we investigate how the strength of the data augmentation policies affects the data embedding.
arXiv Detail & Related papers (2022-05-13T23:24:48Z)
- Great Truths are Always Simple: A Rather Simple Knowledge Encoder for Enhancing the Commonsense Reasoning Capacity of Pre-Trained Models [89.98762327725112]
Commonsense reasoning in natural language is a desired ability of artificial intelligent systems.
For solving complex commonsense reasoning tasks, a typical solution is to enhance pre-trained language models (PTMs) with a knowledge-aware graph neural network (GNN) encoder.
Despite their effectiveness, these approaches are built on heavy architectures and cannot clearly explain how external knowledge resources improve the reasoning capacity of PTMs.
arXiv Detail & Related papers (2022-05-04T01:27:36Z)
- Information-Theoretic Odometry Learning [83.36195426897768]
We propose a unified information-theoretic framework for learning-motivated methods aimed at odometry estimation.
The proposed framework provides an elegant tool for performance evaluation and understanding in information-theoretic language.
arXiv Detail & Related papers (2022-03-11T02:37:35Z)
- Probabilistic Autoencoder using Fisher Information [0.0]
In this work, an extension to the autoencoder architecture, the FisherNet, is introduced.
In this architecture, the latent space uncertainty is not generated using an additional information channel in the encoder, but derived from the decoder, by means of the Fisher information metric.
We show experimentally that the FisherNet produces more accurate data reconstructions than a comparable VAE, and that its learning performance scales better with the number of latent-space dimensions.
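As a hedged sketch of deriving latent uncertainty from the decoder (not the FisherNet implementation): for a Gaussian decoder p(x|z) = N(f(z), sigma^2 I), the Fisher information metric pulled back to the latent space is G(z) = J_f(z)^T J_f(z) / sigma^2, with J_f the decoder Jacobian. The toy decoder and sigma below are assumptions.

```python
import torch

# Toy decoder f: R^2 -> R^5 standing in for a trained decoder network.
decoder = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(),
                              torch.nn.Linear(16, 5))
sigma = 0.1  # assumed observation noise scale

def fisher_metric(z):
    J = torch.autograd.functional.jacobian(decoder, z)  # (5, 2) at this point
    return J.T @ J / sigma**2                           # (2, 2) metric G(z)

G = fisher_metric(torch.zeros(2))
print(G)  # large entries = latent directions the decoder is most sensitive to
```

The inverse of G(z) then gives a decoder-derived uncertainty estimate at z, with no extra information channel in the encoder.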
arXiv Detail & Related papers (2021-10-28T08:33:24Z)
- A New Modal Autoencoder for Functionally Independent Feature Extraction [6.690183908967779]
A new modal autoencoder (MAE) is proposed by orthogonalising the columns of the readout weight matrix.
The results were validated on the MNIST variations and USPS classification benchmark suite.
The new MAE introduces a very simple training principle for autoencoders and could be promising for the pre-training of deep neural networks.
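A minimal sketch of such an orthogonalisation principle expressed as a training penalty (our construction from the abstract, not the authors' code; the penalty form and names are assumptions):

```python
import torch

def orthogonality_penalty(W):
    """Frobenius norm of the off-diagonal part of W^T W for readout weights W."""
    gram = W.T @ W
    off_diag = gram - torch.diag(torch.diagonal(gram))
    return (off_diag ** 2).sum()

W = torch.randn(64, 10, requires_grad=True)     # hidden -> readout weights
loss = orthogonality_penalty(W)                 # add to the reconstruction loss
print(loss)
```

Driving W^T W toward a diagonal matrix makes each latent unit feed a functionally independent output direction, which is one reading of the stated training principle.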
arXiv Detail & Related papers (2020-06-25T13:25:10Z)