TREET: TRansfer Entropy Estimation via Transformer
- URL: http://arxiv.org/abs/2402.06919v2
- Date: Wed, 21 Feb 2024 10:45:57 GMT
- Title: TREET: TRansfer Entropy Estimation via Transformer
- Authors: Omer Luxembourg, Dor Tsur, Haim Permuter
- Abstract summary: Transfer entropy (TE) is a measurement in information theory that reveals the directional flow of information between processes.
This work proposes Transfer Entropy Estimation via Transformers (TREET), a novel transformer-based approach for estimating the TE for stationary processes.
- Score: 1.1510009152620668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer entropy (TE) is a measurement in information theory that reveals the
directional flow of information between processes, providing valuable insights
for a wide range of real-world applications. This work proposes Transfer
Entropy Estimation via Transformers (TREET), a novel transformer-based approach
for estimating the TE for stationary processes. The proposed approach employs
Donsker-Vardhan (DV) representation to TE and leverages the attention mechanism
for the task of neural estimation. We propose a detailed theoretical and
empirical study of the TREET, comparing it to existing methods. To increase its
applicability, we design an estimated TE optimization scheme that is motivated
by the functional representation lemma. Afterwards, we take advantage of the
joint optimization scheme to optimize the capacity of communication channels
with memory, which is a canonical optimization problem in information theory,
and show the memory capabilities of our estimator. Finally, we apply TREET to
real-world feature analysis. Our work, combined with state-of-the-art deep
learning methods, opens a new door to communication problems that are yet to be
solved.
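To make the estimation idea concrete, the following is a minimal sketch, not the authors' implementation, of a Donsker-Varadhan neural TE estimator. It uses the chain-rule identity TE_{X→Y} = I(Y_t; X_past, Y_past) − I(Y_t; Y_past) with two MINE-style critics; TREET itself uses attention-based (transformer) critics and a dedicated reference-sampling scheme, so the MLP critics and batch-shuffled negatives below are illustrative stand-ins.
```python
# Minimal sketch, NOT the TREET implementation: a Donsker-Varadhan (MINE-style)
# transfer-entropy estimate via the chain-rule identity
#   TE_{X->Y} = I(Y_t ; X_past, Y_past) - I(Y_t ; Y_past).
import torch
import torch.nn as nn

class DVCritic(nn.Module):
    """Scalar critic f(u, v) for the DV lower bound on I(U; V)."""
    def __init__(self, dim_u: int, dim_v: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_u + dim_v, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([u, v], dim=-1)).squeeze(-1)

def dv_bound(critic: DVCritic, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """DV bound E_P[f] - log E_{P_U x P_V}[exp f]; negatives from shuffling v."""
    joint = critic(u, v).mean()
    v_neg = v[torch.randperm(v.shape[0])]
    log_mean_exp = torch.logsumexp(critic(u, v_neg), dim=0) - torch.log(
        torch.tensor(float(u.shape[0])))
    return joint - log_mean_exp

def te_estimate(critic_full: DVCritic, critic_cond: DVCritic,
                y_t: torch.Tensor, x_past: torch.Tensor,
                y_past: torch.Tensor) -> torch.Tensor:
    """Each critic is trained separately to maximize its own bound; the TE
    estimate is the difference of the two converged bounds."""
    mi_full = dv_bound(critic_full, y_t, torch.cat([x_past, y_past], dim=-1))
    mi_cond = dv_bound(critic_cond, y_t, y_past)
    return mi_full - mi_cond
```
In the paper, the estimated TE is additionally maximized over a neural input encoder (motivated by the functional representation lemma) to optimize the capacity of channels with memory; in this sketch that would amount to back-propagating the TE estimate into whatever network generates x_past.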
Related papers
- A Unified Framework for Interpretable Transformers Using PDEs and Information Theory [3.4039202831583903]
This paper presents a novel unified theoretical framework for understanding Transformer architectures by integrating Partial Differential Equations (PDEs), Neural Information Flow Theory, and Information Bottleneck Theory.
We model Transformer information dynamics as a continuous PDE process, encompassing diffusion, self-attention, and nonlinear residual components.
Our comprehensive experiments across image and text modalities demonstrate that the PDE model effectively captures key aspects of Transformer behavior, achieving high similarity (cosine similarity > 0.98) with Transformer attention distributions across all layers.
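A comparison of that kind boils down to cosine similarity between flattened attention maps; a trivial sketch of such a check (my own illustration, with assumed tensor shapes, not the paper's code):
```python
# Illustrative only: mean cosine similarity between two per-head attention maps,
# e.g. a PDE-model surrogate vs. a Transformer layer. Shapes are assumed.
import torch
import torch.nn.functional as F

def mean_attention_cosine(attn_a: torch.Tensor, attn_b: torch.Tensor) -> torch.Tensor:
    """attn_a, attn_b: (heads, seq, seq) attention weights."""
    a = attn_a.flatten(start_dim=1)   # one flattened vector per head
    b = attn_b.flatten(start_dim=1)
    return F.cosine_similarity(a, b, dim=-1).mean()
```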
arXiv Detail & Related papers (2024-08-18T16:16:57Z) - Disentangled Representation Learning with Transmitted Information Bottleneck [57.22757813140418]
We present DisTIB (Transmitted Information Bottleneck for Disentangled representation learning), a novel objective that navigates the balance between information compression and preservation.
arXiv Detail & Related papers (2023-11-03T03:18:40Z) - TpopT: Efficient Trainable Template Optimization on Low-Dimensional
Manifolds [5.608047449631387]
A family of approaches, exemplified by template matching, aims to cover the search space with a dense template bank.
While simple and highly interpretable, it suffers from poor computational efficiency due to unfavorable scaling in the signal space dimensionality.
We study TpopT as an alternative scalable framework for detecting low-dimensional families of signals which maintains high interpretability.
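For context, the dense-template-bank baseline mentioned above amounts to a nearest-template search by correlation; a minimal illustration of that brute-force search (names and shapes are my own, not from the paper):
```python
# Illustrative baseline only: matched filtering against a dense template bank.
# TpopT instead optimizes over a low-dimensional trainable parameterization;
# this sketch just shows the search it aims to replace.
import numpy as np

def template_match(signal: np.ndarray, templates: np.ndarray):
    """signal: (d,); templates: (n, d), each row unit-norm.
    Returns (best_index, best_score) under normalized correlation."""
    s = signal / (np.linalg.norm(signal) + 1e-12)
    scores = templates @ s          # cosine similarity with every template
    best = int(np.argmax(scores))
    return best, float(scores[best])
```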
arXiv Detail & Related papers (2023-10-16T03:51:13Z) - Large-Scale OD Matrix Estimation with A Deep Learning Method [70.78575952309023]
The proposed method integrates deep learning and numerical optimization algorithms to infer matrix structure and guide numerical optimization.
We conducted tests to demonstrate the good generalization performance of our method on a large-scale synthetic dataset.
arXiv Detail & Related papers (2023-10-09T14:30:06Z) - Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective technique for distributed training of large-scale models.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Intelligent Model Update Strategy for Sequential Recommendation [34.02565495747133]
We introduce IntellectReq, which is designed to operate on the edge, evaluating the cost-benefit landscape of parameter requests with minimal communication overhead.
We employ statistical mapping techniques to convert real-time user behavior into a normal distribution, thereby employing multi-sample outputs to quantify the model's uncertainty and thus its generalization capabilities.
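The multi-sample uncertainty idea can be illustrated roughly as follows (a generic sketch, not IntellectReq's actual procedure; the stochastic forward passes stand in for its statistical mapping):
```python
# Generic sketch: quantify predictive uncertainty from multiple stochastic
# forward passes (e.g. with dropout left active), then use the spread as a
# proxy for how much the on-device model can be trusted before requesting
# fresh parameters. Not the paper's exact mechanism.
import torch

def predictive_uncertainty(model: torch.nn.Module, x: torch.Tensor, n_samples: int = 10):
    model.train()                       # keep dropout active for sampling
    with torch.no_grad():
        outputs = torch.stack([model(x) for _ in range(n_samples)])
    return outputs.mean(dim=0), outputs.var(dim=0)   # mean prediction, uncertainty
```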
arXiv Detail & Related papers (2023-02-14T20:44:12Z) - INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
arXiv Detail & Related papers (2022-04-18T23:09:23Z) - Sequential Information Design: Markov Persuasion Process and Its
Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the sender's optimal long-term cumulative utility.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z) - Transfer Learning with Gaussian Processes for Bayesian Optimization [9.933956770453438]
We provide a unified view on hierarchical GP models for transfer learning, which allows us to analyze the relationship between methods.
We develop a novel closed-form boosted GP transfer model that fits between existing approaches in terms of complexity.
We evaluate the performance of the different approaches in large-scale experiments and highlight strengths and weaknesses of the different transfer-learning methods.
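One common form of such a boosted transfer model is a residual GP: fit a GP on source-task data, then fit a second GP on the target task's residuals. A small scikit-learn sketch of that general idea (my own illustration; the cited paper's closed-form model differs in its details):
```python
# Illustration only: "boosting-style" GP transfer by fitting a target GP on the
# residuals of a source-task GP.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_boosted_transfer_gp(x_src, y_src, x_tgt, y_tgt):
    gp_src = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(x_src, y_src)
    residuals = y_tgt - gp_src.predict(x_tgt)        # what the source model misses
    gp_res = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(x_tgt, residuals)
    return lambda x: gp_src.predict(x) + gp_res.predict(x)
```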
arXiv Detail & Related papers (2021-11-22T14:09:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.