TREET: TRansfer Entropy Estimation via Transformers
- URL: http://arxiv.org/abs/2402.06919v4
- Date: Fri, 18 Jul 2025 19:07:07 GMT
- Title: TREET: TRansfer Entropy Estimation via Transformers
- Authors: Omer Luxembourg, Dor Tsur, Haim Permuter
- Abstract summary: Transfer entropy (TE) is an information theoretic measure that reveals the directional flow of information between processes. This work proposes Transfer Entropy Estimation via Transformers (TREET), a novel attention-based approach for estimating TE for stationary processes.
- Score: 1.024113475677323
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer entropy (TE) is an information theoretic measure that reveals the directional flow of information between processes, providing valuable insights for a wide range of real-world applications. This work proposes Transfer Entropy Estimation via Transformers (TREET), a novel attention-based approach for estimating TE for stationary processes. The proposed approach applies the Donsker-Varadhan representation to TE and leverages the attention mechanism for the task of neural estimation. We provide a detailed theoretical and empirical study of TREET, comparing it to existing methods on a dedicated estimation benchmark. To increase its applicability, we design an estimated-TE optimization scheme motivated by the functional representation lemma, and use it to estimate the capacity of communication channels with memory, a canonical optimization problem in information theory. We further demonstrate how an optimized TREET can be used to estimate underlying densities, providing experimental results. Finally, we apply TREET to feature analysis of patients with apnea, demonstrating its applicability to real-world physiological data. Our work, combined with state-of-the-art deep learning methods, opens a new door to communication problems that are yet to be solved.
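For orientation, the two standard ingredients behind this kind of estimator are the conditional-mutual-information form of TE and the Donsker-Varadhan (DV) variational representation of a KL divergence. The notation below is generic (memory parameter l, jointly stationary processes) and is not necessarily identical to the paper's.

```latex
% Transfer entropy with memory l, written as a conditional mutual information
% and expanded by the chain rule into two unconditional MI terms:
\mathrm{TE}_{X \to Y}
  = I\!\left(Y_t ;\, X_{t-l}^{t-1} \,\middle|\, Y_{t-l}^{t-1}\right)
  = I\!\left(Y_t ;\, X_{t-l}^{t-1}, Y_{t-l}^{t-1}\right) - I\!\left(Y_t ;\, Y_{t-l}^{t-1}\right).

% Donsker-Varadhan representation of a KL divergence; restricting the
% supremum to a neural critic T_\theta turns it into a maximizable lower bound:
D_{\mathrm{KL}}(P \,\|\, Q) = \sup_{T} \; \mathbb{E}_{P}[T] - \log \mathbb{E}_{Q}\!\left[e^{T}\right].
```

Each mutual-information term can then be estimated MINE-style by maximizing the DV bound over a critic network; in TREET the critic is attention-based. The sketch below is a minimal, generic illustration of one DV term with a plain feed-forward critic and a batch-shuffle approximation of the product measure. It is not the TREET reference implementation, and the class names, hyperparameters, and shuffling heuristic are assumptions.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Scalar score T_theta(a, b) for the DV bound; the architecture is a placeholder."""
    def __init__(self, dim_a: int, dim_b: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_a + dim_b, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([a, b], dim=-1)).squeeze(-1)

def dv_lower_bound(critic: nn.Module, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """DV bound on I(A; B): E_joint[T] - log E_prod[exp T].

    The product-of-marginals samples are approximated by shuffling b across
    the batch, which keeps the marginals but breaks the joint dependence.
    """
    t_joint = critic(a, b)                               # samples from the joint
    t_prod = critic(a, b[torch.randperm(b.size(0))])     # approx. product measure
    log_n = torch.log(torch.tensor(float(t_prod.size(0))))
    return t_joint.mean() - (torch.logsumexp(t_prod, dim=0) - log_n)

# TE(X -> Y) is then the difference of two maximized bounds,
#   I(Y_t; X_past, Y_past) - I(Y_t; Y_past),
# each with its own critic, trained by gradient ascent on its bound.
```

At a high level, the capacity experiments mentioned in the abstract then amount to maximizing this estimated TE over a parametric input distribution, which is the optimization scheme the authors motivate via the functional representation lemma.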
Related papers
- Optimal Transport Adapter Tuning for Bridging Modality Gaps in Few-Shot Remote Sensing Scene Classification [80.83325513157637]
Few-Shot Remote Sensing Scene Classification (FS-RSSC) presents the challenge of classifying remote sensing images with limited labeled samples.
We propose a novel Optimal Transport Adapter Tuning (OTAT) framework aimed at constructing an ideal Platonic representational space.
arXiv Detail & Related papers (2025-03-19T07:04:24Z) - End-to-End Optimal Detector Design with Mutual Information Surrogates [1.7042756021131187]
We introduce a novel approach for end-to-end black-box optimization of high energy physics detectors using local deep learning (DL) surrogates.
In addition to a standard reconstruction-based metric commonly used in the field, we investigate the information-theoretic metric of mutual information.
Our findings reveal three key insights: (1) end-to-end black-box optimization using local surrogates is a practical and compelling approach for detector design; (2) mutual information-based optimization yields design choices that closely match those from state-of-the-art physics-informed methods; and (3) information-theoretic methods provide a ...
arXiv Detail & Related papers (2025-03-18T15:23:03Z) - A Survey on Inference Optimization Techniques for Mixture of Experts Models [50.40325411764262]
Large-scale Mixture of Experts (MoE) models offer enhanced model capacity and computational efficiency through conditional computation.
However, deploying and running inference on these models presents significant challenges in computational resources, latency, and energy efficiency.
This survey analyzes optimization techniques for MoE models across the entire system stack.
arXiv Detail & Related papers (2024-12-18T14:11:15Z) - Causal Representation Learning with Generative Artificial Intelligence: Application to Texts as Treatments [0.0]
We show how to enhance the validity of causal inference with unstructured high-dimensional treatments like texts. We propose to use a deep generative model, such as a large language model (LLM), to efficiently generate treatments. We show that knowledge of the generative model's true internal representation helps disentangle the treatment features of interest.
arXiv Detail & Related papers (2024-10-01T17:46:21Z) - A Unified Framework for Interpretable Transformers Using PDEs and Information Theory [3.4039202831583903]
This paper presents a novel unified theoretical framework for understanding Transformer architectures by integrating Partial Differential Equations (PDEs), Neural Information Flow Theory, and Information Bottleneck Theory.
We model Transformer information dynamics as a continuous PDE process, encompassing diffusion, self-attention, and nonlinear residual components.
Our comprehensive experiments across image and text modalities demonstrate that the PDE model effectively captures key aspects of Transformer behavior, achieving high similarity (cosine similarity > 0.98) with Transformer attention distributions across all layers.
arXiv Detail & Related papers (2024-08-18T16:16:57Z) - REMEDI: Corrective Transformations for Improved Neural Entropy Estimation [0.7488108981865708]
We introduce REMEDI for efficient and accurate estimation of differential entropy.
Our approach demonstrates improvement across a broad spectrum of estimation tasks.
It can be naturally extended to information theoretic supervised learning models.
arXiv Detail & Related papers (2024-02-08T14:47:37Z) - Disentangled Representation Learning with Transmitted Information Bottleneck [57.22757813140418]
We present DisTIB (Transmitted Information Bottleneck for Disentangled representation learning), a novel objective that navigates the balance between information compression and preservation.
arXiv Detail & Related papers (2023-11-03T03:18:40Z) - TpopT: Efficient Trainable Template Optimization on Low-Dimensional
Manifolds [5.608047449631387]
A family of approaches, exemplified by template matching, aims to cover the search space with a dense template bank.
While simple and highly interpretable, it suffers from poor computational efficiency due to unfavorable scaling in the signal space dimensionality.
We study TpopT as an alternative scalable framework for detecting low-dimensional families of signals which maintains high interpretability.
arXiv Detail & Related papers (2023-10-16T03:51:13Z) - Large-Scale OD Matrix Estimation with A Deep Learning Method [70.78575952309023]
The proposed method integrates deep learning and numerical optimization algorithms to infer matrix structure and guide numerical optimization.
We conducted tests to demonstrate the good generalization performance of our method on a large-scale synthetic dataset.
arXiv Detail & Related papers (2023-10-09T14:30:06Z) - Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Intelligent Model Update Strategy for Sequential Recommendation [34.02565495747133]
We introduce IntellectReq, which is designed to operate on edge devices, evaluating the cost-benefit landscape of parameter requests with minimal communication overhead.
We employ statistical mapping techniques to convert real-time user behavior into a normal distribution, thereby employing multi-sample outputs to quantify the model's uncertainty and thus its generalization capabilities.
arXiv Detail & Related papers (2023-02-14T20:44:12Z) - STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal transition model.
Based on the kernelized Stein discrepancy (KSD), we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING.
arXiv Detail & Related papers (2023-01-28T00:49:28Z) - Validation Diagnostics for SBI algorithms based on Normalizing Flows [55.41644538483948]
This work proposes easy-to-interpret validation diagnostics for multi-dimensional conditional (posterior) density estimators based on normalizing flows (NF).
It also offers theoretical guarantees based on results of local consistency.
This work should help the design of better specified models or drive the development of novel SBI-algorithms.
arXiv Detail & Related papers (2022-11-17T15:48:06Z) - MACE: An Efficient Model-Agnostic Framework for Counterfactual
Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate its effectiveness, with better validity, sparsity, and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z) - INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
arXiv Detail & Related papers (2022-04-18T23:09:23Z) - Neural Enhanced Belief Propagation for Data Association in Multiobject Tracking [8.228150100178983]
Multiobject tracking (MOT) will create new services and applications in fields such as autonomous navigation and applied ocean sciences.
Belief propagation (BP) is a state-of-the-art method for Bayesian MOT but fully relies on a statistical model and preprocessed sensor measurements.
We establish a hybrid method for model-based and data-driven MOT. The proposed neural enhanced belief propagation (NEBP) approach complements BP by information learned from raw sensor data.
We evaluate the performance of our NEBP approach for MOT on the nuScenes autonomous driving dataset and demonstrate that it can outperform state-of-the-art methods.
arXiv Detail & Related papers (2022-03-17T00:12:48Z) - Information-Theoretic Odometry Learning [83.36195426897768]
We propose a unified information theoretic framework for learning-motivated methods aimed at odometry estimation.
The proposed framework provides an elegant tool for performance evaluation and understanding in information-theoretic language.
arXiv Detail & Related papers (2022-03-11T02:37:35Z) - Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the optimal long-term cumulative utility for the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z) - Data Augmentation through Expert-guided Symmetry Detection to Improve Performance in Offline Reinforcement Learning [0.0]
Offline estimation of the dynamical model of a Markov Decision Process (MDP) is a non-trivial task.
Recent works showed that an expert-guided pipeline relying on Density Estimation methods effectively detects this structure in deterministic environments.
We show that these results lead to a performance improvement when solving the learned MDP and then applying the optimized policy in the real environment.
arXiv Detail & Related papers (2021-12-18T14:32:32Z) - Transfer Learning with Gaussian Processes for Bayesian Optimization [9.933956770453438]
We provide a unified view on hierarchical GP models for transfer learning, which allows us to analyze the relationship between methods.
We develop a novel closed-form boosted GP transfer model that fits between existing approaches in terms of complexity.
We evaluate the performance of the different approaches in large-scale experiments and highlight strengths and weaknesses of the different transfer-learning methods.
arXiv Detail & Related papers (2021-11-22T14:09:45Z)