Neural Joint Entropy Estimation
- URL: http://arxiv.org/abs/2012.11197v1
- Date: Mon, 21 Dec 2020 09:23:39 GMT
- Title: Neural Joint Entropy Estimation
- Authors: Yuval Shalev, Amichai Painsky, Irad Ben-Gal
- Abstract summary: Estimating the entropy of a discrete random variable is a fundamental problem in information theory and related fields.
In this work, we introduce a practical solution to this problem, which extends the work of McAllester and Stratos (2020).
The proposed scheme uses the generalization abilities of cross-entropy estimation in deep neural networks (DNNs) to improve entropy estimation accuracy.
- Score: 12.77733789371855
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating the entropy of a discrete random variable is a fundamental problem
in information theory and related fields. This problem has many applications in
various domains, including machine learning, statistics and data compression.
Over the years, a variety of estimation schemes have been suggested. However,
despite significant progress, most methods still struggle when the sample is
small, compared to the variable's alphabet size. In this work, we introduce a
practical solution to this problem, which extends the work of McAllester and
Stratos (2020). The proposed scheme uses the generalization abilities of
cross-entropy estimation in deep neural networks (DNNs) to improve entropy
estimation accuracy. Furthermore, we introduce a family of estimators
for related information-theoretic measures, such as conditional entropy and
mutual information. We show that these estimators are strongly consistent and
demonstrate their performance in a variety of use-cases. First, we consider
large alphabet entropy estimation. Then, we extend the scope to mutual
information estimation. Next, we apply the proposed scheme to conditional
mutual information estimation, as we focus on independence testing tasks.
Finally, we study a transfer entropy estimation problem. The proposed
estimators demonstrate improved performance compared to existing methods in all
tested setups.
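As a rough illustration of the idea described above (our sketch, not the authors' code), the snippet below trains a small PyTorch classifier and reads off its minimized cross-entropy: for any model q, the empirical E[-log q(x|input)] equals H + KL(p||q) and hence upper-bounds the corresponding (conditional) entropy in nats, and I(X;Y) = H(X) - H(X|Y) follows as a difference of two such estimates. The network shape, hyperparameters, and the name cross_entropy_estimate are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cross_entropy_estimate(inputs, targets, alphabet_size,
                           epochs=500, lr=1e-2):
    """Train a small classifier q(target | input) and return its final
    cross-entropy in nats. Since E[-log q] = H + KL(p || q) >= H, the
    minimized cross-entropy serves as an entropy estimate from above."""
    model = nn.Sequential(
        nn.Linear(inputs.shape[1], 64), nn.ReLU(),
        nn.Linear(64, alphabet_size),
    )
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(model(inputs), targets)
        loss.backward()
        opt.step()
    # For a faithful estimate, evaluate on held-out samples rather than
    # the training loss; reusing training data can overfit below H.
    with torch.no_grad():
        return F.cross_entropy(model(inputs), targets).item()

# Toy check: X uniform on {0,...,9} and Y = X, so H(X) = log 10 and
# I(X;Y) = H(X) - H(X|Y) should approach log 10 ~ 2.303 nats.
n, k = 5000, 10
x = torch.randint(0, k, (n,))
const = torch.ones(n, 1)            # uninformative input -> estimates H(X)
y = F.one_hot(x, k).float()         # fully informative input -> H(X|Y) -> 0
h_x = cross_entropy_estimate(const, x, k)
h_x_given_y = cross_entropy_estimate(y, x, k)
print(f"H(X) ~ {h_x:.3f} nats, I(X;Y) ~ {h_x - h_x_given_y:.3f} nats")
```

The same difference-of-entropies pattern extends to the conditional mutual information and transfer entropy estimators mentioned in the abstract, by conditioning the classifiers on the appropriate side information.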
Related papers
- Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems.
We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z)
- REMEDI: Corrective Transformations for Improved Neural Entropy Estimation [0.7488108981865708]
We introduce $\texttt{REMEDI}$ for efficient and accurate estimation of differential entropy.
Our approach demonstrates improvement across a broad spectrum of estimation tasks.
It can be naturally extended to information theoretic supervised learning models.
arXiv Detail & Related papers (2024-02-08T14:47:37Z)
- Information Theory Inspired Pattern Analysis for Time-series Data [60.86880787242563]
We propose a highly generalizable method that uses information theory-based features to identify and learn from patterns in time-series data.
For applications with state transitions, features are developed based on the Shannon entropy, entropy rate, and von Neumann entropy of Markov chains (a minimal sketch of these quantities follows this entry).
The results show the proposed information theory-based features improve the recall rate, F1 score, and accuracy on average by up to 23.01% compared with the baseline models.
arXiv Detail & Related papers (2023-02-22T21:09:35Z)
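A minimal NumPy sketch of the Markov-chain quantities named in the entry above. The density-matrix construction behind the von Neumann entropy here (rho = sum_i pi_i |psi_i><psi_i| with amplitudes sqrt(P_ij)) is one common choice and an assumption of ours, not necessarily the construction used in that paper.

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    return pi / pi.sum()

def entropy_rate(P):
    """Entropy rate H = -sum_i pi_i sum_j P_ij log P_ij (nats)."""
    pi = stationary_distribution(P)
    logP = np.zeros_like(P)
    np.log(P, where=P > 0, out=logP)
    return float(-np.sum(pi[:, None] * P * logP))

def von_neumann_entropy(P):
    """S = -Tr(rho log rho) for a density matrix built from the chain:
    rho = sum_i pi_i |psi_i><psi_i| with |psi_i>_j = sqrt(P_ij)."""
    pi = stationary_distribution(P)
    psi = np.sqrt(P)              # row i holds the amplitudes of |psi_i>
    rho = (psi.T * pi) @ psi      # sum_i pi_i psi_i psi_i^T; trace is 1
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log(lam)))

P = np.array([[0.9, 0.1], [0.2, 0.8]])  # toy two-state transition matrix
print(entropy_rate(P), von_neumann_entropy(P))
```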
- Mutual Wasserstein Discrepancy Minimization for Sequential Recommendation [82.0801585843835]
We propose a novel self-supervised learning framework based on Mutual WasserStein discrepancy minimization (MStein) for sequential recommendation.
We also propose a novel contrastive learning loss based on Wasserstein Discrepancy Measurement.
arXiv Detail & Related papers (2023-01-28T13:38:48Z)
- An Application of a Multivariate Estimation of Distribution Algorithm to Cancer Chemotherapy [59.40521061783166]
Chemotherapy treatment for cancer is a complex optimisation problem with a large number of interacting variables and constraints.
We show that, contrary to expectations, the more sophisticated algorithm does not yield better performance on a complex problem like this.
We hypothesise that this is caused by the more sophisticated algorithm being impeded by the large number of interactions in the problem.
arXiv Detail & Related papers (2022-05-17T15:28:46Z)
- Estimating the Entropy of Linguistic Distributions [75.20045001387685]
We study the empirical effectiveness of different entropy estimators for linguistic distributions.
We find evidence that the reported effect size is over-estimated due to over-reliance on poor entropy estimators; the sketch below illustrates this bias.
arXiv Detail & Related papers (2022-04-04T13:36:46Z)
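A quick self-contained demonstration (our illustration of the mechanism, not code from that paper) of why naive estimators mislead when the sample is small relative to the alphabet, which is also the regime motivating the main paper: the plug-in (MLE) estimate is biased low, and the classical Miller-Madow correction only partially compensates.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 1000, 100                      # alphabet much larger than the sample
samples = rng.choice(k, size=n)       # uniform source, true H = log k

counts = np.bincount(samples, minlength=k)
freqs = counts[counts > 0] / n
h_plugin = -np.sum(freqs * np.log(freqs))       # plug-in (MLE) estimate
m = np.count_nonzero(counts)                    # number of observed symbols
h_miller = h_plugin + (m - 1) / (2 * n)         # Miller-Madow correction

print(f"true H     = {np.log(k):.3f} nats")
print(f"plug-in    = {h_plugin:.3f} nats (severely biased low)")
print(f"Miller-Madow = {h_miller:.3f} nats (still biased)")
```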
- A Unifying Framework for Some Directed Distances in Statistics [0.0]
Density-based directed distances -- particularly known as divergences -- are widely used in statistics.
We provide a general framework which covers in particular both the density-based and distribution-function-based divergence approaches.
We deduce new concepts of dependence between random variables, as alternatives to the celebrated mutual information.
arXiv Detail & Related papers (2022-03-02T04:24:13Z)
- Nonlinear Distribution Regression for Remote Sensing Applications [6.664736150040092]
In many remote sensing applications one wants to estimate variables or parameters of interest from observations.
Standard algorithms such as neural networks, random forests or Gaussian processes are readily available to relate the two.
This paper introduces a nonlinear (kernel-based) method for distribution regression that solves the previous problems without making any assumption on the statistics of the grouped data.
arXiv Detail & Related papers (2020-12-07T22:04:43Z)
- High-Dimensional Multi-Task Averaging and Application to Kernel Mean Embedding [0.0]
We propose an improved estimator for the multi-task averaging problem.
We prove theoretically that this approach provides a reduction in mean squared error.
An application of this approach is the estimation of multiple kernel mean embeddings.
arXiv Detail & Related papers (2020-11-13T07:31:30Z)
- Information Theory Measures via Multidimensional Gaussianization [7.788961560607993]
Information theory is an outstanding framework to measure uncertainty, dependence and relevance in data and systems.
It has several desirable properties for real world applications.
However, obtaining information from multidimensional data is a challenging problem due to the curse of dimensionality.
arXiv Detail & Related papers (2020-10-08T07:22:16Z)
- Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval task.
arXiv Detail & Related papers (2020-06-09T23:26:15Z)