Neural Joint Entropy Estimation
- URL: http://arxiv.org/abs/2012.11197v1
- Date: Mon, 21 Dec 2020 09:23:39 GMT
- Title: Neural Joint Entropy Estimation
- Authors: Yuval Shalev, Amichai Painsky, Irad Ben-Gal
- Abstract summary: Estimating the entropy of a discrete random variable is a fundamental problem in information theory and related fields.
In this work, we introduce a practical solution to this problem, which extends the work of McAllester and Stratos (2020).
The proposed scheme uses the generalization abilities of cross-entropy estimation in deep neural networks (DNNs) to improve entropy estimation accuracy.
- Score: 12.77733789371855
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating the entropy of a discrete random variable is a fundamental problem
in information theory and related fields. This problem has many applications in
various domains, including machine learning, statistics and data compression.
Over the years, a variety of estimation schemes have been suggested. However,
despite significant progress, most methods still struggle when the sample is
small, compared to the variable's alphabet size. In this work, we introduce a
practical solution to this problem, which extends the work of McAllester and
Stratos (2020). The proposed scheme uses the generalization abilities of
cross-entropy estimation in deep neural networks (DNNs) to improve entropy
estimation accuracy. Furthermore, we introduce a family of estimators
for related information-theoretic measures, such as conditional entropy and
mutual information. We show that these estimators are strongly consistent and
demonstrate their performance in a variety of use-cases. First, we consider
large alphabet entropy estimation. Then, we extend the scope to mutual
information estimation. Next, we apply the proposed scheme to conditional
mutual information estimation, as we focus on independence testing tasks.
Finally, we study a transfer entropy estimation problem. The proposed
estimators demonstrate improved performance compared to existing methods in all
tested setups.
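As a rough illustration of the idea described above (our sketch, not the authors' code), the snippet below trains a small PyTorch classifier and reads off its minimized cross-entropy: for any model q, the empirical E[-log q(x|input)] equals H + KL(p||q) and hence upper-bounds the corresponding (conditional) entropy in nats, and I(X;Y) = H(X) - H(X|Y) follows as a difference of two such estimates. The network shape, hyperparameters, and the name cross_entropy_estimate are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cross_entropy_estimate(inputs, targets, alphabet_size,
                           epochs=500, lr=1e-2):
    """Train a small classifier q(target | input) and return its final
    cross-entropy in nats. Since E[-log q] = H + KL(p || q) >= H, the
    minimized cross-entropy serves as an entropy estimate from above."""
    model = nn.Sequential(
        nn.Linear(inputs.shape[1], 64), nn.ReLU(),
        nn.Linear(64, alphabet_size),
    )
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(model(inputs), targets)
        loss.backward()
        opt.step()
    # For a faithful estimate, evaluate on held-out samples rather than
    # the training loss; reusing training data can overfit below H.
    with torch.no_grad():
        return F.cross_entropy(model(inputs), targets).item()

# Toy check: X uniform on {0,...,9} and Y = X, so H(X) = log 10 and
# I(X;Y) = H(X) - H(X|Y) should approach log 10 ~ 2.303 nats.
n, k = 5000, 10
x = torch.randint(0, k, (n,))
const = torch.ones(n, 1)            # uninformative input -> estimates H(X)
y = F.one_hot(x, k).float()         # fully informative input -> H(X|Y) -> 0
h_x = cross_entropy_estimate(const, x, k)
h_x_given_y = cross_entropy_estimate(y, x, k)
print(f"H(X) ~ {h_x:.3f} nats, I(X;Y) ~ {h_x - h_x_given_y:.3f} nats")
```

The same difference-of-entropies pattern extends to the conditional mutual information and transfer entropy estimators mentioned in the abstract, by conditioning the classifiers on the appropriate side information.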
Related papers
- Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems.
We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z)
- REMEDI: Corrective Transformations for Improved Neural Entropy Estimation [0.7488108981865708]
We introduce $\texttt{REMEDI}$ for efficient and accurate estimation of differential entropy.
Our approach demonstrates improvement across a broad spectrum of estimation tasks.
It can be naturally extended to information theoretic supervised learning models.
arXiv Detail & Related papers (2024-02-08T14:47:37Z)
- Information Theory Inspired Pattern Analysis for Time-series Data [60.86880787242563]
We propose a highly generalizable method that uses information theory-based features to identify and learn from patterns in time-series data.
For applications with state transitions, features are developed based on the Shannon entropy, entropy rate, and von Neumann entropy of Markov chains (a minimal sketch of these quantities follows this entry).
The results show the proposed information theory-based features improve the recall rate, F1 score, and accuracy on average by up to 23.01% compared with the baseline models.
arXiv Detail & Related papers (2023-02-22T21:09:35Z)
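A minimal NumPy sketch of the Markov-chain quantities named in the entry above. The density-matrix construction behind the von Neumann entropy here (rho = sum_i pi_i |psi_i><psi_i| with amplitudes sqrt(P_ij)) is one common choice and an assumption of ours, not necessarily the construction used in that paper.

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    return pi / pi.sum()

def entropy_rate(P):
    """Entropy rate H = -sum_i pi_i sum_j P_ij log P_ij (nats)."""
    pi = stationary_distribution(P)
    logP = np.zeros_like(P)
    np.log(P, where=P > 0, out=logP)
    return float(-np.sum(pi[:, None] * P * logP))

def von_neumann_entropy(P):
    """S = -Tr(rho log rho) for a density matrix built from the chain:
    rho = sum_i pi_i |psi_i><psi_i| with |psi_i>_j = sqrt(P_ij)."""
    pi = stationary_distribution(P)
    psi = np.sqrt(P)              # row i holds the amplitudes of |psi_i>
    rho = (psi.T * pi) @ psi      # sum_i pi_i psi_i psi_i^T; trace is 1
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log(lam)))

P = np.array([[0.9, 0.1], [0.2, 0.8]])  # toy two-state transition matrix
print(entropy_rate(P), von_neumann_entropy(P))
```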
- Mutual Wasserstein Discrepancy Minimization for Sequential Recommendation [82.0801585843835]
We propose a novel self-supervised learning framework based on Mutual WasserStein discrepancy minimization (MStein) for sequential recommendation.
We also propose a novel contrastive learning loss based on Wasserstein Discrepancy Measurement.
arXiv Detail & Related papers (2023-01-28T13:38:48Z)
- An Application of a Multivariate Estimation of Distribution Algorithm to Cancer Chemotherapy [59.40521061783166]
Chemotherapy treatment for cancer is a complex optimisation problem with a large number of interacting variables and constraints.
We show that, contrary to expectations, the more sophisticated algorithm does not yield better performance on a complex problem like this.
We hypothesise that this is caused by the more sophisticated algorithm being impeded by the large number of interactions in the problem.
arXiv Detail & Related papers (2022-05-17T15:28:46Z)
- Estimating the Entropy of Linguistic Distributions [75.20045001387685]
We study the empirical effectiveness of different entropy estimators for linguistic distributions.
We find evidence that the reported effect size is over-estimated due to over-reliance on poor entropy estimators; the sketch below illustrates this bias.
arXiv Detail & Related papers (2022-04-04T13:36:46Z)
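A quick self-contained demonstration (our illustration of the mechanism, not code from that paper) of why naive estimators mislead when the sample is small relative to the alphabet, which is also the regime motivating the main paper: the plug-in (MLE) estimate is biased low, and the classical Miller-Madow correction only partially compensates.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 1000, 100                      # alphabet much larger than the sample
samples = rng.choice(k, size=n)       # uniform source, true H = log k

counts = np.bincount(samples, minlength=k)
freqs = counts[counts > 0] / n
h_plugin = -np.sum(freqs * np.log(freqs))       # plug-in (MLE) estimate
m = np.count_nonzero(counts)                    # number of observed symbols
h_miller = h_plugin + (m - 1) / (2 * n)         # Miller-Madow correction

print(f"true H     = {np.log(k):.3f} nats")
print(f"plug-in    = {h_plugin:.3f} nats (severely biased low)")
print(f"Miller-Madow = {h_miller:.3f} nats (still biased)")
```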
- A Unifying Framework for Some Directed Distances in Statistics [0.0]
Density-based directed distances -- particularly known as divergences -- are widely used in statistics.
We provide a general framework which covers in particular both the density-based and distribution-function-based divergence approaches.
We deduce new concepts of dependence between random variables, as alternatives to the celebrated mutual information.
arXiv Detail & Related papers (2022-03-02T04:24:13Z)
- Nonlinear Distribution Regression for Remote Sensing Applications [6.664736150040092]
In many remote sensing applications one wants to estimate variables or parameters of interest from observations.
Standard algorithms such as neural networks, random forests or Gaussian processes are readily available to relate the two.
This paper introduces a nonlinear (kernel-based) method for distribution regression that solves the previous problems without making any assumption on the statistics of the grouped data.
arXiv Detail & Related papers (2020-12-07T22:04:43Z)
- High-Dimensional Multi-Task Averaging and Application to Kernel Mean Embedding [0.0]
We propose an improved estimator for the multi-task averaging problem.
We prove theoretically that this approach provides a reduction in mean squared error.
An application of this approach is the estimation of multiple kernel mean embeddings.
arXiv Detail & Related papers (2020-11-13T07:31:30Z)
- Information Theory Measures via Multidimensional Gaussianization [7.788961560607993]
Information theory is an outstanding framework to measure uncertainty, dependence and relevance in data and systems.
It has several desirable properties for real world applications.
However, obtaining information from multidimensional data is a challenging problem due to the curse of dimensionality.
arXiv Detail & Related papers (2020-10-08T07:22:16Z)
- Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval task.
arXiv Detail & Related papers (2020-06-09T23:26:15Z)