Estimating the Entropy of Linguistic Distributions
- URL: http://arxiv.org/abs/2204.01469v2
- Date: Tue, 5 Apr 2022 03:46:10 GMT
- Title: Estimating the Entropy of Linguistic Distributions
- Authors: Aryaman Arora, Clara Meister, Ryan Cotterell
- Abstract summary: We study the empirical effectiveness of different entropy estimators for linguistic distributions.
We find evidence that the reported effect size is over-estimated due to over-reliance on poor entropy estimators.
- Score: 75.20045001387685
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Shannon entropy is often a quantity of interest to linguists studying the
communicative capacity of human language. However, entropy must typically be
estimated from observed data because researchers do not have access to the
underlying probability distribution that gives rise to these data. While
entropy estimation is a well-studied problem in other fields, there is not yet
a comprehensive exploration of the efficacy of entropy estimators for use with
linguistic data. In this work, we fill this void, studying the empirical
effectiveness of different entropy estimators for linguistic distributions. In
a replication of two recent information-theoretic linguistic studies, we find
evidence that the reported effect size is over-estimated due to over-reliance
on poor entropy estimators. Finally, we end our paper with concrete
recommendations for entropy estimation depending on distribution type and data
availability.
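The core problem the abstract describes, that naive entropy estimates from finite samples are biased, can be illustrated with a minimal sketch (a toy example, not the paper's experimental setup): the maximum-likelihood ("plug-in") estimator systematically underestimates entropy on small samples, and the Miller-Madow correction adds back an approximate bias term.

```python
import math
import random
from collections import Counter

def plugin_entropy(samples):
    """Maximum-likelihood ("plug-in") entropy estimate, in nats."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def miller_madow_entropy(samples):
    """Plug-in estimate plus the Miller-Madow bias correction (K - 1) / (2n)."""
    counts = Counter(samples)
    n = len(samples)
    k = len(counts)  # number of observed symbol types
    return plugin_entropy(samples) + (k - 1) / (2 * n)

# Toy demonstration: 30 draws from a uniform distribution over 10 symbols.
# The true entropy is log(10) ~= 2.303 nats; the plug-in estimate falls short.
random.seed(0)
sample = [random.randrange(10) for _ in range(30)]
print(f"plug-in:      {plugin_entropy(sample):.3f}")
print(f"Miller-Madow: {miller_madow_entropy(sample):.3f}")
print(f"true:         {math.log(10):.3f}")
```

On this toy sample the plug-in estimate falls well below log(10) and the correction recovers part of the gap; the paper's recommendations concern which corrections to prefer for which kinds of linguistic distributions and sample sizes.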
Related papers
- To BEE or not to BEE: Estimating more than Entropy with Biased Entropy Estimators [0.3669506968635671]
We apply 18 widely employed entropy estimators to Shannon measures useful to the software engineer.
We investigate how the estimators are affected by two main influential factors: sample size and domain size.
Our most important result is that the Chao-Shen and Chao-Wang-Jost estimators stand out, consistently converging more quickly to the ground truth.
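A minimal sketch of the Chao-Shen estimator singled out above, written from the standard published formula rather than from this paper's code: it combines a Good-Turing estimate of sample coverage with a Horvitz-Thompson correction for symbols that were never observed.

```python
import math
from collections import Counter

def chao_shen_entropy(samples):
    """Chao-Shen coverage-adjusted entropy estimate, in nats."""
    counts = Counter(samples)
    n = len(samples)
    f1 = sum(1 for c in counts.values() if c == 1)  # number of singletons
    if f1 == n:        # every symbol seen once: avoid a zero coverage estimate
        f1 = n - 1
    coverage = 1.0 - f1 / n          # Good-Turing estimate of sample coverage
    h = 0.0
    for c in counts.values():
        p = coverage * c / n                            # coverage-adjusted probability
        h -= p * math.log(p) / (1.0 - (1.0 - p) ** n)   # Horvitz-Thompson inclusion weight
    return h

print(chao_shen_entropy(list("abracadabra")))
```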
arXiv Detail & Related papers (2025-01-20T10:48:08Z)
- Measuring Sample Importance in Data Pruning for Language Models based on Information Entropy [4.079147243688765]
We consider a data pruning method based on information entropy.
We propose that the samples in the training corpus be ranked in terms of their informativeness.
Experiments reveal that the proposed information-based pruning can improve performance on various language modeling and downstream tasks.
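One plausible reading of such an entropy-based ranking, sketched below with a corpus-level unigram surprisal score standing in for the paper's model-based measure (that substitution is an assumption): samples whose tokens are on average more surprising are treated as more informative and are kept when the corpus is pruned.

```python
import math
from collections import Counter

def rank_by_informativeness(documents):
    """Rank documents by mean per-token surprisal under a corpus-level unigram model.

    A simplified proxy score; the paper's actual informativeness measure is
    model-based and may differ substantially.
    """
    tokens = [tok for doc in documents for tok in doc.split()]
    counts = Counter(tokens)
    total = len(tokens)

    def score(doc):
        words = doc.split()
        return sum(-math.log(counts[w] / total) for w in words) / len(words)

    return sorted(documents, key=score, reverse=True)

docs = ["the cat sat on the mat", "the the the the", "quantum entanglement defies intuition"]
print(rank_by_informativeness(docs))  # rarest-on-average documents come first
```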
arXiv Detail & Related papers (2024-06-20T09:09:34Z)
- Estimating Unknown Population Sizes Using the Hypergeometric Distribution [1.03590082373586]
We tackle the challenge of estimating discrete distributions when both the total population size and the sizes of its constituent categories are unknown.
We develop our approach to account for a data generating process where the ground-truth is a mixture of distributions conditional on a continuous latent variable.
Empirical data simulation demonstrates that our method outperforms other likelihood functions used to model count data.
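For orientation, here is a toy maximum-likelihood estimate of an unknown total population size under a plain hypergeometric model (a capture-recapture setup chosen for illustration); the paper's actual method additionally handles unknown category sizes and conditions on a continuous latent variable, which this sketch does not attempt.

```python
from scipy.stats import hypergeom

def mle_population_size(n_marked, sample_size, k_observed, max_pop=10_000):
    """Grid-search MLE of the total population size M for a hypergeometric draw.

    We marked `n_marked` items, drew `sample_size` items without replacement,
    and observed `k_observed` marked items in the draw.
    """
    best_M, best_ll = None, float("-inf")
    for M in range(max(n_marked, sample_size), max_pop):
        ll = hypergeom.logpmf(k_observed, M, n_marked, sample_size)
        if ll > best_ll:
            best_M, best_ll = M, ll
    return best_M

# 100 marked items, a sample of 50 containing 10 marked ones:
# the MLE lands near the Lincoln-Petersen estimate 100 * 50 / 10 = 500.
print(mle_population_size(n_marked=100, sample_size=50, k_observed=10))
```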
arXiv Detail & Related papers (2024-02-22T01:53:56Z)
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z)
- Revisiting Entropy Rate Constancy in Text [43.928576088761844]
The uniform information density hypothesis states that humans tend to distribute information roughly evenly across an utterance or discourse.
We re-evaluate the claims of Genzel & Charniak (2002) with neural language models, failing to find clear evidence in support of entropy rate constancy.
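The replication test can be pictured as a simple regression: score each sentence of a document with a context-limited language model, then regress mean per-token surprisal on sentence position; entropy rate constancy predicts a clearly positive slope. The sketch below uses placeholder surprisal values, since obtaining real ones from a language model is outside its scope.

```python
import numpy as np

def entropy_rate_slope(positions, sentence_surprisals):
    """Least-squares slope of mean per-token surprisal vs. sentence position.

    Under Genzel & Charniak's argument, a context-limited language model should
    assign increasing surprisal to later sentences, so a clearly positive slope
    is taken as evidence for entropy rate constancy.
    """
    slope, intercept = np.polyfit(positions, sentence_surprisals, deg=1)
    return slope

# Placeholder numbers; in practice these come from scoring each sentence of a
# document with a context-limited language model.
positions = np.arange(1, 11)
surprisals = np.array([4.1, 4.3, 4.2, 4.6, 4.5, 4.8, 4.7, 4.9, 5.0, 4.9])
print(entropy_rate_slope(positions, surprisals))
```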
arXiv Detail & Related papers (2023-05-20T03:48:31Z)
- Statistical Properties of the Entropy from Ordinal Patterns [55.551675080361335]
Knowing the joint distribution of the pair Entropy-Statistical Complexity for a large class of time series models would allow statistical tests that are unavailable to date.
We characterize the distribution of the empirical Shannon's Entropy for any model under which the true normalized Entropy is neither zero nor one.
We present a bilateral test that verifies if there is enough evidence to reject the hypothesis that two signals produce ordinal patterns with the same Shannon's Entropy.
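For concreteness, a minimal implementation of the ordinal-pattern (permutation) entropy that these results concern, following the standard Bandt-Pompe construction rather than anything specific to the paper:

```python
import math
from collections import Counter

def permutation_entropy(series, order=3):
    """Normalized Shannon entropy of ordinal patterns (permutation entropy).

    Each window of `order` consecutive values is mapped to the permutation that
    sorts it; the entropy of the resulting pattern distribution is normalized
    by log(order!) so the result lies in [0, 1].
    """
    patterns = Counter(
        tuple(sorted(range(order), key=lambda i: series[t + i]))
        for t in range(len(series) - order + 1)
    )
    total = sum(patterns.values())
    h = -sum((c / total) * math.log(c / total) for c in patterns.values())
    return h / math.log(math.factorial(order))

print(permutation_entropy([4, 7, 9, 10, 6, 11, 3], order=3))
```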
arXiv Detail & Related papers (2022-09-15T23:55:58Z)
- On the probability-quality paradox in language generation [76.69397802617064]
We analyze language generation through an information-theoretic lens.
We posit that human-like language should contain an amount of information close to the entropy of the distribution over natural strings.
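The underlying information-theoretic point can be made concrete with a toy categorical distribution (an illustration, not the paper's experiments): strings sampled from the distribution carry per-symbol information close to its entropy, while the single most probable string carries much less, which is why maximizing probability can move generated text away from human-like typicality.

```python
import math
import random

# Toy categorical "language": symbols with very unequal probabilities.
probs = {"the": 0.4, "of": 0.2, "cat": 0.15, "sat": 0.15, "zygote": 0.1}
entropy = -sum(p * math.log(p) for p in probs.values())  # nats per symbol

def surprisal_per_symbol(string_symbols):
    return sum(-math.log(probs[s]) for s in string_symbols) / len(string_symbols)

random.seed(0)
symbols, weights = zip(*probs.items())
sampled = random.choices(symbols, weights=weights, k=10_000)  # a "typical" string
greedy = ["the"] * 10_000                                     # the mode of the distribution

print(f"entropy:          {entropy:.3f} nats/symbol")
print(f"typical sample:   {surprisal_per_symbol(sampled):.3f}")  # close to the entropy
print(f"most-likely text: {surprisal_per_symbol(greedy):.3f}")   # far below the entropy
```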
arXiv Detail & Related papers (2022-03-31T17:43:53Z)
- Automatically Identifying Semantic Bias in Crowdsourced Natural Language Inference Datasets [78.6856732729301]
We introduce a model-driven, unsupervised technique to find "bias clusters" in a learned embedding space of hypotheses in NLI datasets.
Interventions and additional rounds of labeling can then be performed to ameliorate the semantic bias of a dataset's hypothesis distribution.
arXiv Detail & Related papers (2021-12-16T22:49:01Z)
- Neural Joint Entropy Estimation [12.77733789371855]
Estimating the entropy of a discrete random variable is a fundamental problem in information theory and related fields.
In this work, we introduce a practical solution to this problem, which extends the work of McAllester and Stratos (2020).
The proposed scheme uses the generalization abilities of cross-entropy estimation in deep neural networks (DNNs) to introduce improved entropy estimation accuracy.
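The core idea, that a model's cross-entropy upper-bounds the true entropy and can therefore serve as an estimator, can be sketched without a neural network; below, a Laplace-smoothed categorical model stands in for the paper's DNNs (that simplification is an assumption made for brevity).

```python
import math
import random
from collections import Counter

def cross_entropy_entropy_estimate(train, heldout, alpha=1.0):
    """Estimate H(X) as the held-out cross-entropy of a model fit on `train`.

    For any model q, E_p[-log q(X)] = H(p) + KL(p || q) >= H(p), so a good
    model's held-out cross-entropy is an (upwardly biased) entropy estimate.
    The paper trains deep networks for this role; a Laplace-smoothed
    categorical model stands in here.
    """
    counts = Counter(train)
    vocab = set(train) | set(heldout)
    denom = len(train) + alpha * len(vocab)
    q = {x: (counts[x] + alpha) / denom for x in vocab}
    return sum(-math.log(q[x]) for x in heldout) / len(heldout)

random.seed(0)
data = [random.randrange(8) for _ in range(2000)]
print(cross_entropy_entropy_estimate(data[:1500], data[1500:]))  # close to log(8) ~= 2.079
```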
arXiv Detail & Related papers (2020-12-21T09:23:39Z)
- Generalized Entropy Regularization or: There's Nothing Special about Label Smoothing [83.78668073898001]
We introduce a family of entropy regularizers, which includes label smoothing as a special case.
We find that variance in model performance can be explained largely by the resulting entropy of the model.
We advise the use of other entropy regularization methods in its place.
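For reference, a stand-alone numpy sketch of two common members of this regularizer family, label smoothing and the confidence penalty, written from their textbook definitions rather than the paper's formulation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def label_smoothing_loss(logits, gold, eps=0.1):
    """Cross-entropy against a smoothed target: (1 - eps) on the gold label,
    eps spread uniformly over the vocabulary."""
    p = softmax(logits)
    k = len(logits)
    target = np.full(k, eps / k)
    target[gold] += 1.0 - eps
    return -np.sum(target * np.log(p))

def confidence_penalty_loss(logits, gold, beta=0.1):
    """Standard cross-entropy minus beta times the model's own entropy,
    another entropy regularizer of the kind the paper compares."""
    p = softmax(logits)
    ce = -np.log(p[gold])
    entropy = -np.sum(p * np.log(p))
    return ce - beta * entropy

logits = np.array([2.0, 0.5, -1.0, 0.0])
print(label_smoothing_loss(logits, gold=0), confidence_penalty_loss(logits, gold=0))
```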
arXiv Detail & Related papers (2020-05-02T12:46:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.