Estimating the Entropy of Linguistic Distributions
- URL: http://arxiv.org/abs/2204.01469v2
- Date: Tue, 5 Apr 2022 03:46:10 GMT
- Title: Estimating the Entropy of Linguistic Distributions
- Authors: Aryaman Arora, Clara Meister, Ryan Cotterell
- Abstract summary: We study the empirical effectiveness of different entropy estimators for linguistic distributions.
We find evidence that the reported effect size is over-estimated due to over-reliance on poor entropy estimators.
- Score: 75.20045001387685
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Shannon entropy is often a quantity of interest to linguists studying the
communicative capacity of human language. However, entropy must typically be
estimated from observed data because researchers do not have access to the
underlying probability distribution that gives rise to these data. While
entropy estimation is a well-studied problem in other fields, there is not yet
a comprehensive exploration of the efficacy of entropy estimators for use with
linguistic data. In this work, we fill this void, studying the empirical
effectiveness of different entropy estimators for linguistic distributions. In
a replication of two recent information-theoretic linguistic studies, we find
evidence that the reported effect size is over-estimated due to over-reliance
on poor entropy estimators. Finally, we end our paper with concrete
recommendations for entropy estimation depending on distribution type and data
availability.
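The core problem the abstract describes, that naive entropy estimates from finite samples are biased, can be illustrated with a minimal sketch (a toy example, not the paper's experimental setup): the maximum-likelihood ("plug-in") estimator systematically underestimates entropy on small samples, and the Miller-Madow correction adds back an approximate bias term.

```python
import math
import random
from collections import Counter

def plugin_entropy(samples):
    """Maximum-likelihood ("plug-in") entropy estimate, in nats."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def miller_madow_entropy(samples):
    """Plug-in estimate plus the Miller-Madow bias correction (K - 1) / (2n)."""
    counts = Counter(samples)
    n = len(samples)
    k = len(counts)  # number of observed symbol types
    return plugin_entropy(samples) + (k - 1) / (2 * n)

# Toy demonstration: 30 draws from a uniform distribution over 10 symbols.
# The true entropy is log(10) ~= 2.303 nats; the plug-in estimate falls short.
random.seed(0)
sample = [random.randrange(10) for _ in range(30)]
print(f"plug-in:      {plugin_entropy(sample):.3f}")
print(f"Miller-Madow: {miller_madow_entropy(sample):.3f}")
print(f"true:         {math.log(10):.3f}")
```

On this toy sample the plug-in estimate falls well below log(10) and the correction recovers part of the gap; the paper's recommendations concern which corrections to prefer for which kinds of linguistic distributions and sample sizes.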
Related papers
- To BEE or not to BEE: Estimating more than Entropy with Biased Entropy Estimators [0.3669506968635671]
We apply 18 widely employed entropy estimators to Shannon measures useful to the software engineer.
We investigate how the estimators are affected by two main influential factors: sample size and domain size.
Our most important result is that the Chao-Shen and Chao-Wang-Jost estimators stand out, consistently converging more quickly to the ground truth.
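A minimal sketch of the Chao-Shen estimator singled out above, written from the standard published formula rather than from this paper's code: it combines a Good-Turing estimate of sample coverage with a Horvitz-Thompson correction for symbols that were never observed.

```python
import math
from collections import Counter

def chao_shen_entropy(samples):
    """Chao-Shen coverage-adjusted entropy estimate, in nats."""
    counts = Counter(samples)
    n = len(samples)
    f1 = sum(1 for c in counts.values() if c == 1)  # number of singletons
    if f1 == n:        # every symbol seen once: avoid a zero coverage estimate
        f1 = n - 1
    coverage = 1.0 - f1 / n          # Good-Turing estimate of sample coverage
    h = 0.0
    for c in counts.values():
        p = coverage * c / n                            # coverage-adjusted probability
        h -= p * math.log(p) / (1.0 - (1.0 - p) ** n)   # Horvitz-Thompson inclusion weight
    return h

print(chao_shen_entropy(list("abracadabra")))
```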
arXiv Detail & Related papers (2025-01-20T10:48:08Z)
- Measuring Sample Importance in Data Pruning for Language Models based on Information Entropy [4.079147243688765]
We consider a data pruning method based on information entropy.
We propose that the samples in the training corpus be ranked in terms of their informativeness.
Experiments reveal that the proposed information-based pruning can improve performance on various language modeling and downstream tasks.
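One plausible reading of such an entropy-based ranking, sketched below with a corpus-level unigram surprisal score standing in for the paper's model-based measure (that substitution is an assumption): samples whose tokens are on average more surprising are treated as more informative and are kept when the corpus is pruned.

```python
import math
from collections import Counter

def rank_by_informativeness(documents):
    """Rank documents by mean per-token surprisal under a corpus-level unigram model.

    A simplified proxy score; the paper's actual informativeness measure is
    model-based and may differ substantially.
    """
    tokens = [tok for doc in documents for tok in doc.split()]
    counts = Counter(tokens)
    total = len(tokens)

    def score(doc):
        words = doc.split()
        return sum(-math.log(counts[w] / total) for w in words) / len(words)

    return sorted(documents, key=score, reverse=True)

docs = ["the cat sat on the mat", "the the the the", "quantum entanglement defies intuition"]
print(rank_by_informativeness(docs))  # rarest-on-average documents come first
```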
arXiv Detail & Related papers (2024-06-20T09:09:34Z)
- Estimating Unknown Population Sizes Using the Hypergeometric Distribution [1.03590082373586]
We tackle the challenge of estimating discrete distributions when both the total population size and the sizes of its constituent categories are unknown.
We develop our approach to account for a data generating process where the ground-truth is a mixture of distributions conditional on a continuous latent variable.
Empirical data simulation demonstrates that our method outperforms other likelihood functions used to model count data.
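For orientation, here is a toy maximum-likelihood estimate of an unknown total population size under a plain hypergeometric model (a capture-recapture setup chosen for illustration); the paper's actual method additionally handles unknown category sizes and conditions on a continuous latent variable, which this sketch does not attempt.

```python
from scipy.stats import hypergeom

def mle_population_size(n_marked, sample_size, k_observed, max_pop=10_000):
    """Grid-search MLE of the total population size M for a hypergeometric draw.

    We marked `n_marked` items, drew `sample_size` items without replacement,
    and observed `k_observed` marked items in the draw.
    """
    best_M, best_ll = None, float("-inf")
    for M in range(max(n_marked, sample_size), max_pop):
        ll = hypergeom.logpmf(k_observed, M, n_marked, sample_size)
        if ll > best_ll:
            best_M, best_ll = M, ll
    return best_M

# 100 marked items, a sample of 50 containing 10 marked ones:
# the MLE lands near the Lincoln-Petersen estimate 100 * 50 / 10 = 500.
print(mle_population_size(n_marked=100, sample_size=50, k_observed=10))
```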
arXiv Detail & Related papers (2024-02-22T01:53:56Z)
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z)
- Revisiting Entropy Rate Constancy in Text [43.928576088761844]
The uniform information density hypothesis states that humans tend to distribute information roughly evenly across an utterance or discourse.
We re-evaluate the claims of Genzel & Charniak (2002) with neural language models, failing to find clear evidence in support of entropy rate constancy.
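The replication test can be pictured as a simple regression: score each sentence of a document with a context-limited language model, then regress mean per-token surprisal on sentence position; entropy rate constancy predicts a clearly positive slope. The sketch below uses placeholder surprisal values, since obtaining real ones from a language model is outside its scope.

```python
import numpy as np

def entropy_rate_slope(positions, sentence_surprisals):
    """Least-squares slope of mean per-token surprisal vs. sentence position.

    Under Genzel & Charniak's argument, a context-limited language model should
    assign increasing surprisal to later sentences, so a clearly positive slope
    is taken as evidence for entropy rate constancy.
    """
    slope, intercept = np.polyfit(positions, sentence_surprisals, deg=1)
    return slope

# Placeholder numbers; in practice these come from scoring each sentence of a
# document with a context-limited language model.
positions = np.arange(1, 11)
surprisals = np.array([4.1, 4.3, 4.2, 4.6, 4.5, 4.8, 4.7, 4.9, 5.0, 4.9])
print(entropy_rate_slope(positions, surprisals))
```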
arXiv Detail & Related papers (2023-05-20T03:48:31Z)
- Statistical Properties of the Entropy from Ordinal Patterns [55.551675080361335]
Knowing the joint distribution of the pair Entropy-Statistical Complexity for a large class of time series models would allow statistical tests that are unavailable to date.
We characterize the distribution of the empirical Shannon's Entropy for any model under which the true normalized Entropy is neither zero nor one.
We present a bilateral test that verifies if there is enough evidence to reject the hypothesis that two signals produce ordinal patterns with the same Shannon's Entropy.
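For concreteness, a minimal implementation of the ordinal-pattern (permutation) entropy that these results concern, following the standard Bandt-Pompe construction rather than anything specific to the paper:

```python
import math
from collections import Counter

def permutation_entropy(series, order=3):
    """Normalized Shannon entropy of ordinal patterns (permutation entropy).

    Each window of `order` consecutive values is mapped to the permutation that
    sorts it; the entropy of the resulting pattern distribution is normalized
    by log(order!) so the result lies in [0, 1].
    """
    patterns = Counter(
        tuple(sorted(range(order), key=lambda i: series[t + i]))
        for t in range(len(series) - order + 1)
    )
    total = sum(patterns.values())
    h = -sum((c / total) * math.log(c / total) for c in patterns.values())
    return h / math.log(math.factorial(order))

print(permutation_entropy([4, 7, 9, 10, 6, 11, 3], order=3))
```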
arXiv Detail & Related papers (2022-09-15T23:55:58Z)
- On the probability-quality paradox in language generation [76.69397802617064]
We analyze language generation through an information-theoretic lens.
We posit that human-like language should contain an amount of information close to the entropy of the distribution over natural strings.
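The underlying information-theoretic point can be made concrete with a toy categorical distribution (an illustration, not the paper's experiments): strings sampled from the distribution carry per-symbol information close to its entropy, while the single most probable string carries much less, which is why maximizing probability can move generated text away from human-like typicality.

```python
import math
import random

# Toy categorical "language": symbols with very unequal probabilities.
probs = {"the": 0.4, "of": 0.2, "cat": 0.15, "sat": 0.15, "zygote": 0.1}
entropy = -sum(p * math.log(p) for p in probs.values())  # nats per symbol

def surprisal_per_symbol(string_symbols):
    return sum(-math.log(probs[s]) for s in string_symbols) / len(string_symbols)

random.seed(0)
symbols, weights = zip(*probs.items())
sampled = random.choices(symbols, weights=weights, k=10_000)  # a "typical" string
greedy = ["the"] * 10_000                                     # the mode of the distribution

print(f"entropy:          {entropy:.3f} nats/symbol")
print(f"typical sample:   {surprisal_per_symbol(sampled):.3f}")  # close to the entropy
print(f"most-likely text: {surprisal_per_symbol(greedy):.3f}")   # far below the entropy
```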
arXiv Detail & Related papers (2022-03-31T17:43:53Z)
- Automatically Identifying Semantic Bias in Crowdsourced Natural Language Inference Datasets [78.6856732729301]
We introduce a model-driven, unsupervised technique to find "bias clusters" in a learned embedding space of hypotheses in NLI datasets.
Interventions and additional rounds of labeling can then be performed to ameliorate the semantic bias of a dataset's hypothesis distribution.
arXiv Detail & Related papers (2021-12-16T22:49:01Z)
- Neural Joint Entropy Estimation [12.77733789371855]
Estimating the entropy of a discrete random variable is a fundamental problem in information theory and related fields.
In this work, we introduce a practical solution to this problem, which extends the work of McAllester and Stratos (2020).
The proposed scheme uses the generalization abilities of cross-entropy estimation in deep neural networks (DNNs) to introduce improved entropy estimation accuracy.
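The core idea, that a model's cross-entropy upper-bounds the true entropy and can therefore serve as an estimator, can be sketched without a neural network; below, a Laplace-smoothed categorical model stands in for the paper's DNNs (that simplification is an assumption made for brevity).

```python
import math
import random
from collections import Counter

def cross_entropy_entropy_estimate(train, heldout, alpha=1.0):
    """Estimate H(X) as the held-out cross-entropy of a model fit on `train`.

    For any model q, E_p[-log q(X)] = H(p) + KL(p || q) >= H(p), so a good
    model's held-out cross-entropy is an (upwardly biased) entropy estimate.
    The paper trains deep networks for this role; a Laplace-smoothed
    categorical model stands in here.
    """
    counts = Counter(train)
    vocab = set(train) | set(heldout)
    denom = len(train) + alpha * len(vocab)
    q = {x: (counts[x] + alpha) / denom for x in vocab}
    return sum(-math.log(q[x]) for x in heldout) / len(heldout)

random.seed(0)
data = [random.randrange(8) for _ in range(2000)]
print(cross_entropy_entropy_estimate(data[:1500], data[1500:]))  # close to log(8) ~= 2.079
```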
arXiv Detail & Related papers (2020-12-21T09:23:39Z)
- Generalized Entropy Regularization or: There's Nothing Special about Label Smoothing [83.78668073898001]
We introduce a family of entropy regularizers, which includes label smoothing as a special case.
We find that variance in model performance can be explained largely by the resulting entropy of the model.
We advise the use of other entropy regularization methods in its place.
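For reference, a stand-alone numpy sketch of two common members of this regularizer family, label smoothing and the confidence penalty, written from their textbook definitions rather than the paper's formulation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def label_smoothing_loss(logits, gold, eps=0.1):
    """Cross-entropy against a smoothed target: (1 - eps) on the gold label,
    eps spread uniformly over the vocabulary."""
    p = softmax(logits)
    k = len(logits)
    target = np.full(k, eps / k)
    target[gold] += 1.0 - eps
    return -np.sum(target * np.log(p))

def confidence_penalty_loss(logits, gold, beta=0.1):
    """Standard cross-entropy minus beta times the model's own entropy,
    another entropy regularizer of the kind the paper compares."""
    p = softmax(logits)
    ce = -np.log(p[gold])
    entropy = -np.sum(p * np.log(p))
    return ce - beta * entropy

logits = np.array([2.0, 0.5, -1.0, 0.0])
print(label_smoothing_loss(logits, gold=0), confidence_penalty_loss(logits, gold=0))
```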
arXiv Detail & Related papers (2020-05-02T12:46:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.