How Well Does First-Token Entropy Approximate Word Entropy as a Psycholinguistic Predictor?
- URL: http://arxiv.org/abs/2507.22209v1
- Date: Tue, 29 Jul 2025 20:12:50 GMT
- Title: How Well Does First-Token Entropy Approximate Word Entropy as a Psycholinguistic Predictor?
- Authors: Christian Clark, Byung-Doh Oh, William Schuler
- Abstract summary: Contextual entropy is a psycholinguistic measure capturing the anticipated difficulty of processing a word. For convenience, entropy is typically estimated based on a language model's probability distribution over a word's first subword token. We generate Monte Carlo estimates of word entropy that allow words to span a variable number of tokens.
- Score: 16.55240473621401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contextual entropy is a psycholinguistic measure capturing the anticipated difficulty of processing a word just before it is encountered. Recent studies have tested for entropy-related effects as a potential complement to well-known effects from surprisal. For convenience, entropy is typically estimated based on a language model's probability distribution over a word's first subword token. However, this approximation results in underestimation and potential distortion of true word entropy. To address this, we generate Monte Carlo (MC) estimates of word entropy that allow words to span a variable number of tokens. Regression experiments on reading times show divergent results between first-token and MC word entropy, suggesting a need for caution in using first-token approximations of contextual entropy.
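To make the contrast concrete, the sketch below computes both quantities for a single context: the first-token approximation takes the entropy of the model's next-subword distribution, while the Monte Carlo estimate samples whole next words $w^{(i)} \sim p(\cdot \mid c)$ and averages $-\log p(w^{(i)} \mid c)$, letting each sampled word span a variable number of tokens. This is a minimal illustration under stated assumptions, not the authors' implementation: the choice of GPT-2 via Hugging Face transformers and the heuristic that BPE tokens beginning with "Ġ" start a new word are assumptions made here for the example.

```python
# Minimal sketch (assumptions: GPT-2 via Hugging Face transformers; GPT-2 BPE
# tokens that begin with "Ġ" are treated as starting a new word). This is an
# illustration of the two estimators, not the authors' released code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Boolean mask over the vocabulary: True for tokens that begin a new word.
vocab_tokens = tok.convert_ids_to_tokens(list(range(len(tok))))
starts_word = torch.tensor([t.startswith("Ġ") for t in vocab_tokens])


@torch.no_grad()
def next_token_log_probs(ids):
    """Log-probabilities over the next subword token given a 1 x n id tensor."""
    logits = model(ids).logits[:, -1, :]
    return torch.log_softmax(logits, dim=-1).squeeze(0)


@torch.no_grad()
def first_token_entropy(context):
    """Entropy (nats) of the distribution over the word's *first* subword token."""
    ids = tok(context, return_tensors="pt").input_ids
    lp = next_token_log_probs(ids)
    return -(lp.exp() * lp).sum().item()


@torch.no_grad()
def mc_word_entropy(context, num_samples=64, max_subwords=8):
    """Monte Carlo word entropy: sample whole next words w ~ p(.|context) and
    average -log p(w|context), letting each word span several subword tokens."""
    ctx_ids = tok(context, return_tensors="pt").input_ids
    total_neg_log_p = 0.0
    for _ in range(num_samples):
        ids = ctx_ids.clone()
        log_p_word = 0.0
        for step in range(max_subwords):
            lp = next_token_log_probs(ids)
            if step > 0:
                # Probability that the word ends here = mass on word-initial tokens.
                p_end = lp.exp()[starts_word].sum().clamp(max=1.0)
                if torch.bernoulli(p_end).item() == 1.0:
                    log_p_word += torch.log(p_end).item()
                    break
                # Word continues: sample a word-internal token in proportion to p(t).
                weights = lp.exp().masked_fill(starts_word, 0.0)
            else:
                # First subword of the word: sample from the full distribution.
                weights = lp.exp()
            next_id = torch.multinomial(weights, 1)
            log_p_word += lp[next_id].item()
            ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
        # If max_subwords is reached without a boundary, p(w) is slightly overestimated.
        total_neg_log_p -= log_p_word
    return total_neg_log_p / num_samples


context = "The scientists measured the"
print("first-token entropy (nats):", first_token_entropy(context))
print("MC word entropy (nats):    ", mc_word_entropy(context))
```

Because a word's probability mass is spread over multi-token continuations, the two numbers generally diverge, consistent with the abstract's point that the first-token approximation underestimates and can distort true word entropy.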
Related papers
- Entropy-Based Block Pruning for Efficient Large Language Models [81.18339597023187]
We propose an entropy-based pruning strategy to enhance efficiency while maintaining performance. Empirical analysis reveals that the entropy of hidden representations decreases in the early blocks but progressively increases across most subsequent blocks.
arXiv Detail & Related papers (2025-04-04T03:42:34Z)
- Unification of observational entropy with maximum entropy principles [2.9127054707887967]
We introduce a definition of coarse-grained entropy that unifies measurement-based (observational entropy) and max-entropy-based (Jaynes) approaches to coarse-graining. We study the dynamics of this entropy in a quantum random matrix model and a classical hard sphere gas.
arXiv Detail & Related papers (2025-03-19T18:00:30Z)
- Expectation Entropy as a Password Strength Metric [1.4732811715354452]
Expectation entropy can be applied to estimate the strength of any random or random-like password.
An 'Expectation entropy' of a given value, for example 0.4, means that an attacker must exhaustively search at least 40% of the total number of guesses to find the password.
arXiv Detail & Related papers (2024-03-18T15:03:37Z)
- Entropy Production from Maximum Entropy Principle: a Unifying Approach [0.0]
Entropy production is the crucial quantity characterizing irreversible phenomena and the second law of thermodynamics.
We use Jaynes' maximum entropy principle to establish a framework that brings together prominent and apparently conflicting definitions.
arXiv Detail & Related papers (2024-01-18T12:32:45Z)
- Testing the Quantum of Entropy [0.0]
It is clarified when it is possible to speak about a quantum of entropy, given by the Boltzmann constant $k$, and about a lower entropy limit $S \geq k \ln 2$.
arXiv Detail & Related papers (2023-07-19T11:34:54Z)
- Multiperiodic Processes: Ergodic Sources with a Sublinear Entropy [0.0]
Multiperiodic processes are supported on randomly shifted deterministic sequences called multiperiodic sequences. In exactly the same setting, the respective multiperiodic processes satisfy a power-law growth of block entropy, called Hilberg's law.
arXiv Detail & Related papers (2023-02-17T18:27:27Z)
- On the Effect of Anticipation on Reading Times [84.27103313675342]
We operationalize anticipation as a word's contextual entropy.
We find substantial evidence for effects of contextual entropy over surprisal on a word's reading time.
arXiv Detail & Related papers (2022-11-25T18:58:23Z)
- Estimating the Entropy of Linguistic Distributions [75.20045001387685]
We study the empirical effectiveness of different entropy estimators for linguistic distributions.
We find evidence that the reported effect size is over-estimated due to over-reliance on poor entropy estimators (a brief illustration of estimator bias appears after this list).
arXiv Detail & Related papers (2022-04-04T13:36:46Z)
- Aspects of Pseudo Entropy in Field Theories [0.0]
We numerically analyze a class of free scalar field theories and the XY spin model.
This reveals the basic properties of pseudo entropy in many-body systems.
We find that the non-positivity of the difference can be violated only if the initial and final states belong to different quantum phases.
arXiv Detail & Related papers (2021-06-06T13:25:35Z)
- Action Redundancy in Reinforcement Learning [54.291331971813364]
We show that transition entropy can be described by two terms: model-dependent transition entropy and action redundancy.
Our results suggest that action redundancy is a fundamental problem in reinforcement learning.
arXiv Detail & Related papers (2021-02-22T19:47:26Z)
- Shannon Entropy Rate of Hidden Markov Processes [77.34726150561087]
We show how to calculate entropy rates for hidden Markov chains.
We also show how this method gives the minimal set of infinite predictive features.
A sequel addresses the challenge's second part on structure.
arXiv Detail & Related papers (2020-08-29T00:48:17Z)
- Entropy production in the quantum walk [62.997667081978825]
We focus on the study of the discrete-time quantum walk on the line, from the entropy production perspective.
We argue that the evolution of the coin can be modeled as an open two-level system that exchanges energy with the lattice at some effective temperature.
arXiv Detail & Related papers (2020-04-09T23:18:29Z)
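As flagged in the "Estimating the Entropy of Linguistic Distributions" entry above, naive entropy estimators are biased on finite samples. The toy sketch below compares the plug-in (maximum-likelihood) estimate with the Miller-Madow bias correction; Miller-Madow is used here only as one familiar example of a corrected estimator and is an assumption of this illustration, not necessarily an estimator evaluated in that paper.

```python
# Toy illustration of entropy-estimator bias (an assumption-laden sketch, not
# code from the cited paper). The plug-in estimator underestimates entropy on
# small samples; the Miller-Madow correction adds (K - 1) / (2N) nats, where
# K is the number of observed types and N is the sample size.
import math
import random
from collections import Counter


def plugin_entropy(samples):
    """Maximum-likelihood (plug-in) entropy estimate, in nats."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def miller_madow_entropy(samples):
    """Plug-in estimate plus the first-order Miller-Madow bias correction."""
    counts = Counter(samples)
    n = len(samples)
    return plugin_entropy(samples) + (len(counts) - 1) / (2 * n)


random.seed(0)
# A skewed toy "word" distribution; small samples under-report its entropy.
population = ["the"] * 50 + ["of"] * 20 + ["cat", "dog", "runs", "sleeps"] * 5 + list("abcdefghij")
sample = [random.choice(population) for _ in range(200)]
print("plug-in estimate:     ", round(plugin_entropy(sample), 3))
print("Miller-Madow estimate:", round(miller_madow_entropy(sample), 3))
```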