A Simplistic Model of Neural Scaling Laws: Multiperiodic Santa Fe
Processes
- URL: http://arxiv.org/abs/2302.09049v1
- Date: Fri, 17 Feb 2023 18:27:27 GMT
- Title: A Simplistic Model of Neural Scaling Laws: Multiperiodic Santa Fe
Processes
- Authors: Łukasz Dębowski
- Abstract summary: It was observed that large language models exhibit a power-law decay of cross entropy with respect to the number of parameters and training tokens.
When extrapolated literally, this decay implies that the entropy rate of natural language is zero.
We construct a simple stationary process and its memory-based predictor that exhibit a power-law decay of cross entropy with the vanishing entropy rate.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It was observed that large language models exhibit a power-law decay of cross
entropy with respect to the number of parameters and training tokens. When
extrapolated literally, this decay implies that the entropy rate of natural
language is zero. To understand this phenomenon -- or an artifact -- better, we
construct a simple stationary stochastic process and its memory-based predictor
that exhibit a power-law decay of cross entropy with the vanishing entropy
rate. Our example is based on previously discussed Santa Fe processes, which
decompose a random text into a process of narration and time-independent
knowledge. Previous discussions assumed that narration is a memoryless source
with Zipf's distribution. In this paper, we propose a model of narration that
has the vanishing entropy rate and applies a randomly chosen deterministic
sequence called a multiperiodic sequence. Under a suitable parameterization,
multiperiodic sequences exhibit asymptotic relative frequencies given by Zipf's
law. Remaining agnostic about the value of the entropy rate of natural
language, we discuss relevance of similar constructions for language modeling.
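The decomposition into narration and time-independent knowledge can be illustrated with a small simulation. The sketch below is not the paper's multiperiodic construction: it assumes the earlier, memoryless variant mentioned in the abstract, in which the narration indices K_i are drawn independently from a (truncated) Zipf distribution and the knowledge bits Z_k are fixed fair coin flips. The function names, the truncation `support`, and the exponent `alpha` are illustrative choices, not values from the paper. The predictor simply memorizes each bit the first time it sees it, so it pays one bit for a new index and nothing for a repeated one; only the loss on the knowledge bits is counted, not the cost of predicting the narration itself.

```python
import bisect
import random
from itertools import accumulate


def zipf_sampler(alpha, support, rng):
    """Return a function drawing k in 1..support with P(k) proportional to k**(-alpha)."""
    cum = list(accumulate(k ** -alpha for k in range(1, support + 1)))
    total = cum[-1]

    def draw():
        # Inverse-CDF sampling on the cumulative weights.
        return bisect.bisect_left(cum, rng.random() * total) + 1

    return draw


def memory_predictor_loss(n_tokens, alpha=2.0, support=100_000, seed=0):
    """Simulate tokens (K_i, Z_{K_i}) and a memorizing predictor.

    Returns {n: bits per token paid on the knowledge bits over the window
    (n/2, n]} for n = 1, 2, 4, ... up to n_tokens.
    All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    draw = zipf_sampler(alpha, support, rng)
    knowledge = {}  # time-independent facts Z_k, generated lazily
    memory = {}     # facts the predictor has already observed
    rates = {}
    window_bits, window_start, checkpoint = 0.0, 0, 1
    for i in range(1, n_tokens + 1):
        k = draw()  # narration: which fact is mentioned
        if k not in knowledge:
            knowledge[k] = rng.getrandbits(1)
        z = knowledge[k]
        if memory.get(k) != z:
            # Unseen fact: a uniform guess over {0, 1} costs 1 bit.
            window_bits += 1.0
            memory[k] = z
        # Seen fact: predicted with probability 1, costs 0 bits.
        if i == checkpoint:
            rates[i] = window_bits / (i - window_start)
            window_bits, window_start = 0.0, i
            checkpoint *= 2
    return rates


if __name__ == "__main__":
    for n, rate in memory_predictor_loss(2 ** 14).items():
        print(n, round(rate, 4))
```

Plotting the returned rates against n on log-log axes should show a roughly straight, downward-sloping line, which is the kind of power-law decay the abstract refers to. In this memoryless variant the narration itself still has positive entropy, which is the gap the multiperiodic narration is meant to close.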
Related papers
- Flow-Based Non-stationary Temporal Regime Causal Structure Learning [49.77103348208835]
We introduce FANTOM, a unified framework for causal discovery. It handles non-stationary processes along with non-Gaussian and heteroscedastic noise. It simultaneously infers the number of regimes and their corresponding indices and learns each regime's Directed Acyclic Graph.
arXiv Detail & Related papers (2025-06-20T15:12:43Z)
- Fully quantum stochastic entropy production [2.3895981099137535]
Building on the approach of thermodynamics, we define entropy production for arbitrary quantum processes.
We show that the classical expression for average entropy production involves only comparisons of statistics at the input or output.
We construct an entropy production operator that generalizes the value of entropy to the non-commutative case.
arXiv Detail & Related papers (2024-12-17T02:45:10Z)
- First numerical observation of the Berezinskii-Kosterlitz-Thouless transition in language models [1.4061979259370274]
We numerically demonstrate an unambiguous phase transition in the framework of a natural language model.
We identify the phase transition as a variant of the Berezinskii-Kosterlitz-Thouless transition.
arXiv Detail & Related papers (2024-12-02T07:32:32Z)
- Simulating Time-dependent Hamiltonian Based On High Order Runge-Kutta and Forward Euler Method [0.0]
We propose a new method for simulating a certain type of time-dependent Hamiltonian $H(t) = \sum_{i=1}^{m} \gamma_i(t) H_i$, where $\gamma_i(t)$ is a bounded, computable function of time $t$ and each $H_i$ is time-independent.
Our quantum algorithms are based on high-order Runge-Kutta method and forward Euler method, where the time interval is divided into subintervals.
arXiv Detail & Related papers (2024-10-18T12:31:57Z)
- Symmetry operations and Critical Behaviour in Classical to Quantum Stochastic Processes [0.0]
We show that the relaxation processes unfold very differently for the different quantum extensions. We find a rather ambiguous relation between the coherence measure based on the L1-norm and the speed of the relaxation process.
arXiv Detail & Related papers (2024-09-14T03:01:54Z)
- Krylov Subspace Methods for Quantum Dynamics with Time-Dependent Generators [0.0]
We introduce a generalization valid for driven quantum systems governed by a time-dependent Hamiltonian. This representation is used to establish a novel class of fundamental limits to the quantum speed of evolution and operator growth. We also discuss generalizations of the algorithm, adapted to discretized time evolutions and periodic Hamiltonians, with applications to many-body systems.
arXiv Detail & Related papers (2024-08-15T19:00:24Z)
- Causal Layering via Conditional Entropy [85.01590667411956]
Causal discovery aims to recover information about an unobserved causal graph from the observable data it generates.
We provide ways to recover layerings of a graph by accessing the data via a conditional entropy oracle.
arXiv Detail & Related papers (2024-01-19T05:18:28Z)
- On the strong stability of ergodic iterations [0.0]
We revisit processes generated by iterated random functions driven by a stationary and ergodic sequence.
New results are deduced for Langevin-type iterations with dependent noise and for multitype branching processes.
arXiv Detail & Related papers (2023-04-10T15:33:56Z)
- Observational entropic study of Anderson localization [0.0]
We study the behaviour of the observational entropy in the context of the localization-delocalization transition for the one-dimensional Aubry-André model.
For a given coarse-graining, it increases logarithmically with system size in the delocalized phase, and obeys an area law in the localized phase.
We also find that the increase of the observational entropy following a quantum quench is logarithmic in time in the delocalized phase as well as at the transition point, while in the localized phase it oscillates.
arXiv Detail & Related papers (2022-09-21T11:26:43Z)
- On the Convergence of the ELBO to Entropy Sums [3.345575993695074]
We show that, for a very large class of generative models, the variational lower bound is at all stationary points of learning equal to a sum of entropies.
arXiv Detail & Related papers (2022-09-07T11:33:32Z)
- Rényi entanglement entropy after a quantum quench starting from insulating states in a free boson system [0.0]
We investigate the time-dependent Rényi entanglement entropy after a quantum quench.
We calculate the time evolution of the Rényi entanglement entropy in unprecedentedly large systems.
We discuss possible applications of our findings to the real-time dynamics of noninteracting bosonic systems.
arXiv Detail & Related papers (2022-07-18T02:36:14Z)
- A High-Quality Entropy Source Using van der Waals Heterojunction for True Random Number Generation [0.41998444721319217]
Generators of random sequences used in high-end applications such as cryptography rely on entropy sources for their indeterminism.
We present a compact device capable of detecting discrete charge fluctuations for extracting entropy from physical processes.
We demonstrate an entropy generation rate tunable over multiple orders of magnitude and show the persistence of the underlying physical process for temperatures ranging from cryogenic to ambient conditions.
arXiv Detail & Related papers (2022-04-13T17:25:08Z)
- Algebraic Compression of Quantum Circuits for Hamiltonian Evolution [52.77024349608834]
Unitary evolution under a time dependent Hamiltonian is a key component of simulation on quantum hardware.
We present an algorithm that compresses the Trotter steps into a single block of quantum gates.
This results in a fixed depth time evolution for certain classes of Hamiltonians.
arXiv Detail & Related papers (2021-08-06T19:38:01Z)
- Entropy Production and the Role of Correlations in Quantum Brownian Motion [77.34726150561087]
We perform a study on quantum entropy production, different kinds of correlations, and their interplay in the driven Caldeira-Leggett model of quantum Brownian motion.
arXiv Detail & Related papers (2021-08-05T13:11:05Z)
- Observation of Time-Crystalline Eigenstate Order on a Quantum Processor [80.17270167652622]
Quantum many-body systems display rich phase structure in their low-temperature equilibrium states.
We experimentally observe an eigenstate-ordered DTC on superconducting qubits.
Results establish a scalable approach to study non-equilibrium phases of matter on current quantum processors.
arXiv Detail & Related papers (2021-07-28T18:00:03Z)
- Aspects of Pseudo Entropy in Field Theories [0.0]
We numerically analyze a class of free scalar field theories and the XY spin model.
This reveals the basic properties of pseudo entropy in many-body systems.
We find that the non-positivity of the difference can be violated only if the initial and final states belong to different quantum phases.
arXiv Detail & Related papers (2021-06-06T13:25:35Z)
- Action Redundancy in Reinforcement Learning [54.291331971813364]
We show that transition entropy can be described by two terms; namely, model-dependent transition entropy and action redundancy.
Our results suggest that action redundancy is a fundamental problem in reinforcement learning.
arXiv Detail & Related papers (2021-02-22T19:47:26Z)
- Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z)
- Synergetic Learning of Heterogeneous Temporal Sequences for Multi-Horizon Probabilistic Forecasting [48.8617204809538]
We propose Variational Synergetic Multi-Horizon Network (VSMHN), a novel deep conditional generative model.
To learn complex correlations across heterogeneous sequences, a tailored encoder is devised to combine the advances in deep point processes models and variational recurrent neural networks.
Our model can be trained effectively using variational inference and generates predictions with Monte-Carlo simulation.
arXiv Detail & Related papers (2021-01-31T11:00:55Z)
- The Connection between Discrete- and Continuous-Time Descriptions of Gaussian Continuous Processes [60.35125735474386]
We show that discretizations yielding consistent estimators have the property of "invariance under coarse-graining".
This result explains why combining differencing schemes for derivatives reconstruction and local-in-time inference approaches does not work for time series analysis of second or higher order differential equations.
arXiv Detail & Related papers (2021-01-16T17:11:02Z)
- Shannon Entropy Rate of Hidden Markov Processes [77.34726150561087]
We show how to calculate entropy rates for hidden Markov chains.
We also show how this method gives the minimal set of infinite predictive features.
A sequel addresses the challenge's second part on structure.
arXiv Detail & Related papers (2020-08-29T00:48:17Z)
- Relevant OTOC operators: footprints of the classical dynamics [68.8204255655161]
The OTOC-RE theorem relates the OTOCs summed over a complete base of operators to the second Rényi entropy.
We show that the sum over a small set of relevant operators is enough to obtain a very good approximation for the entropy.
In turn, this provides an alternative natural indicator of complexity, i.e. the scaling of the number of relevant operators with time.
arXiv Detail & Related papers (2020-07-31T19:23:26Z)
- Graph Gamma Process Generalized Linear Dynamical Systems [60.467040479276704]
We introduce graph gamma process (GGP) linear dynamical systems to model real multivariate time series.
For temporal pattern discovery, the latent representation under the model is used to decompose the time series into a parsimonious set of multivariate sub-sequences.
We use the generated random graph, whose number of nonzero-degree nodes is finite, to define both the sparsity pattern and dimension of the latent state transition matrix.
arXiv Detail & Related papers (2020-07-25T04:16:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.