Predictability maximization and the origins of word order harmony
- URL: http://arxiv.org/abs/2408.16570v5
- Date: Wed, 9 Oct 2024 16:52:19 GMT
- Title: Predictability maximization and the origins of word order harmony
- Authors: Ramon Ferrer-i-Cancho,
- Abstract summary: We consider the optimal placement of a head that maximizes the predictability of the sequence.
We show that postponing the head is the optimal strategy to maximize its predictability while bringing it forward is the optimal strategy to maximize the predictability of dependents.
Our findings shed light on the placements of the head adopted by real languages or emerging in different kinds of experiments.
- Score: 0.16317061277456998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the linguistic problem of the sequential arrangement of a head and its dependents from an information theoretic perspective. In particular, we consider the optimal placement of a head that maximizes the predictability of the sequence. We assume that dependents are statistically independent given a head, in line with the open-choice principle and the core assumptions of dependency grammar. We demonstrate the optimality of harmonic order, i.e., placing the head last maximizes the predictability of the head whereas placing the head first maximizes the predictability of dependents. We also show that postponing the head is the optimal strategy to maximize its predictability while bringing it forward is the optimal strategy to maximize the predictability of dependents. We unravel the advantages of the strategy of maximizing the predictability of the head over maximizing the predictability of dependents. Our findings shed light on the placements of the head adopted by real languages or emerging in different kinds of experiments.
Related papers
- Expected Maximin Fairness in Max-Cut and other Combinatorial Optimization Problems [0.0]
We show that optimal maximin-fair solutions are bounded by non-maximin-fair optimal solutions.
In the paper, we use the special case of Max-Cut to demonstrate challenges in defining and implementing maximin fairness.
arXiv Detail & Related papers (2024-10-03T15:32:04Z) - The optimal placement of the head in the noun phrase. The case of demonstrative, numeral, adjective and noun [0.16317061277456998]
We show that, across preferred orders in languages, the noun tends to be placed at one of the ends.
We also show evidence of anti locality effects: syntactic dependency in preferred orders are longer than expected by chance.
arXiv Detail & Related papers (2024-02-15T20:24:39Z) - Beyond Expectations: Learning with Stochastic Dominance Made Practical [88.06211893690964]
dominance models risk-averse preferences for decision making with uncertain outcomes.
Despite theoretically appealing, the application of dominance in machine learning has been scarce.
We first generalize the dominance concept to enable feasible comparisons between any arbitrary pair of random variables.
We then develop a simple and efficient approach for finding the optimal solution in terms of dominance.
arXiv Detail & Related papers (2024-02-05T03:21:23Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - IRL with Partial Observations using the Principle of Uncertain Maximum
Entropy [8.296684637620553]
We introduce the principle of uncertain maximum entropy and present an expectation-maximization based solution.
We experimentally demonstrate the improved robustness to noisy data offered by our technique in a maximum causal entropy inverse reinforcement learning domain.
arXiv Detail & Related papers (2022-08-15T03:22:46Z) - What Should I Know? Using Meta-gradient Descent for Predictive Feature
Discovery in a Single Stream of Experience [63.75363908696257]
computational reinforcement learning seeks to construct an agent's perception of the world through predictions of future sensations.
An open challenge in this line of work is determining from the infinitely many predictions that the agent could possibly make which predictions might best support decision-making.
We introduce a meta-gradient descent process by which an agent learns what predictions to make, 2) the estimates for its chosen predictions, and 3) how to use those estimates to generate policies that maximize future reward.
arXiv Detail & Related papers (2022-06-13T21:31:06Z) - Correlation Robust Influence Maximization [5.508091917582913]
We propose a distributionally robust model for the influence problem.
We seek a seed set whose expected influence under the worst correlation is maximized.
We show that this worst-case influence can be efficiently computed.
arXiv Detail & Related papers (2020-10-24T04:43:56Z) - A maximum-entropy approach to off-policy evaluation in average-reward
MDPs [54.967872716145656]
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs)
We provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases.
We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning.
arXiv Detail & Related papers (2020-06-17T18:13:37Z) - Invariant Rationalization [84.1861516092232]
A typical rationalization criterion, i.e. maximum mutual information (MMI), finds the rationale that maximizes the prediction performance based only on the rationale.
We introduce a game-theoretic invariant rationalization criterion where the rationales are constrained to enable the same predictor to be optimal across different environments.
We show both theoretically and empirically that the proposed rationales can rule out spurious correlations, generalize better to different test scenarios, and align better with human judgments.
arXiv Detail & Related papers (2020-03-22T00:50:27Z) - Toward Optimal Adversarial Policies in the Multiplicative Learning
System with a Malicious Expert [87.12201611818698]
We consider a learning system that combines experts' advice to predict a sequence of true outcomes.
It is assumed that one of the experts is malicious and aims to impose the maximum loss on the system.
We show that a simple greedy policy of always reporting false prediction is optimal with an approximation ratio of $1+O(sqrtfracln NN)$.
For the online setting where the malicious expert can adaptively make its decisions, we show that the optimal online policy can be efficiently computed by solving a dynamic program in $O(N3)$.
arXiv Detail & Related papers (2020-01-02T18:04:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.