Optimal word order for non-causal text generation with Large Language Models: the Spanish case
- URL: http://arxiv.org/abs/2502.14451v1
- Date: Thu, 20 Feb 2025 11:06:16 GMT
- Title: Optimal word order for non-causal text generation with Large Language Models: the Spanish case
- Authors: Andrea Busto-Castiñeira, Silvia García-Méndez, Francisco de Arriba-Pérez, Francisco J. González-Castaño
- Abstract summary: We present a novel Viterbi algorithm-based methodology for maximum likelihood word order estimation.
We analyze the probability of the maximum-likelihood non-causal generation order for NLG in Spanish and then the probability of generating the same phrases with causal Spanish NLG.
- Score: 6.817247544942709
- License:
- Abstract: The popularity of Natural Language Generation (NLG) has increased owing to progress in Large Language Models (LLMs) with zero-shot inference capabilities. However, most neural systems utilize decoder-only causal (unidirectional) transformer models, which are effective for English but may reduce the richness of languages with less strict word order, subject omission, or different relative clause attachment preferences. This is the first work to analytically address optimal text generation order for non-causal language models. We present a novel Viterbi algorithm-based methodology for maximum likelihood word order estimation. We analyze the probability of the maximum-likelihood non-causal generation order for NLG in Spanish and then the probability of generating the same phrases with causal Spanish NLG. This comparative analysis reveals that causal NLG prefers English-like SVO structures. We also analyze the relationship between optimal generation order and causal left-to-right generation order using Spearman's rank correlation. Our results demonstrate that the ideal order predicted by the maximum likelihood estimator is not closely related to the causal order and may be influenced by the syntactic structure of the target sentence.
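As a rough illustration of the two analysis steps described in the abstract, the sketch below greedily approximates a maximum-likelihood generation order with an off-the-shelf Spanish masked (non-causal) LM and then measures Spearman's rank correlation between that order and the causal left-to-right order. The model checkpoint, the greedy beam-width-1 search, and the example sentence are illustrative assumptions, not the paper's exact Viterbi procedure, which explores the order space more exhaustively.

```python
# Greedy, beam-width-1 approximation of maximum-likelihood generation-order
# search with a non-causal (masked) Spanish LM, followed by a Spearman
# rank-correlation check against the causal left-to-right order.
import torch
from scipy.stats import spearmanr
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL = "dccuchile/bert-base-spanish-wwm-cased"  # any Spanish masked LM would do
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

def greedy_order(sentence: str):
    """Reveal tokens one at a time, always picking the masked position whose
    true token currently receives the highest log-probability."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    positions = list(range(1, len(ids) - 1))      # skip [CLS] and [SEP]
    current = ids.clone()
    current[positions] = tokenizer.mask_token_id  # start fully masked
    order, total_logprob, remaining = [], 0.0, set(positions)
    while remaining:
        with torch.no_grad():
            log_probs = model(current.unsqueeze(0)).logits[0].log_softmax(-1)
        # score every still-masked position by the log-prob of its true token
        scores = {p: log_probs[p, ids[p]].item() for p in remaining}
        best = max(scores, key=scores.get)
        total_logprob += scores[best]
        current[best] = ids[best]                 # reveal the chosen token
        order.append(best)
        remaining.remove(best)
    return order, total_logprob

order, logp = greedy_order("El perro duerme en el jardín.")
rho, _ = spearmanr(order, list(range(len(order))))  # 1.0 means purely left-to-right
print(f"order: {order}  log-likelihood: {logp:.2f}  Spearman rho: {rho:.2f}")
```

A Spearman coefficient near 1 would mean the estimated optimal order is essentially left-to-right; the abstract reports that the two orders are in fact not closely related.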
Related papers
- Predictability and Causality in Spanish and English Natural Language Generation [6.817247544942709]
This paper compares causal and non-causal language modeling for English and Spanish.
Its experiments show that Spanish is more predictable than English given a non-causal context.
These insights support further research in NLG in Spanish using bidirectional transformer language models.
arXiv Detail & Related papers (2024-08-26T14:09:28Z) - Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization [105.3612692153615]
We propose a new axis based on eliciting preferences jointly over instruction-response pairs.
Joint preferences over instruction and response pairs can significantly enhance the alignment of large language models.
arXiv Detail & Related papers (2024-03-31T02:05:40Z) - Optimizing Language Models for Human Preferences is a Causal Inference Problem [41.59906798328058]
We present an initial exploration of language model optimization for human preferences from direct outcome datasets.
We first propose that language model optimization should be viewed as a causal problem to ensure that the model correctly learns the relationship between the text and the outcome.
We extend the proposed Causal Preference Optimization (CPO) objective with doubly robust CPO, which reduces the variance of the surrogate objective while retaining provably strong guarantees on bias.
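For readers unfamiliar with the term, the minimal sketch below shows the textbook doubly robust estimate of an expected outcome under a target policy, combining an outcome model with an importance-weighted residual correction. It is a generic illustration with assumed inputs, not the surrogate objective from this paper.

```python
# Generic doubly robust off-policy value estimate (illustrative only; not the
# CPO surrogate objective). The estimate stays consistent if either the outcome
# model or the importance weights are correct, which is the "doubly robust" idea.
import numpy as np

def doubly_robust_value(rewards, target_probs, behavior_probs,
                        outcome_preds, outcome_preds_under_target):
    """All arguments are 1-D arrays over logged examples; outcome_preds_under_target
    holds the outcome model's prediction for a text drawn from the target policy
    (the direct-method term)."""
    weights = target_probs / behavior_probs              # importance weights
    correction = weights * (rewards - outcome_preds)     # model-residual term
    return float(np.mean(outcome_preds_under_target + correction))

# Toy usage with made-up numbers.
print(doubly_robust_value(
    rewards=np.array([1.0, 0.0, 1.0]),
    target_probs=np.array([0.5, 0.2, 0.4]),
    behavior_probs=np.array([0.4, 0.3, 0.4]),
    outcome_preds=np.array([0.8, 0.1, 0.9]),
    outcome_preds_under_target=np.array([0.7, 0.7, 0.7]),
))
```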
arXiv Detail & Related papers (2024-02-22T21:36:07Z) - RankGen: Improving Text Generation with Large Ranking Models [43.448251157355784]
RankGen is an encoder model that scores generations given a prefix.
It can be flexibly incorporated as a scoring function in beam search.
We show that RankGen significantly outperforms decoding algorithms like nucleus, top-k, and typical sampling.
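To make the "scoring function" idea concrete, here is a small sketch of reranking candidate continuations with a prefix-conditioned scorer; `score_fn` and the toy scorer are placeholder assumptions, not RankGen's actual interface. In full beam search the same score would be mixed into the per-step beam objective rather than applied only once at the end.

```python
# Illustrative reranking of sampled continuations with a prefix-conditioned
# scorer, in the spirit of plugging an encoder score into decoding.
from typing import Callable, List

def rerank(prefix: str,
           candidates: List[str],
           score_fn: Callable[[str, str], float]) -> List[str]:
    """Return candidates sorted by score_fn(prefix, candidate), highest first."""
    return sorted(candidates, key=lambda c: score_fn(prefix, c), reverse=True)

# Toy stand-in scorer: more overlap with the prefix vocabulary scores higher.
# A real setup would call the ranking encoder here instead.
def toy_score(prefix: str, candidate: str) -> float:
    prefix_vocab = set(prefix.lower().split())
    return float(sum(w in prefix_vocab for w in candidate.lower().split()))

best = rerank("the cat sat on the",
              ["mat near the cat", "quantum flux capacitor", "sofa by the window"],
              toy_score)
print(best[0])  # -> "mat near the cat"
```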
arXiv Detail & Related papers (2022-05-19T17:36:46Z) - Typical Decoding for Natural Language Generation [76.69397802617064]
We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of quality.
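As background on the decoding strategy referenced here, the sketch below implements one step of locally typical sampling as it is usually described: keep the tokens whose surprisal is closest to the step's conditional entropy until their cumulative mass reaches a threshold tau, then renormalize and sample. The tau value and the random logits are illustrative assumptions.

```python
# One step of locally typical sampling over raw next-token logits.
import torch

def typical_sample(logits: torch.Tensor, tau: float = 0.95) -> int:
    log_probs = torch.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum()            # H(p) for this step
    deviation = (-log_probs - entropy).abs()        # |surprisal - entropy|
    order = torch.argsort(deviation)                # most "typical" tokens first
    cumulative = probs[order].cumsum(dim=-1)
    cutoff = int((cumulative < tau).sum().item()) + 1  # smallest set with mass >= tau
    keep = order[:cutoff]
    kept_probs = probs[keep] / probs[keep].sum()    # renormalize the kept set
    return int(keep[torch.multinomial(kept_probs, 1)].item())

# Toy usage on a random next-token distribution over a 10-token vocabulary.
print(typical_sample(torch.randn(10)))
```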
arXiv Detail & Related papers (2022-02-01T18:58:45Z) - Learning and Analyzing Generation Order for Undirected Sequence Models [86.10875837475783]
We train a policy that learns the generation order for a pre-trained, undirected translation model via reinforcement learning.
We show that translations generated in our learned orders achieve higher BLEU scores than outputs decoded left to right or decoded in the learned order from Mansimov et al.
Our findings could provide more insights on the mechanism of undirected generation models and encourage further research in this direction.
arXiv Detail & Related papers (2021-12-16T18:29:07Z) - Exploiting Language Model for Efficient Linguistic Steganalysis: An Empirical Study [23.311007481830647]
We present two methods for efficient linguistic steganalysis.
One is to pre-train an RNN-based language model, and the other is to pre-train a sequence autoencoder.
arXiv Detail & Related papers (2021-07-26T12:37:18Z) - Second-Order Unsupervised Neural Dependency Parsing [52.331561380948564]
Most unsupervised dependency parsers are based on first-order probabilistic generative models that only consider local parent-child information.
Inspired by second-order supervised dependency parsing, we propose a second-order extension of unsupervised neural dependency models that incorporates grandparent-child or sibling information.
Our joint model achieves a 10% improvement over the previous state-of-the-art on the full WSJ test set.
arXiv Detail & Related papers (2020-10-28T03:01:33Z) - Recurrent Neural Network Language Models Always Learn English-Like Relative Clause Attachment [17.995905582226463]
We compare model performance in English and Spanish to show that non-linguistic biases in RNN LMs advantageously overlap with syntactic structure in English but not Spanish.
English models may appear to acquire human-like syntactic preferences, while models trained on Spanish fail to acquire comparable human-like preferences.
arXiv Detail & Related papers (2020-05-01T01:21:47Z) - On the Discrepancy between Density Estimation and Sequence Generation [92.70116082182076]
Log-likelihood is highly correlated with BLEU when we consider models within the same family.
We observe no correlation between rankings of models across different families.
arXiv Detail & Related papers (2020-02-17T20:13:35Z) - Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)