Sort by Structure: Language Model Ranking as Dependency Probing
- URL: http://arxiv.org/abs/2206.04935v1
- Date: Fri, 10 Jun 2022 08:10:29 GMT
- Title: Sort by Structure: Language Model Ranking as Dependency Probing
- Authors: Max Müller-Eberstein, Rob van der Goot and Barbara Plank
- Abstract summary: Making an informed choice of pre-trained language model (LM) is critical for performance, yet environmentally costly, and as such widely underexplored.
We propose probing to rank LMs, specifically for parsing dependencies in a given language, by measuring the degree to which labeled trees are recoverable from an LM's contextualized embeddings.
Across 46 typologically and architecturally diverse LM-language pairs, our approach predicts the best LM choice 79% of the time, using orders of magnitude less compute than training a full parser.
- Score: 25.723591566201343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Making an informed choice of pre-trained language model (LM) is critical for
performance, yet environmentally costly, and as such widely underexplored. The
field of Computer Vision has begun to tackle encoder ranking, with promising
forays into Natural Language Processing; however, these efforts lack coverage of
linguistic tasks such as structured prediction. We propose probing to rank LMs,
specifically for parsing dependencies in a given language, by measuring the
degree to which labeled trees are recoverable from an LM's contextualized
embeddings. Across 46 typologically and architecturally diverse LM-language
pairs, our probing approach predicts the best LM choice 79% of the time using
orders of magnitude less compute than training a full parser. Within this
study, we identify and analyze one recently proposed decoupled LM - RemBERT -
and find that it strikingly contains less inherent dependency information, yet
often yields the best parser after full fine-tuning. Without this outlier, our
approach identifies the best LM in 89% of cases.
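As a rough illustration of the ranking idea (a minimal sketch, not the authors' DepProbe setup), the snippet below trains a linear probe on frozen contextual embeddings to recover dependency relation labels and ranks candidate LMs by held-out probe accuracy; the model names, toy sentences, and label set are placeholders.

```python
# Minimal sketch: rank pre-trained LMs by how well a linear probe over their
# frozen embeddings recovers dependency relation labels. A simplified proxy for
# the paper's labeled-tree probing, not the authors' exact method.
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

# Toy probe data: (sentence, per-word dependency relation labels). Placeholders.
TRAIN = [("the dog barks", ["det", "nsubj", "root"]),
         ("a cat sleeps", ["det", "nsubj", "root"])]
DEV = [("the bird sings", ["det", "nsubj", "root"])]
LABELS = sorted({lab for _, labs in TRAIN + DEV for lab in labs})
LABEL2ID = {lab: i for i, lab in enumerate(LABELS)}

def embed_words(model_name, sentences):
    """Return one frozen embedding per whitespace word (its first sub-token)."""
    tok = AutoTokenizer.from_pretrained(model_name)
    lm = AutoModel.from_pretrained(model_name).eval()
    feats = []
    with torch.no_grad():
        for sent in sentences:
            words = sent.split()
            enc = tok(words, is_split_into_words=True, return_tensors="pt")
            hidden = lm(**enc).last_hidden_state[0]
            first = {}  # first sub-token position of each word
            for pos, wid in enumerate(enc.word_ids(0)):
                if wid is not None and wid not in first:
                    first[wid] = pos
            feats.append(torch.stack([hidden[first[i]] for i in range(len(words))]))
    return feats

def probe_accuracy(model_name):
    """Train a linear probe on frozen embeddings; return dev label accuracy."""
    x_tr = torch.cat(embed_words(model_name, [s for s, _ in TRAIN]))
    y_tr = torch.tensor([LABEL2ID[lab] for _, labs in TRAIN for lab in labs])
    x_dev = torch.cat(embed_words(model_name, [s for s, _ in DEV]))
    y_dev = torch.tensor([LABEL2ID[lab] for _, labs in DEV for lab in labs])
    probe = nn.Linear(x_tr.size(-1), len(LABELS))
    opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
    for _ in range(200):  # tiny probe, tiny data: a few hundred steps suffice
        opt.zero_grad()
        nn.functional.cross_entropy(probe(x_tr), y_tr).backward()
        opt.step()
    return (probe(x_dev).argmax(-1) == y_dev).float().mean().item()

if __name__ == "__main__":
    candidates = ["bert-base-multilingual-cased", "xlm-roberta-base"]  # placeholders
    ranking = sorted(candidates, key=probe_accuracy, reverse=True)
    print("LMs ranked by probe accuracy:", ranking)
```

The ranking step only costs a probe training run per candidate LM, which is the source of the compute savings over training a full parser for each.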
Related papers
- Language Models and Cycle Consistency for Self-Reflective Machine Translation [1.79487674052027]
We generate multiple translation candidates from a source language A to a target language B, and subsequently translate these candidates back to the original language A.
By evaluating the cycle consistency between the original and back-translated sentences using metrics such as token-level precision and accuracy, we implicitly estimate the translation quality in language B.
For each source sentence, we identify the translation candidate with optimal cycle consistency with the original sentence as the final answer.
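A minimal sketch of this selection loop (not the paper's implementation) using Hugging Face translation pipelines; the Helsinki-NLP model names, the candidate count, and the simple token-precision metric are assumptions made for illustration.

```python
# Sketch: pick the translation candidate whose back-translation is most
# consistent with the source sentence (cycle consistency).
from transformers import pipeline

# Placeholder model choices; any forward/backward MT pair would do.
fwd = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
bwd = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

def token_precision(hypothesis: str, reference: str) -> float:
    """Fraction of hypothesis tokens that also appear in the reference."""
    hyp, ref = hypothesis.lower().split(), set(reference.lower().split())
    return sum(t in ref for t in hyp) / max(len(hyp), 1)

def best_by_cycle_consistency(source: str, n_candidates: int = 4) -> str:
    # Generate several forward translations via beam search.
    cands = fwd(source, num_beams=n_candidates, num_return_sequences=n_candidates)
    scored = []
    for c in cands:
        translation = c["translation_text"]
        back = bwd(translation)[0]["translation_text"]
        scored.append((token_precision(back, source), translation))
    return max(scored)[1]  # candidate with the most consistent round trip

print(best_by_cycle_consistency("The weather is nice today."))
```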
arXiv Detail & Related papers (2024-11-05T04:01:41Z) - What do Large Language Models Need for Machine Translation Evaluation? [12.42394213466485]
Large language models (LLMs) can achieve results comparable to fine-tuned multilingual pre-trained language models.
This paper explores what translation information, such as the source, reference, translation errors and annotation guidelines, is needed for LLMs to evaluate machine translation quality.
arXiv Detail & Related papers (2024-10-04T09:50:45Z) - Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts.
We find that Llama Instruct and Mistral models exhibit high degrees of language confusion.
We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z) - BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models [2.2863439039616127]
Probing assesses the degree to which a language model (LM) has successfully learned relational knowledge during pre-training.
Previous approaches rely on the objective function used in pre-training LMs.
We propose an approach that uses an LM's inherent ability to estimate the log-likelihood of any given textual statement.
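In that spirit, here is a minimal sketch (not the BEAR framework itself) of scoring candidate statements by their total log-likelihood under a causal LM; the GPT-2 checkpoint and the toy statements are placeholders.

```python
# Sketch: rank candidate factual statements by total log-likelihood under a
# causal LM and pick the most probable one. Masked-LM scoring is omitted.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"  # placeholder checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name).eval()

def log_likelihood(statement: str) -> float:
    ids = tok(statement, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean token NLL as `loss`.
        out = lm(ids, labels=ids)
    return -out.loss.item() * (ids.size(1) - 1)  # total log-prob of the sequence

candidates = [
    "Paris is the capital of France.",
    "Berlin is the capital of France.",
    "Madrid is the capital of France.",
]
print(max(candidates, key=log_likelihood))
```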
arXiv Detail & Related papers (2024-04-05T14:13:55Z) - The Consensus Game: Language Model Generation via Equilibrium Search [73.51411916625032]
We introduce a new training-free, game-theoretic procedure for language model decoding.
Our approach casts language model decoding as a regularized imperfect-information sequential signaling game.
Applied to LLaMA-7B, EQUILIBRIUM-RANKING outperforms the much larger LLaMA-65B and PaLM-540B models.
arXiv Detail & Related papers (2023-10-13T14:27:21Z) - LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback.
Our focus is the code generation task, where the model produces code based on natural language instructions.
LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
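A minimal sketch of assembling one such training example under assumed delimiters (this is not the authors' LETI code): execute the LM-generated program, capture the resulting textual feedback, and concatenate instruction, program, and feedback for LM-objective fine-tuning.

```python
# Sketch: build one LETI-style training example from execution feedback.
import subprocess
import sys
import tempfile

def execution_feedback(program: str) -> str:
    """Run the program and return stderr (e.g. a traceback) as textual feedback."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    result = subprocess.run([sys.executable, path],
                            capture_output=True, text=True, timeout=10)
    return result.stderr.strip() or "OK: program ran without errors."

instruction = "Write a function add(a, b) that returns the sum of a and b."
generated_program = "def add(a, b):\n    return a + c\n\nprint(add(2, 2))"  # placeholder LM output

feedback = execution_feedback(generated_program)  # here: a NameError traceback
training_example = "\n".join([
    "### Instruction:", instruction,
    "### Program:", generated_program,
    "### Feedback:", feedback,
])
print(training_example)  # fed to standard LM-objective fine-tuning each iteration
```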
arXiv Detail & Related papers (2023-05-17T15:53:31Z) - Multilingual Syntax-aware Language Modeling through Dependency Tree Conversion [12.758523394180695]
We study the effect on neural language model (LM) performance across nine conversion methods and five languages.
On average, the performance of our best model represents a 19% increase in accuracy over the worst choice across all languages.
Our experiments highlight the importance of choosing the right tree formalism, and provide insights into making an informed decision.
arXiv Detail & Related papers (2022-04-19T03:56:28Z) - Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT [129.99918589405675]
We present an effective approach that reuses an LM that is pretrained only on the high-resource language.
The monolingual LM is fine-tuned on both languages and is then used to initialize a UNMT model.
Our approach, RE-LM, outperforms a competitive cross-lingual pretraining model (XLM) in English-Macedonian (En-Mk) and English-Albanian (En-Sq).
arXiv Detail & Related papers (2020-09-16T11:37:10Z) - El Departamento de Nosotros: How Machine Translated Corpora Affects Language Models in MRC Tasks [0.12183405753834563]
Pre-training large-scale language models (LMs) requires huge amounts of text corpora.
We study the caveats of applying directly translated corpora for fine-tuning LMs for downstream natural language processing tasks.
We show that careful curation along with post-processing leads to improved performance and overall LM robustness.
arXiv Detail & Related papers (2020-07-03T22:22:44Z) - Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporate an LM as a prior in a neural translation model (TM).
We add a regularization term, which pushes the output distributions of the TM to be probable under the LM prior.
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
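A minimal sketch of such a regularizer (not the paper's exact formulation): a KL term, added to the usual cross-entropy, that penalizes translation-model probability mass on tokens the LM prior finds improbable; the weighting and toy shapes are placeholders.

```python
# Sketch: translation loss regularized toward a frozen target-side LM prior.
import torch
import torch.nn.functional as F

def lm_prior_loss(tm_logits, lm_logits, target_ids, lam=0.5):
    """tm_logits, lm_logits: (batch, seq, vocab); target_ids: (batch, seq)."""
    # Standard translation cross-entropy against the reference.
    ce = F.cross_entropy(tm_logits.transpose(1, 2), target_ids)
    # KL(TM || LM): TM mass on tokens the LM prior deems improbable is penalized.
    kl = F.kl_div(
        F.log_softmax(lm_logits, dim=-1),  # log-probs of the LM prior
        F.log_softmax(tm_logits, dim=-1),  # log-probs of the translation model
        log_target=True,
        reduction="batchmean",
    )
    return ce + lam * kl

# Toy shapes: batch of 2, length 5, vocabulary of 100 (all placeholders).
tm_logits = torch.randn(2, 5, 100, requires_grad=True)
lm_logits = torch.randn(2, 5, 100)  # from a frozen target-side LM
targets = torch.randint(0, 100, (2, 5))
loss = lm_prior_loss(tm_logits, lm_logits, targets)
loss.backward()
print(float(loss))
```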
arXiv Detail & Related papers (2020-04-30T16:29:56Z) - Byte Pair Encoding is Suboptimal for Language Model Pretraining [49.30780227162387]
We analyze differences between unigram LM tokenization and byte-pair encoding (BPE).
We find that the unigram LM tokenization method matches or outperforms BPE across downstream tasks and two languages.
We hope that developers of future pretrained LMs will consider adopting the unigram LM method over the more prevalent BPE.
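For illustration only (the paper's experiments pretrain full LMs on each scheme), the sketch below trains both a unigram-LM and a BPE tokenizer with SentencePiece on the same corpus and compares their segmentations; the corpus path and vocabulary size are placeholders.

```python
# Sketch: train unigram-LM and BPE tokenizers on the same corpus and compare
# how they segment a sentence.
import sentencepiece as spm

for scheme in ("unigram", "bpe"):
    spm.SentencePieceTrainer.train(
        input="corpus.txt",            # placeholder path to raw text
        model_prefix=f"tok_{scheme}",
        vocab_size=8000,               # placeholder vocabulary size
        model_type=scheme,
    )

sentence = "unigram segmentation often aligns better with morphology"
for scheme in ("unigram", "bpe"):
    sp = spm.SentencePieceProcessor(model_file=f"tok_{scheme}.model")
    print(scheme, sp.encode(sentence, out_type=str))
```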
arXiv Detail & Related papers (2020-04-07T21:21:06Z)