Reranking Machine Translation Hypotheses with Structured and Web-based
Language Models
- URL: http://arxiv.org/abs/2104.12277v1
- Date: Sun, 25 Apr 2021 22:09:03 GMT
- Title: Reranking Machine Translation Hypotheses with Structured and Web-based
Language Models
- Authors: Wen Wang and Andreas Stolcke and Jing Zheng
- Abstract summary: Two structured language models are applied for N-best rescoring.
We find that the combination of these language models increases the BLEU score by up to 1.6% absolute on blind test sets.
- Score: 11.363601836199331
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate the use of linguistically motivated and
computationally efficient structured language models for reranking N-best
hypotheses in a statistical machine translation system. These language models,
developed from Constraint Dependency Grammar parses, tightly integrate
knowledge of words, morphological and lexical features, and syntactic
dependency constraints. Two structured language models are applied for N-best
rescoring: one is an almost-parsing language model, and the other utilizes more
syntactic features by explicitly modeling syntactic dependencies between words.
We also investigate effective and efficient language modeling methods to use
N-grams extracted from up to 1 teraword of web documents. We apply all these
language models for N-best re-ranking on the NIST and DARPA GALE program 2006
and 2007 machine translation evaluation tasks and find that the combination of
these language models increases the BLEU score by up to 1.6% absolute on blind
test sets.
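
For illustration, the sketch below shows the generic form of N-best rescoring described above: each hypothesis receives a weighted combination of log-scores from several models, and the list is re-sorted by the combined score. The function name, dummy scorers (placeholders for the baseline decoder score, a structured LM, and a web N-gram LM), and weights are assumptions for illustration only, not the authors' implementation; in practice the weights would be tuned on a development set.

```python
# Minimal sketch of N-best reranking via log-linear score combination.
# The scorers below are hypothetical placeholders; real ones would return
# log-probabilities from the baseline MT model, a structured LM, and a
# web N-gram LM.

from typing import Callable, Dict, List, Tuple


def rerank_nbest(
    hypotheses: List[str],
    scorers: Dict[str, Callable[[str], float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Sort hypotheses by a weighted sum of per-model log-scores (higher is better)."""
    rescored = []
    for hyp in hypotheses:
        total = sum(weights[name] * scorer(hyp) for name, scorer in scorers.items())
        rescored.append((hyp, total))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    nbest = [
        "the cat sat on the mat",
        "the cat sit on mat",
        "cat the sat on mat the",
    ]
    # Dummy scorers standing in for the decoder score, the structured LM,
    # and the web N-gram LM.
    scorers = {
        "baseline": lambda h: -0.1 * len(h.split()),
        "struct_lm": lambda h: -0.2 * len(h.split()),
        "web_ngram": lambda h: -0.3 * len(h.split()),
    }
    weights = {"baseline": 1.0, "struct_lm": 0.5, "web_ngram": 0.5}
    for hyp, score in rerank_nbest(nbest, scorers, weights):
        print(f"{score:8.3f}  {hyp}")
```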
Related papers
- Linguistically Grounded Analysis of Language Models using Shapley Head Values [2.914115079173979]
We investigate the processing of morphosyntactic phenomena by leveraging a recently proposed method for probing language models via Shapley Head Values (SHVs).
Using the English language BLiMP dataset, we test our approach on two widely used models, BERT and RoBERTa, and compare how linguistic constructions are handled.
Our results show that SHV-based attributions reveal distinct patterns across both models, providing insights into how language models organize and process linguistic information.
arXiv Detail & Related papers (2024-10-17T09:48:08Z)
- Exploring Multiple Strategies to Improve Multilingual Coreference Resolution in CorefUD [0.0]
This paper presents our end-to-end neural coreference resolution system.
We first establish strong baseline models, including monolingual and cross-lingual variations.
We propose several extensions to enhance performance across diverse linguistic contexts.
arXiv Detail & Related papers (2024-08-29T20:27:05Z) - On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based
Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning.
arXiv Detail & Related papers (2023-11-14T00:43:33Z) - JCoLA: Japanese Corpus of Linguistic Acceptability [3.6141428739228902]
We introduce JCoLA (Japanese Corpus of Linguistic Acceptability), which consists of 10,020 sentences annotated with binary acceptability judgments.
We then evaluate the syntactic knowledge of 9 different types of Japanese language models on JCoLA.
arXiv Detail & Related papers (2023-09-22T07:35:45Z) - Entity-Assisted Language Models for Identifying Check-worthy Sentences [23.792877053142636]
We propose a new uniform framework for text classification and ranking.
Our framework combines the semantic analysis of the sentences, with additional entity embeddings obtained through the identified entities within the sentences.
We extensively evaluate the effectiveness of our framework using two publicly available datasets from the CLEF 2019 and 2020 CheckThat! Labs.
arXiv Detail & Related papers (2022-11-19T12:03:30Z) - Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z) - An Application of Pseudo-Log-Likelihoods to Natural Language Scoring [5.382454613390483]
A language model with relatively few parameters and training steps can outperform it on a recent large data set.
We produce some absolute state-of-the-art results for common sense reasoning in binary choice tasks.
We argue that robustness of the smaller model ought to be understood in terms of compositionality.
arXiv Detail & Related papers (2022-01-23T22:00:54Z) - Specializing Multilingual Language Models: An Empirical Study [50.7526245872855]
Contextualized word representations from pretrained multilingual language models have become the de facto standard for addressing natural language tasks.
For languages rarely or never seen by these models, directly using such models often results in suboptimal representation or use of data.
arXiv Detail & Related papers (2021-06-16T18:13:55Z) - Improving the Lexical Ability of Pretrained Language Models for
Unsupervised Neural Machine Translation [127.81351683335143]
Cross-lingual pretraining requires models to align the lexical- and high-level representations of the two languages.
Previous research has shown that this is because the representations are not sufficiently aligned.
In this paper, we enhance the bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings.
arXiv Detail & Related papers (2021-03-18T21:17:58Z) - Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z) - Improving Massively Multilingual Neural Machine Translation and
Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.