Rethinking Word-Level Auto-Completion in Computer-Aided Translation
- URL: http://arxiv.org/abs/2310.14523v2
- Date: Tue, 24 Oct 2023 06:48:07 GMT
- Title: Rethinking Word-Level Auto-Completion in Computer-Aided Translation
- Authors: Xingyu Chen and Lemao Liu and Guoping Huang and Zhirui Zhang and
Mingming Yang and Shuming Shi and Rui Wang
- Abstract summary: Word-Level Auto-Completion (WLAC) plays a crucial role in Computer-Assisted Translation.
It aims at providing word-level auto-completion suggestions for human translators.
We introduce a measurable criterion for what makes a good auto-completion and discover that existing WLAC models often fail to meet it.
We propose an effective approach to enhance WLAC performance by promoting adherence to the criterion.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word-Level Auto-Completion (WLAC) plays a crucial role in Computer-Assisted
Translation. It aims at providing word-level auto-completion suggestions for
human translators. While previous studies have primarily focused on designing
complex model architectures, this paper takes a different perspective by
rethinking the fundamental question: what kind of words are good
auto-completions? We introduce a measurable criterion to answer this question
and discover that existing WLAC models often fail to meet this criterion.
Building upon this observation, we propose an effective approach to enhance
WLAC performance by promoting adherence to the criterion. Notably, the proposed
approach is general and can be applied to various encoder-based architectures.
Through extensive experiments, we demonstrate that our approach outperforms the
top-performing system submitted to the WLAC shared tasks in WMT2022, while
utilizing significantly smaller model sizes.
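To make the task concrete: a WLAC system sees the source sentence, the partial translation produced so far, and the characters the translator has typed, and must suggest the intended target word. The sketch below is a hypothetical illustration of that interface, with `score_word` standing in for any encoder-based WLAC model; it is not the paper's implementation.

```python
from typing import Callable, List

def suggest_completions(
    source: str,          # source-language sentence
    context: str,         # partial translation produced so far
    typed_prefix: str,    # characters the translator has typed
    vocabulary: List[str],
    score_word: Callable[[str, str, str], float],  # stand-in WLAC model
    top_k: int = 5,
) -> List[str]:
    # Keep only words consistent with the typed characters.
    candidates = [w for w in vocabulary if w.startswith(typed_prefix)]
    # Rank by model score given the source sentence and translation context.
    candidates.sort(key=lambda w: score_word(source, context, w), reverse=True)
    return candidates[:top_k]
```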
Related papers
- Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
SIMA is a framework that enhances visual and language modality alignment through self-improvement.
It employs an in-context self-critic mechanism to select response pairs for preference tuning.
We demonstrate that SIMA achieves superior modality alignment, outperforming previous approaches.
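As a rough illustration of the self-critic step described above, the snippet below ranks a model's own candidate responses with an assumed `critic_score` function and keeps the best and worst as a (chosen, rejected) pair for preference tuning; the actual SIMA mechanism may differ.

```python
def build_preference_pair(prompt, responses, critic_score):
    # Rank the model's own candidate responses by self-assigned quality.
    ranked = sorted(responses, key=lambda r: critic_score(prompt, r))
    rejected, chosen = ranked[0], ranked[-1]
    # This (chosen, rejected) pair becomes one preference-tuning example.
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```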
arXiv Detail & Related papers (2024-05-24T23:09:27Z) - A Survey on Efficient Inference for Large Language Models [25.572035747669275]
Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks.
The substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios.
This paper presents a comprehensive survey of the existing literature on efficient LLM inference.
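For a sense of the memory requirements at stake, the back-of-the-envelope sketch below estimates the key-value cache size of a hypothetical decoder-only transformer in fp16; the numbers are illustrative, not drawn from the survey.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_val=2):
    # Two tensors (keys and values) per layer,
    # each of shape [batch, kv_heads, seq_len, head_dim].
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_val

# e.g. a 7B-class model: 32 layers, 32 KV heads, head_dim 128,
# 4096-token context, batch size 8:
print(kv_cache_bytes(32, 32, 128, 4096, 8) / 2**30, "GiB")  # 16.0 GiB
```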
arXiv Detail & Related papers (2024-04-22T15:53:08Z) - A Large-Scale Evaluation of Speech Foundation Models [110.95827399522204]
We establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the foundation model paradigm for speech.
We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads.
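A minimal PyTorch sketch of the pattern this summary describes, a frozen foundation model shared across tasks with small trainable heads, is shown below; the head architecture is an assumption, not SUPERB's exact design.

```python
import torch
import torch.nn as nn

def freeze(model: nn.Module) -> nn.Module:
    for p in model.parameters():
        p.requires_grad = False  # the foundation model stays fixed
    return model

class TaskHead(nn.Module):
    """Lightweight, task-specialized prediction head."""
    def __init__(self, hidden_dim: int, num_classes: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: [batch, frames, hidden_dim] from the frozen model.
        return self.proj(features.mean(dim=1))  # mean-pool, then classify
```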
arXiv Detail & Related papers (2024-04-15T00:03:16Z) - SurreyAI 2023 Submission for the Quality Estimation Shared Task [17.122657128702276]
This paper describes the approach adopted by the SurreyAI team for addressing the Sentence-Level Direct Assessment task in WMT23.
The proposed approach builds upon the TransQuest framework, exploring various autoencoder pre-trained language models.
The evaluation utilizes Spearman and Pearson correlation coefficients, assessing the relationship between machine-predicted quality scores and human judgments.
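These metrics are standard; a minimal SciPy example (with made-up numbers) of correlating machine-predicted quality scores with human judgments:

```python
from scipy.stats import pearsonr, spearmanr

predicted = [0.71, 0.42, 0.90, 0.55, 0.63]  # machine-predicted quality
human     = [0.68, 0.35, 0.93, 0.60, 0.58]  # human judgments

print("Pearson r:   ", pearsonr(predicted, human)[0])
print("Spearman rho:", spearmanr(predicted, human)[0])
```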
arXiv Detail & Related papers (2023-12-01T12:01:04Z) - Coherent Entity Disambiguation via Modeling Topic and Categorical
Dependency [87.16283281290053]
Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities.
We propose CoherentED, an ED system equipped with novel designs aimed at enhancing the coherence of entity predictions.
We achieve new state-of-the-art results on popular ED benchmarks, with an average improvement of 1.3 F1 points.
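For context, the discriminative paradigm the summary contrasts with can be caricatured as scoring each candidate entity against the mention context and taking the argmax; the embeddings below are assumed inputs, and CoherentED's coherence modeling goes beyond this per-mention view.

```python
import numpy as np

def disambiguate(context_vec: np.ndarray, candidates: dict) -> str:
    # candidates: entity name -> entity embedding (same dim as context_vec)
    scores = {name: float(context_vec @ vec) for name, vec in candidates.items()}
    # Pure per-mention argmax: no topical coherence across predictions.
    return max(scores, key=scores.get)
```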
arXiv Detail & Related papers (2023-11-06T16:40:13Z) - Learning Action-Effect Dynamics for Hypothetical Vision-Language
Reasoning Task [50.72283841720014]
We propose a novel learning strategy that can improve reasoning about the effects of actions.
We demonstrate the effectiveness of our proposed approach and discuss its advantages over previous baselines in terms of performance, data efficiency, and generalization capability.
arXiv Detail & Related papers (2022-12-07T05:41:58Z) - Word Sense Induction with Hierarchical Clustering and Mutual Information
Maximization [14.997937028599255]
Word sense induction is a difficult problem in natural language processing.
We propose a novel unsupervised method based on hierarchical clustering and invariant information clustering.
We empirically demonstrate that, in certain cases, our approach outperforms prior WSI state-of-the-art methods.
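As an illustration of the clustering half of this approach, under the assumption that each occurrence of a word is represented by a contextual embedding, hierarchical (agglomerative) clustering can group occurrences into induced senses:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def induce_senses(occurrence_vecs: np.ndarray, num_senses: int) -> np.ndarray:
    # occurrence_vecs: [n_occurrences, dim] contextual embeddings of one word
    Z = linkage(occurrence_vecs, method="ward")
    # One sense label (1..num_senses) per occurrence.
    return fcluster(Z, t=num_senses, criterion="maxclust")
```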
arXiv Detail & Related papers (2022-10-11T13:04:06Z) - Automated Speech Scoring System Under The Lens: Evaluating and
interpreting the linguistic cues for language proficiency [26.70127591966917]
We utilize classical machine learning models to formulate a speech scoring task as both a classification and a regression problem.
First, we extract linguistic features under five categories (fluency, pronunciation, content, grammar and vocabulary, and acoustic) and train models to grade responses.
In comparison, we find that the regression-based models perform equivalently to or better than the classification approach.
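A minimal scikit-learn sketch of the two framings compared above, on assumed feature matrices built from the five linguistic categories:

```python
from sklearn.linear_model import LogisticRegression, Ridge

def fit_both(X_train, y_grades):
    # Classification: each proficiency grade is a discrete class.
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_grades)
    # Regression: grades are treated as continuous scores.
    reg = Ridge().fit(X_train, y_grades)
    return clf, reg
```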
arXiv Detail & Related papers (2021-11-30T06:28:58Z) - Enhancing Dialogue Generation via Multi-Level Contrastive Learning [57.005432249952406]
We propose a multi-level contrastive learning paradigm to model the fine-grained quality of the responses with respect to the query.
A Rank-aware Calibration (RC) network is designed to construct the multi-level contrastive optimization objectives.
We build a Knowledge Inference (KI) component to capture the keyword knowledge from the reference during training and exploit such information to encourage the generation of informative words.
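One common way to realize a multi-level contrastive objective is a pairwise margin loss that forces higher-quality responses to score above lower-quality ones; the sketch below is an illustrative formulation, not necessarily the paper's exact loss.

```python
import torch

def multilevel_contrastive_loss(scores: torch.Tensor, margin: float = 0.1):
    # scores: model scores for responses to one query,
    # ordered from lowest to highest quality level.
    losses = []
    for lo in range(len(scores) - 1):
        for hi in range(lo + 1, len(scores)):
            # Penalize whenever a better response fails to beat a worse
            # one by at least the margin.
            losses.append(torch.relu(margin - (scores[hi] - scores[lo])))
    return torch.stack(losses).mean()
```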
arXiv Detail & Related papers (2020-09-19T02:41:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.