Rethinking Word-Level Auto-Completion in Computer-Aided Translation
- URL: http://arxiv.org/abs/2310.14523v2
- Date: Tue, 24 Oct 2023 06:48:07 GMT
- Title: Rethinking Word-Level Auto-Completion in Computer-Aided Translation
- Authors: Xingyu Chen, Lemao Liu, Guoping Huang, Zhirui Zhang, Mingming Yang, Shuming Shi, and Rui Wang
- Abstract summary: Word-Level Auto-Completion (WLAC) plays a crucial role in Computer-Assisted Translation.
It aims at providing word-level auto-completion suggestions for human translators.
We introduce a measurable criterion for what kind of words make good auto-completions and discover that existing WLAC models often fail to meet this criterion.
We propose an effective approach to enhance WLAC performance by promoting adherence to the criterion.
- Score: 76.34184928621477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word-Level Auto-Completion (WLAC) plays a crucial role in Computer-Assisted
Translation. It aims at providing word-level auto-completion suggestions for
human translators. While previous studies have primarily focused on designing
complex model architectures, this paper takes a different perspective by
rethinking the fundamental question: what kind of words are good
auto-completions? We introduce a measurable criterion to answer this question
and discover that existing WLAC models often fail to meet this criterion.
Building upon this observation, we propose an effective approach to enhance
WLAC performance by promoting adherence to the criterion. Notably, the proposed
approach is general and can be applied to various encoder-based architectures.
Through extensive experiments, we demonstrate that our approach outperforms the
top-performing system submitted to the WLAC shared tasks in WMT2022, while
utilizing significantly smaller model sizes.
 
Related papers
- Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition [86.21199607040147]
 Self-Improving cognition (SIcog) is a self-learning framework for constructing next-generation foundation language models.
We introduce Chain-of-Description, a step-by-step visual understanding method, and integrate structured chain-of-thought (CoT) reasoning to support in-depth multimodal reasoning.
Extensive experiments demonstrate that SIcog produces next-generation foundation MLLMs with substantially improved multimodal cognition.
 arXiv (2025-03-16T00:25:13Z)
- Adversarial Multi-Agent Evaluation of Large Language Models through Iterative Debates [0.0]
 We propose a framework that interprets large language models (LLMs) as advocates within an ensemble of interacting agents.
This approach offers a more dynamic and comprehensive evaluation process than traditional human-based assessments or automated metrics.
 arXiv (2024-10-07T00:22:07Z)
- The OCON model: an old but gold solution for distributable supervised classification [0.28675177318965045]
 This paper introduces a structured application of the One-Class approach and the One-Class-One-Network model for supervised classification tasks.
We achieve classification accuracy comparable to that of today's complex architectures (90.0-93.7%).
 arXiv (2024-10-05T09:15:01Z)
- An Energy-based Model for Word-level AutoCompletion in Computer-aided Translation [97.3797716862478]
 Word-level AutoCompletion (WLAC) is a rewarding yet challenging task in Computer-aided Translation.
Existing work addresses this task through a classification model based on a neural network that maps the hidden vector of the input context into its corresponding label.
This work proposes an energy-based model for WLAC, which enables the context hidden vector to capture crucial information from the source sentence.
 arXiv (2024-07-29T15:07:19Z)
- Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement [102.22911097049953]
 SIMA is a framework that enhances visual and language modality alignment through self-improvement.
It employs an in-context self-critic mechanism to select response pairs for preference tuning.
We demonstrate that SIMA achieves superior modality alignment, outperforming previous approaches.
 arXiv (2024-05-24T23:09:27Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
 We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experimental results show that it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze the representation space, generated responses, and data scales, and reveal how question translation training strengthens language alignment within LLMs.
 arXiv (2024-05-02T14:49:50Z)
- A Large-Scale Evaluation of Speech Foundation Models [110.95827399522204]
 We establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the foundation model paradigm for speech.
We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads.
 arXiv (2024-04-15T00:03:16Z)
- SurreyAI 2023 Submission for the Quality Estimation Shared Task [17.122657128702276]
 This paper describes the approach adopted by the SurreyAI team for addressing the Sentence-Level Direct Assessment task in WMT23.
The proposed approach builds upon the TransQuest framework, exploring various autoencoder pre-trained language models.
The evaluation utilizes Spearman and Pearson correlation coefficients, assessing the relationship between machine-predicted quality scores and human judgments.
 arXiv (2023-12-01T12:01:04Z)
- Coherent Entity Disambiguation via Modeling Topic and Categorical Dependency [87.16283281290053]
 Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities.
We propose CoherentED, an ED system equipped with novel designs aimed at enhancing the coherence of entity predictions.
We achieve new state-of-the-art results on popular ED benchmarks, with an average improvement of 1.3 F1 points.
 arXiv (2023-11-06T16:40:13Z)
- Automated Speech Scoring System Under The Lens: Evaluating and interpreting the linguistic cues for language proficiency [26.70127591966917]
 We utilize classical machine learning models to formulate a speech scoring task as both a classification and a regression problem.
First, we extract linguistic features under five categories (fluency, pronunciation, content, grammar and vocabulary, and acoustic) and train models to grade responses.
In comparison, we find that the regression-based models perform as well as or better than the classification approach.
 arXiv (2021-11-30T06:28:58Z)
- Enhancing Dialogue Generation via Multi-Level Contrastive Learning [57.005432249952406]
 We propose a multi-level contrastive learning paradigm to model the fine-grained quality of the responses with respect to the query.
A Rank-aware Calibration (RC) network is designed to construct the multi-level contrastive optimization objectives.
We build a Knowledge Inference (KI) component to capture the keyword knowledge from the reference during training and exploit such information to encourage the generation of informative words.
 arXiv (2020-09-19T02:41:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.