LATA: A Tool for LLM-Assisted Translation Annotation
- URL: http://arxiv.org/abs/2602.10454v1
- Date: Wed, 11 Feb 2026 02:49:01 GMT
- Title: LATA: A Tool for LLM-Assisted Translation Annotation
- Authors: Baorong Huang, Ali Asiri,
- Abstract summary: This paper introduces a novel, LLM-assisted interactive tool to reduce the gap between automation and rigorous precision required for expert human judgment.<n>Unlike traditional statistical scalable, our system employs a template-based Prompt Manager that leverages large language models (LLMs) for sentence segmentation and alignment.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The construction of high-quality parallel corpora for translation research has increasingly evolved from simple sentence alignment to complex, multi-layered annotation tasks. This methodological shift presents significant challenges for structurally divergent language pairs, such as Arabic--English, where standard automated tools frequently fail to capture deep linguistic shifts or semantic nuances. This paper introduces a novel, LLM-assisted interactive tool designed to reduce the gap between scalable automation and the rigorous precision required for expert human judgment. Unlike traditional statistical aligners, our system employs a template-based Prompt Manager that leverages large language models (LLMs) for sentence segmentation and alignment under strict JSON output constraints. In this tool, automated preprocessing integrates into a human-in-the-loop workflow, allowing researchers to refine alignments and apply custom translation technique annotations through a stand-off architecture. By leveraging LLM-assisted processing, the tool balances annotation efficiency with the linguistic precision required to analyze complex translation phenomena in specialized domains.
Related papers
- Semantically Labelled Automata for Multi-Task Reinforcement Learning with LTL Instructions [61.479946958462754]
We study multi-task reinforcement learning (RL), a setting in which an agent learns a single, universal policy.<n>We present a novel task embedding technique leveraging a new generation of semantic translations-to-automata.
arXiv Detail & Related papers (2026-02-06T14:46:27Z) - Bridging the Knowledge Void: Inference-time Acquisition of Unfamiliar Programming Languages for Coding Tasks [22.908904483320953]
Large Language Models (LLMs) in coding tasks are often a reflection of their extensive pre-training corpora.<n>We propose ILA-agent, a general ILA framework that equips LLMs with a set of behavioral primitives.<n>We instantiate ILA-agent for Cangjie and evaluate its performance across code generation, translation, and program repair tasks.
arXiv Detail & Related papers (2026-01-16T09:06:47Z) - Finding the Translation Switch: Discovering and Exploiting the Task-Initiation Features in LLMs [69.28193153685893]
Large Language Models (LLMs) frequently exhibit strong translation abilities, even without task-specific fine-tuning.<n>To demystify this process, we leverage Sparse Autoencoders (SAEs) and introduce a novel framework for identifying task-specific features.<n>Our work not only decodes a core component of the translation mechanism in LLMs but also provides a blueprint for using internal model mechanism to create more robust and efficient models.
arXiv Detail & Related papers (2026-01-16T06:29:07Z) - Fine-Tuned Language Models for Domain-Specific Summarization and Tagging [3.543800643014229]
This paper presents a pipeline integrating fine-tuned large language models (LLMs) with named entity recognition (NER) for efficient domain-specific text summarization and tagging.<n>The authors address the challenge posed by rapidly evolving sub-cultural languages and slang, which complicate automated information extraction and law enforcement monitoring.
arXiv Detail & Related papers (2025-10-29T12:33:48Z) - ToolACE-MT: Non-Autoregressive Generation for Agentic Multi-Turn Interaction [84.90394416593624]
Agentic task-solving with Large Language Models (LLMs) requires multi-turn, multi-step interactions.<n>Existing simulation-based data generation methods rely heavily on costly autoregressive interactions between multiple agents.<n>We propose a novel Non-Autoregressive Iterative Generation framework, called ToolACE-MT, for constructing high-quality multi-turn agentic dialogues.
arXiv Detail & Related papers (2025-08-18T07:38:23Z) - From over-reliance to smart integration: using Large-Language Models as translators between specialized modeling and simulation tools [0.3749861135832073]
Large Language Models (LLMs) offer transformative potential for Modeling & Simulation (M&S)<n>Over-reliance risks compromising quality due to ambiguities, logical shortcuts, and hallucinations.<n>This paper advocates integrating LLMs as translators between specialized tools to mitigate complexity in M&S tasks.
arXiv Detail & Related papers (2025-06-11T02:39:08Z) - Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation [33.08089616645845]
Large Language Models (LLMs) have reshaped the landscape of machine translation (MT)<n>We analyze techniques such as few-shot prompting, cross-lingual transfer, and parameter-efficient fine-tuning.<n>We discuss persistent challenges - such as hallucinations, evaluation inconsistencies, and inherited biases.
arXiv Detail & Related papers (2025-04-02T17:26:40Z) - Scaffolded Language Models with Language Supervision for Mixed-Autonomy: A Survey [52.00674453604779]
This survey organizes the literature on the design and optimization of emerging structures around post-trained LMs.<n>We refer to this overarching structure as scaffolded LMs and focus on LMs that are integrated into multi-step processes with tools.
arXiv Detail & Related papers (2024-10-21T18:06:25Z) - Machine Translation with Large Language Models: Prompt Engineering for
Persian, English, and Russian Directions [0.0]
Generative large language models (LLMs) have demonstrated exceptional proficiency in various natural language processing (NLP) tasks.
We conducted an investigation into two popular prompting methods and their combination, focusing on cross-language combinations of Persian, English, and Russian.
arXiv Detail & Related papers (2024-01-16T15:16:34Z) - The Devil is in the Errors: Leveraging Large Language Models for
Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z) - Decomposed Prompting for Machine Translation Between Related Languages
using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel approach of few-shot prompting that decomposes the translation process into a sequence of word chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ scores across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.