"Don't Teach Minerva": Guiding LLMs Through Complex Syntax for Faithful Latin Translation with RAG
- URL: http://arxiv.org/abs/2511.01454v1
- Date: Mon, 03 Nov 2025 11:11:27 GMT
- Title: "Don't Teach Minerva": Guiding LLMs Through Complex Syntax for Faithful Latin Translation with RAG
- Authors: Sergio Torres Aguilar
- Abstract summary: This paper introduces a reproducible draft-based refinement pipeline that elevates open-source Large Language Models to a performance level statistically comparable to top-tier proprietary systems. We demonstrate the robustness of this approach on two distinct benchmarks: a standard in-domain test set (Rosenthal, 2023) and a new, challenging out-of-domain (OOD) set of 12th-century Latin letters (2025).
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Translating a morphology-rich, low-resource language like Latin poses significant challenges. This paper introduces a reproducible draft-based refinement pipeline that elevates open-source Large Language Models (LLMs) to a performance level statistically comparable to top-tier proprietary systems. Our method first uses a fine-tuned NLLB-1.3B model to generate a high-quality, structurally faithful draft. A zero-shot LLM (Llama-3.3 or Qwen3) then polishes this draft, a process that can be further enhanced by augmenting the context with retrieved examples (retrieval-augmented generation, RAG). We demonstrate the robustness of this approach on two distinct benchmarks: a standard in-domain test set (Rosenthal, 2023) and a new, challenging out-of-domain (OOD) set of 12th-century Latin letters (2025). Our central finding is that this open-source RAG system achieves performance statistically comparable to the GPT-5 baseline, without any task-specific LLM fine-tuning. We release the pipeline, the Chartres OOD set, and evaluation scripts and models to facilitate replicability and further research.
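The abstract describes a two-stage pipeline: a fine-tuned NLLB-1.3B produces a structurally faithful draft, and a zero-shot LLM then revises it, optionally with retrieved parallel examples added to its context. The sketch below is one plausible way to wire these stages together; the checkpoint path, the TF-IDF retriever, the prompt wording, and the local OpenAI-compatible endpoint serving Llama-3.3/Qwen3 are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch of the draft-then-refine pipeline described above, under stated
# assumptions: the fine-tuned NLLB checkpoint path, the TF-IDF retriever, the
# prompt wording, and the OpenAI-compatible endpoint (e.g. a local vLLM server
# serving Llama-3.3 or Qwen3) are illustrative choices, not details from the paper.

from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Stage 1: structurally faithful English draft from a (fine-tuned) NLLB-1.3B.
NLLB_CKPT = "path/to/finetuned-nllb-200-1.3B"  # hypothetical checkpoint path
nllb_tok = AutoTokenizer.from_pretrained(NLLB_CKPT, src_lang="lat_Latn")
nllb = AutoModelForSeq2SeqLM.from_pretrained(NLLB_CKPT)

def draft_translation(latin: str) -> str:
    """Generate the English draft with the sequence-to-sequence model."""
    inputs = nllb_tok(latin, return_tensors="pt")
    out = nllb.generate(
        **inputs,
        forced_bos_token_id=nllb_tok.convert_tokens_to_ids("eng_Latn"),
        max_new_tokens=256,
    )
    return nllb_tok.batch_decode(out, skip_special_tokens=True)[0]

# Stage 2: retrieve parallel examples to ground the refinement (the RAG step).
# A character n-gram TF-IDF retriever over a small translation memory stands in
# for whatever retriever the paper actually uses.
memory = [
    ("Gallia est omnis divisa in partes tres.",
     "All Gaul is divided into three parts."),
    # ... more (Latin, English) pairs, e.g. drawn from the training corpus
]
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
memory_matrix = vectorizer.fit_transform([la for la, _ in memory])

def retrieve_examples(latin: str, k: int = 3) -> list[tuple[str, str]]:
    """Return the k most similar (Latin, English) pairs from the memory."""
    sims = cosine_similarity(vectorizer.transform([latin]), memory_matrix)[0]
    return [memory[i] for i in sims.argsort()[::-1][:k]]

# Stage 3: zero-shot LLM polish of the draft, conditioned on retrieved examples.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def refine(latin: str, draft: str) -> str:
    """Ask the instruction-tuned LLM to revise the draft while staying faithful."""
    shots = "\n".join(f"Latin: {la}\nEnglish: {en}"
                      for la, en in retrieve_examples(latin))
    prompt = (
        "Revise the draft English translation of the Latin sentence below.\n"
        "Preserve the draft's structure where it is faithful; fix errors.\n\n"
        f"Similar translated examples:\n{shots}\n\n"
        f"Latin: {latin}\nDraft: {draft}\nRevised English:"
    )
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct",  # or a Qwen3 checkpoint
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return resp.choices[0].message.content.strip()

if __name__ == "__main__":
    sentence = "Arma virumque cano, Troiae qui primus ab oris."
    print(refine(sentence, draft_translation(sentence)))
```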
Related papers
- RAVEL: Reasoning Agents for Validating and Evaluating LLM Text Synthesis [78.32151470154422]
We introduce RAVEL, an agentic framework that enables the testers to autonomously plan and execute typical synthesis operations. We present C3EBench, a benchmark comprising 1,258 samples derived from professional human writings. By augmenting RAVEL with SOTA LLMs as operators, we find that such agentic text synthesis is dominated by the LLM's reasoning capability.
arXiv Detail & Related papers (2026-02-28T14:47:34Z) - Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets [2.0199251985015434]
We present a fully automated framework designed to enable scalable, high-quality translation of datasets and benchmarks. We apply this approach to translate popular benchmarks and datasets into eight Eastern and Southern European languages.
arXiv Detail & Related papers (2026-02-25T18:58:25Z) - InvertiTune: High-Quality Data Synthesis for Cost-Effective Single-Shot Text-to-Knowledge Graph Generation [22.34499569359914]
InvertiTune is a framework that combines a controlled data generation pipeline with supervised fine-tuning. We show that InvertiTune outperforms larger non-fine-tuned LLMs as well as state-of-the-art Text2KG approaches.
arXiv Detail & Related papers (2025-12-02T19:51:28Z) - InstructLR: A Scalable Approach to Create Instruction Dataset for Under-Resourced Languages [5.046479786355341]
In this paper, we introduce InstructLR, a novel framework designed to generate high-quality instruction datasets for low-resource languages (LRLs). Our approach integrates LLM-driven text generation with a dual-layer quality filtering mechanism. InstructLR has facilitated the creation of three multi-domain instruction benchmarks: ZarmaInstruct-50k, BambaraInstruct-50k, and FulfuldeInstruct-50k.
arXiv Detail & Related papers (2025-12-01T21:25:33Z) - LLM-guided Hierarchical Retrieval [54.73080745446999]
LATTICE is a hierarchical retrieval framework that enables an LLM to reason over and navigate large corpora with logarithmic search complexity. A central challenge in such LLM-guided search is that the model's relevance judgments are noisy, context-dependent, and unaware of the hierarchy. Our framework achieves state-of-the-art zero-shot performance on the reasoning-intensive BRIGHT benchmark.
arXiv Detail & Related papers (2025-10-15T07:05:17Z) - LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model [9.857195650438966]
We introduce LLaSO, the first fully open, end-to-end framework for large-scale speech-language modeling. LLaSO provides the community with three essential resources: LLaSO-Align, a 12M-instance speech-text alignment corpus; LLaSO-Instruct, a 13.5M-instance multi-task instruction-tuning dataset; and LLaSO-Eval, a reproducible benchmark for standardized evaluation. By releasing the complete stack of data, benchmarks, and models, LLaSO establishes a foundational open standard to unify research efforts and accelerate community-driven progress in LS
arXiv Detail & Related papers (2025-08-21T10:20:00Z) - SLRTP2025 Sign Language Production Challenge: Methodology, Results, and Future Work [87.9341538630949]
The first Sign Language Production Challenge was held as part of the third SLRTP Workshop at CVPR 2025. The competition's aim is to evaluate architectures that translate spoken language sentences into a sequence of skeleton poses. This paper presents the challenge design and the winning methodologies.
arXiv Detail & Related papers (2025-08-09T11:57:33Z) - RustRepoTrans: Repository-level Code Translation Benchmark Targeting Rust [50.65321080814249]
RustRepoTrans is the first repository-level context code translation benchmark targeting incremental translation. We evaluate seven representative LLMs, analyzing their errors to assess limitations in complex translation scenarios.
arXiv Detail & Related papers (2024-11-21T10:00:52Z) - Unleashing the Power of Large Language Models in Zero-shot Relation Extraction via Self-Prompting [21.04933334040135]
We introduce the Self-Prompting framework, a novel method designed to fully harness the embedded RE knowledge within Large Language Models. Our framework employs a three-stage diversity approach to prompt LLMs, generating multiple synthetic samples that encapsulate specific relations from scratch. Experimental evaluations on benchmark datasets show our approach outperforms existing LLM-based zero-shot RE methods.
arXiv Detail & Related papers (2024-10-02T01:12:54Z) - Large Language Model-guided Document Selection [23.673690115025913]
Large Language Model (LLM) pre-training exhausts an ever-growing compute budget.
Recent research has demonstrated that careful document selection enables comparable model quality with only a fraction of the FLOPs.
We explore a promising direction for scalable general-domain document selection.
arXiv Detail & Related papers (2024-06-07T04:52:46Z) - YAYI 2: Multilingual Open-Source Large Language Models [53.92832054643197]
We propose YAYI 2, including both base and chat models, with 30 billion parameters.
YAYI 2 is pre-trained from scratch on a multilingual corpus which contains 2.65 trillion tokens filtered by our pre-training data processing pipeline.
The base model is aligned with human values through supervised fine-tuning with millions of instructions and reinforcement learning from human feedback.
arXiv Detail & Related papers (2023-12-22T17:34:47Z) - AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations [52.43593893122206]
AlignedCoT is an in-context learning technique for invoking Large Language Models.
It achieves consistent and correct step-wise prompts in zero-shot scenarios.
We conduct experiments on mathematical reasoning and commonsense reasoning.
arXiv Detail & Related papers (2023-11-22T17:24:21Z) - A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models [27.777372498182864]
We propose a novel fine-tuning approach for Generative Large Language Models (LLMs).
Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data followed by subsequent fine-tuning on a small set of high-quality parallel data.
Based on LLaMA-2 as our underlying model, our results show that the model can achieve an average improvement of more than 12 BLEU and 12 COMET over its zero-shot performance.
arXiv Detail & Related papers (2023-09-20T22:53:15Z) - Text Classification via Large Language Models [63.1874290788797]
We introduce Clue And Reasoning Prompting (CARP) to address complex linguistic phenomena involved in text classification.
Remarkably, CARP yields new SOTA performances on 4 out of 5 widely-used text-classification benchmarks.
More importantly, we find that CARP delivers impressive abilities on low-resource and domain-adaptation setups.
arXiv Detail & Related papers (2023-05-15T06:24:45Z)