Building from Scratch: A Multi-Agent Framework with Human-in-the-Loop for Multilingual Legal Terminology Mapping
- URL: http://arxiv.org/abs/2512.12950v1
- Date: Mon, 15 Dec 2025 03:29:21 GMT
- Title: Building from Scratch: A Multi-Agent Framework with Human-in-the-Loop for Multilingual Legal Terminology Mapping
- Authors: Lingyi Meng, Maolin Liu, Hao Wang, Yilan Cheng, Qi Yang, Idlkaid Mohanmmed,
- Abstract summary: We propose a human-AI collaborative approach for building a multilingual legal terminology database. This approach integrates advanced large language models and legal domain experts throughout the entire process. Unlike a single automated pipeline, our approach places greater emphasis on how human experts participate.
- Score: 4.664349534386531
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Accurately mapping legal terminology across languages remains a significant challenge, especially for language pairs like Chinese and Japanese, which share a large number of homographs with different meanings. Existing resources and standardized tools for these languages are limited. To address this, we propose a human-AI collaborative approach for building a multilingual legal terminology database, based on a multi-agent framework. This approach integrates advanced large language models and legal domain experts throughout the entire process, from raw document preprocessing and article-level alignment to terminology extraction, mapping, and quality assurance. Unlike a single automated pipeline, our approach places greater emphasis on how human experts participate in this multi-agent system. Humans and AI agents take on different roles: AI agents handle specific, repetitive tasks, such as OCR, text segmentation, semantic alignment, and initial terminology extraction, while human experts provide crucial oversight, reviewing and supervising the outputs with contextual knowledge and legal judgment. We tested the effectiveness of this framework using a trilingual parallel corpus comprising 35 key Chinese statutes, along with their English and Japanese translations. The experimental results show that this human-in-the-loop, multi-agent workflow not only improves the precision and consistency of multilingual legal terminology mapping but also offers greater scalability compared to traditional manual methods.
Related papers
- A Parallel Cross-Lingual Benchmark for Multimodal Idiomaticity Understanding [15.171586338601522]
Potentially idiomatic expressions (PIEs) construe meanings inherently tied to the everyday experience of a language community. We present XMPIE, a parallel multilingual and multimodal dataset of potentially idiomatic expressions.
arXiv Detail & Related papers (2026-01-13T15:20:28Z)
- Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings [0.0]
This research proposes a novel approach for translating puns from English to French by combining state-of-the-art large language models with specialized techniques for wordplay generation. Our methodology's primary objective is to capture the linguistic creativity and humor of the source text wordplay, rather than simply duplicating its vocabulary.
arXiv Detail & Related papers (2025-07-09T03:09:14Z)
- TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation [1.112686067941444]
TransLaw is a novel multi-agent framework implemented for real-world Hong Kong case law translation. It employs three specialized agents, namely, Translator, Annotator, and Proofreader, to collaboratively produce translations for high accuracy in legal meaning.
arXiv Detail & Related papers (2025-07-01T15:39:26Z)
- (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts [56.7988577327046]
We introduce TransAgents, a novel multi-agent framework that simulates the roles and collaborative practices of a human translation company. Our findings highlight the potential of multi-agent collaboration in enhancing translation quality, particularly for longer texts.
arXiv Detail & Related papers (2024-05-20T05:55:08Z)
- Advancing Multilingual Pre-training: TRIP Triangular Document-level Pre-training for Multilingual Language Models [107.83158521848372]
We present Triangular Document-level Pre-training (TRIP), which is the first in the field to accelerate the conventional monolingual and bilingual objectives into a trilingual objective with a novel method called Grafting.
TRIP achieves several strong state-of-the-art (SOTA) scores on three multilingual document-level machine translation benchmarks and one cross-lingual abstractive summarization benchmark, including consistent improvements by up to 3.11 d-BLEU points and 8.9 ROUGE-L points.
arXiv Detail & Related papers (2022-12-15T12:14:25Z)
- Learning to Answer Multilingual and Code-Mixed Questions [4.290420179006601]
Question-answering (QA) that comes naturally to humans is a critical component in seamless human-computer interaction.
Despite being one of the oldest research areas, the current QA system faces the critical challenge of handling multilingual queries.
This dissertation focuses on advancing QA techniques for handling end-user queries in multilingual environments.
arXiv Detail & Related papers (2022-11-14T16:49:58Z)
- Multilingual Multimodality: A Taxonomical Survey of Datasets, Techniques, Challenges and Opportunities [10.721189858694396]
We study the unification of multilingual and multimodal (MultiX) streams.
We review the languages studied, gold or silver data with parallel annotations, and understand how these modalities and languages interact in modeling.
We present an account of the modeling approaches along with their strengths and weaknesses to better understand the scenarios in which they can be used reliably.
arXiv Detail & Related papers (2022-10-30T21:46:01Z)
- Revamping Multilingual Agreement Bidirectionally via Switched Back-translation for Multilingual Neural Machine Translation [107.83158521848372]
Multilingual agreement (MA) has shown its importance for multilingual neural machine translation (MNMT).
We present Bidirectional Multilingual Agreement via Switched Back-translation (BMA-SBT).
It is a novel and universal multilingual agreement framework for fine-tuning pre-trained MNMT models.
arXiv Detail & Related papers (2022-09-28T09:14:58Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Crossing the Conversational Chasm: A Primer on Multilingual Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art ToD models based on large pretrained neural language models are data hungry.
Data acquisition for ToD use cases is expensive and tedious.
arXiv Detail & Related papers (2021-04-17T15:19:56Z)
- Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual Retrieval [51.60862829942932]
We present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
For sentence-level CLIR, we demonstrate that state-of-the-art performance can be achieved.
However, the peak performance is not met using the general-purpose multilingual text encoders off-the-shelf, but rather by relying on their variants that have been further specialized for sentence understanding tasks.
arXiv Detail & Related papers (2021-01-21T00:15:38Z)
- XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization [128.37244072182506]
The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark evaluates the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.
We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models.
arXiv Detail & Related papers (2020-03-24T19:09:37Z)