TransBench: Benchmarking Machine Translation for Industrial-Scale Applications
- URL: http://arxiv.org/abs/2505.14244v1
- Date: Tue, 20 May 2025 11:54:58 GMT
- Title: TransBench: Benchmarking Machine Translation for Industrial-Scale Applications
- Authors: Haijun Li, Tianqi Shi, Zifu Shang, Yuxuan Han, Xueyu Zhao, Hao Wang, Yu Qian, Zhiqiang Qian, Linlong Xu, Minghao Wu, Chenyang Lyu, Longyue Wang, Gongbo Tang, Weihua Luo, Zhao Xu, Kaifu Zhang
- Abstract summary: Machine translation (MT) has become indispensable for cross-border communication in globalized industries like e-commerce, finance, and legal services. Applying general-purpose MT models to industrial scenarios reveals critical limitations due to domain-specific terminology, cultural nuances, and stylistic conventions absent in generic benchmarks. Existing evaluation frameworks inadequately assess translation in specialized contexts, creating a gap between academic benchmarks and real-world efficacy. We introduce TransBench, a benchmark for industrial MT, initially targeting international e-commerce with 17,000 sentences spanning 4 main scenarios and 33 language pairs.
- Score: 39.03233118476432
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Machine translation (MT) has become indispensable for cross-border communication in globalized industries like e-commerce, finance, and legal services, with recent advancements in large language models (LLMs) significantly enhancing translation quality. However, applying general-purpose MT models to industrial scenarios reveals critical limitations due to domain-specific terminology, cultural nuances, and stylistic conventions absent in generic benchmarks. Existing evaluation frameworks inadequately assess performance in specialized contexts, creating a gap between academic benchmarks and real-world efficacy. To address this, we propose a three-level translation capability framework: (1) Basic Linguistic Competence, (2) Domain-Specific Proficiency, and (3) Cultural Adaptation, emphasizing the need for holistic evaluation across these dimensions. We introduce TransBench, a benchmark tailored for industrial MT, initially targeting international e-commerce with 17,000 professionally translated sentences spanning 4 main scenarios and 33 language pairs. TransBench integrates traditional metrics (BLEU, TER) with Marco-MOS, a domain-specific evaluation model, and provides guidelines for reproducible benchmark construction. Our contributions include: (1) a structured framework for industrial MT evaluation, (2) the first publicly available benchmark for e-commerce translation, (3) novel metrics probing multi-level translation quality, and (4) open-sourced evaluation tools. This work bridges the evaluation gap, enabling researchers and practitioners to systematically assess and enhance MT systems for industry-specific needs.
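As a concrete illustration of the traditional metrics TransBench integrates, the sketch below computes corpus-level BLEU and TER with the sacrebleu library. The example sentences are invented, and Marco-MOS (the paper's learned, domain-specific metric) is not modeled here.

```python
# Minimal sketch: corpus-level BLEU and TER scoring with sacrebleu,
# the two traditional metrics TransBench reports alongside Marco-MOS.
# The example data below is invented for illustration.
from sacrebleu.metrics import BLEU, TER

def score_system(hypotheses: list[str], references: list[str]) -> dict:
    """Score one MT system's outputs against single references."""
    bleu = BLEU()  # default 4-gram BLEU with 13a tokenization
    ter = TER()    # translation edit rate; lower is better
    return {
        "BLEU": bleu.corpus_score(hypotheses, [references]).score,
        "TER": ter.corpus_score(hypotheses, [references]).score,
    }

if __name__ == "__main__":
    hyps = ["This dress ships within 24 hours.", "Free returns for 30 days."]
    refs = ["The dress ships within 24 hours.", "Returns are free for 30 days."]
    print(score_system(hyps, refs))
```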
Related papers
- MT$^{3}$: Scaling MLLM-based Text Image Machine Translation via Multi-Task Reinforcement Learning [22.27715186895943]
We introduce MT$^{3}$, the first framework to apply multi-task RL to MLLMs for end-to-end TIMT. It is trained using a novel multi-mixed reward mechanism that adapts rule-based RL strategies to TIMT's intricacies. Our model achieves state-of-the-art results on the latest in-domain MIT-10M benchmark.
arXiv Detail & Related papers (2025-05-26T09:02:35Z)
- Team ACK at SemEval-2025 Task 2: Beyond Word-for-Word Machine Translation for English-Korean Pairs [23.19401079530962]
Translating knowledge-intensive and entity-rich text between English and Korean requires transcreation to preserve language-specific and cultural nuances. We evaluate 13 models (LLMs and MT models) using automatic metrics and human assessment by bilingual annotators.
arXiv Detail & Related papers (2025-04-29T05:58:19Z)
- Translation Analytics for Freelancers: I. Introduction, Data Preparation, Baseline Evaluations [0.0]
This is the first in a series of papers exploring the rapidly expanding new opportunities arising from recent progress in language technologies. We aim to empower translators with actionable methods to harness these advancements.
arXiv Detail & Related papers (2025-04-20T13:54:28Z)
- Redefining Machine Translation on Social Network Services with Large Language Models [35.519703688810786]
This paper introduces RedTrans, a 72B LLM tailored for SNS translation. RedTrans is trained on a novel dataset developed through three innovations. Experiments show RedTrans outperforms state-of-the-art LLMs.
arXiv Detail & Related papers (2025-04-10T16:24:28Z)
- Translating Multimodal AI into Real-World Inspection: TEMAI Evaluation Framework and Pathways for Implementation [3.848879161330863]
This paper introduces the Translational Evaluation of Multimodal AI for Inspection (TEMAI) framework. It bridges multimodal AI capabilities with industrial inspection implementation. The framework demonstrates that technical capability alone yields limited value without corresponding adoption mechanisms.
arXiv Detail & Related papers (2025-03-31T11:30:56Z)
- MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark [20.642661835794975]
We introduce MME-Industry, a novel benchmark designed specifically for evaluating MLLMs in industrial settings. The benchmark encompasses 21 distinct domains, comprising 1050 question-answer pairs with 50 questions per domain. We provide both Chinese and English versions of the benchmark, enabling comparative analysis of MLLMs' capabilities across these languages.
arXiv Detail & Related papers (2025-01-28T03:56:17Z)
- Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al.
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z)
- Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)
- UniTE: Unified Translation Evaluation [63.58868113074476]
UniTE is the first unified framework able to handle all three translation evaluation tasks.
We test our framework on the WMT 2019 Metrics and WMT 2020 Quality Estimation benchmarks.
arXiv Detail & Related papers (2022-04-28T08:35:26Z)
- On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation [55.02832094101173]
Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual similarity.
This paper concerns itself with reference-free machine translation (MT) evaluation, where source texts are compared directly to (sometimes low-quality) system translations.
We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER.
We find that they perform poorly as semantic encoders for reference-free MT evaluation and identify their two key limitations.
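For illustration, below is a minimal sketch of the reference-free setup this paper critiques: embed the source text and the system translation with a cross-lingual encoder and use cosine similarity as a quality proxy. Mean-pooled multilingual BERT is an assumption chosen for concreteness (the paper also studies LASER), and the sentence pair is invented.

```python
# Minimal sketch of reference-free MT scoring via cross-lingual embeddings.
# Mean-pooled multilingual BERT is an assumption for concreteness; the paper
# finds that such similarity scores are weak proxies for translation quality.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    """Mean-pool the encoder's last hidden states over non-padding tokens."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    hidden = model(**enc).last_hidden_state     # (1, seq_len, 768)
    mask = enc["attention_mask"].unsqueeze(-1)  # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

def reference_free_score(source: str, translation: str) -> float:
    """Cosine similarity between source and translation embeddings."""
    return torch.cosine_similarity(embed(source), embed(translation)).item()

print(reference_free_score("Das Kleid besteht aus Baumwolle.",
                           "The dress is made of cotton."))
```

Consistent with the paper's findings, scores like this should be treated as weak signals rather than substitutes for human or reference-based evaluation.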
arXiv Detail & Related papers (2020-05-03T22:10:23Z)
- Towards Making the Most of Context in Neural Machine Translation [112.9845226123306]
We argue that previous research did not make clear use of the global context.
We propose a new document-level NMT framework that deliberately models the local context of each sentence.
arXiv Detail & Related papers (2020-02-19T03:30:00Z)