Related papers: English to Arabic machine translation of mathematical documents

URL: http://arxiv.org/abs/2312.03753v1
Date: Sat, 2 Dec 2023 21:02:07 GMT
Title: English to Arabic machine translation of mathematical documents
Authors: Mustapha Eddahibi and Mohammed Mensouri
Abstract summary: This paper focuses on translating English LaTeX mathematical documents into Arabic LaTeX. The proposed system leverages a Transformer model as the core of the translation system. The integration of RyDArab, an Arabic mathematical TeX extension, along with a rule-based translator for Arabic mathematical expressions, contributes to the precise rendering of complex mathematical symbols and equations in the translated output.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper describes the development of a machine translation system tailored specifically for LaTeX mathematical documents. The system focuses on translating English LaTeX mathematical documents into Arabic LaTeX, catering to the growing demand for multilingual accessibility in scientific and mathematical literature. With the vast proliferation of LaTeX mathematical documents, the need for an efficient and accurate translation system has become increasingly pressing. This paper addresses the need for a robust translation tool that enables seamless communication and comprehension of complex mathematical content across language barriers. The proposed system leverages a Transformer model as the core of the translation system, ensuring enhanced accuracy and fluency in the translated Arabic LaTeX documents. Furthermore, the integration of RyDArab, an Arabic mathematical TeX extension, along with a rule-based translator for Arabic mathematical expressions, contributes to the precise rendering of complex mathematical symbols and equations in the translated output. The paper discusses the architecture and methodology of the developed system, highlighting its efficacy in bridging the language gap in the domain of mathematical documentation.
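Illustrative sketch (not taken from the paper): the abstract describes a pipeline in which prose is handled by a Transformer model while mathematical expressions go through a separate rule-based translator. One common way to keep the two apart is to mask math spans with placeholders before translating the prose and to restore them afterwards. The Python below is a minimal sketch of that idea under stated assumptions: the function names, the `@@MATHn@@` placeholder format, and the `translate_text` and `math_rule` hooks are all hypothetical, and the regular expression only covers `$...$` and `\[...\]`.

```python
import re

# Assumption: only inline ($...$) and display (\[...\]) math are handled.
# Escaped dollars, $$...$$, and math environments are ignored in this sketch.
MATH_PATTERN = re.compile(r"\$[^$]+\$|\\\[.+?\\\]", re.DOTALL)
PLACEHOLDER = re.compile(r"@@MATH(\d+)@@")


def mask_math(source):
    """Replace each math span with a numbered placeholder; return text and spans."""
    spans = []

    def stash(match):
        spans.append(match.group(0))
        return f"@@MATH{len(spans) - 1}@@"

    return MATH_PATTERN.sub(stash, source), spans


def unmask_math(text, spans, math_rule=lambda s: s):
    """Restore math spans, passing each one through a rule-based hook."""
    return PLACEHOLDER.sub(lambda m: math_rule(spans[int(m.group(1))]), text)


def translate_document(source, translate_text, math_rule=lambda s: s):
    """Translate prose with `translate_text`; route math through `math_rule`."""
    masked, spans = mask_math(source)
    return unmask_math(translate_text(masked), spans, math_rule)


# Stand-in for a real translation model: upper-casing the prose.
print(translate_document("Let $x^2$ be positive.", str.upper))
```

In a real system, `translate_text` would call the trained translation model and `math_rule` would rewrite each expression for the target notation; neither is specified here, so both are left as plain callables.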

Related papers

MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training [7.164697875838552]
This study focuses on the development of specialized training datasets to enhance the encoding of mathematical content. We introduce Math Mutator (MAMUT), a framework capable of generating equivalent and falsified versions of a given mathematical formula in LaTeX notation. Based on MAMUT, we have generated four large mathematical datasets containing diverse notation, which can be used to train language models with enhanced mathematical embeddings.
arXiv Detail & Related papers (2025-02-28T08:53:42Z)
Edit Once, Update Everywhere: A Simple Framework for Cross-Lingual Knowledge Synchronization in LLMs [60.12222055772508]
We present a simple and practical state-of-the-art (SOTA) recipe, Cross-Lingual Knowledge Democracy Edit (X-KDE). X-KDE is designed to propagate knowledge from a dominant language to other languages effectively. Experiments on the Bi-ZsRE and MzsRE benchmarks show that X-KDE significantly enhances cross-lingual performance.
arXiv Detail & Related papers (2025-02-20T15:32:31Z)
STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing [2.2315518704035595]
We introduce STEM-PoM, a benchmark dataset to evaluate large language models' reasoning abilities on math symbols. The dataset contains over 2K math symbols classified as main attributes of variables, constants, operators, and unit descriptors. Our experiments show that state-of-the-art LLMs achieve an average of 20-60% accuracy under in-context learning and 50-60% accuracy with fine-tuning.
arXiv Detail & Related papers (2024-11-01T06:25:06Z)
LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation [67.24113079928668]
We present LexMatcher, a method for data curation driven by the coverage of senses found in bilingual dictionaries. Our approach outperforms the established baselines on the WMT2022 test sets.
arXiv Detail & Related papers (2024-06-03T15:30:36Z)
EMMA-X: An EM-like Multilingual Pre-training Algorithm for Cross-lingual Representation Learning [74.60554112841307]
We propose EMMA-X: an EM-like Multilingual pre-training Algorithm to learn (X)Cross-lingual universals. EMMA-X unifies the cross-lingual representation learning task and an extra semantic relation prediction task within an EM framework.
arXiv Detail & Related papers (2023-10-26T08:31:00Z)
Document-Level Language Models for Machine Translation [37.106125892770315]
We build context-aware translation systems utilizing document-level monolingual data. We improve existing approaches by leveraging recent advancements in model combination. In most scenarios, back-translation gives even better results, at the cost of having to re-train the translation system.
arXiv Detail & Related papers (2023-10-18T20:10:07Z)
Neural Machine Translation for Mathematical Formulae [8.608288231153304]
We tackle the problem of neural machine translation of mathematical formulae between ambiguous presentation languages and unambiguous content languages. We find that convolutional sequence-to-sequence networks achieve 95.1% and 90.7% exact matches, respectively.
arXiv Detail & Related papers (2023-05-25T19:15:06Z)
The Effect of Normalization for Bi-directional Amharic-English Neural Machine Translation [53.907805815477126]
This paper presents the first relatively large-scale Amharic-English parallel sentence dataset. We build bi-directional Amharic-English translation models by fine-tuning the existing Facebook M2M100 pre-trained model. The results show that the normalization of Amharic homophone characters increases the performance of Amharic-English machine translation in both directions.
arXiv Detail & Related papers (2022-10-27T07:18:53Z)
A Bilingual Parallel Corpus with Discourse Annotations [82.07304301996562]
This paper describes BWB, a large parallel corpus first introduced in Jiang et al. (2022), along with an annotated test set. The BWB corpus consists of Chinese novels translated by experts into English, and the annotated test set is designed to probe the ability of machine translation systems to model various discourse phenomena.
arXiv Detail & Related papers (2022-10-26T12:33:53Z)
JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding [74.12405417718054]
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM). Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols and formulas in the problem statement. We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
arXiv Detail & Related papers (2022-06-13T17:03:52Z)
On the Influence of Machine Translation on Language Origin Obfuscation [0.3437656066916039]
We analyze the ability to detect the source language from the translated output of two widely used commercial machine translation systems. Evaluations show that the source language can be reconstructed with high accuracy for documents that contain a sufficient amount of translated text.
arXiv Detail & Related papers (2021-06-24T08:33:24Z)
Machine Translation of Mathematical Text [0.0]
We have implemented a machine translation system, the PolyMath Translator, for documents containing mathematical text. The current implementation translates English to French, attaining a BLEU score of 53.5 on a held-out test corpus of mathematical sentences. It produces documents that can be compiled to PDF without further editing.
arXiv Detail & Related papers (2020-10-11T11:59:40Z)
A High-Quality Multilingual Dataset for Structured Documentation Translation [101.41835967142521]
This paper presents a high-quality multilingual dataset for the documentation domain. We collect XML-structured parallel text segments from the online documentation for an enterprise software platform.
arXiv Detail & Related papers (2020-06-24T02:08:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.