English to Arabic machine translation of mathematical documents
- URL: http://arxiv.org/abs/2312.03753v1
- Date: Sat, 2 Dec 2023 21:02:07 GMT
- Title: English to Arabic machine translation of mathematical documents
- Authors: Mustapha Eddahibi and Mohammed Mensouri
- Abstract summary: This paper focuses on translating English LaTeX mathematical documents into Arabic LaTeX.
The proposed system leverages a Transformer model as the core of the translation system.
The integration of RyDArab, an Arabic mathematical TeX extension, along with a rule-based translator for Arabic mathematical expressions, contributes to the precise rendering of complex mathematical symbols and equations in the translated output.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a machine translation system tailored
specifically to LaTeX mathematical documents. The system translates
English LaTeX mathematical documents into Arabic LaTeX, catering to
the growing demand for multilingual accessibility in scientific and
mathematical literature. With the proliferation of LaTeX mathematical
documents, the need for an efficient and accurate translation system has become
increasingly pressing. This paper addresses the need for a robust
translation tool that enables communication and comprehension of
complex mathematical content across language barriers. The proposed system
uses a Transformer model as the core of the translation system, ensuring
enhanced accuracy and fluency in the translated Arabic LaTeX documents.
Furthermore, the integration of RyDArab, an Arabic mathematical TeX extension,
along with a rule-based translator for Arabic mathematical expressions,
contributes to the precise rendering of complex mathematical symbols and
equations in the translated output. The paper discusses the architecture and
methodology of the developed system, highlighting its efficacy in bridging the
language gap in the domain of mathematical documentation.
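The abstract describes a two-track design: a Transformer handles the natural-language text, while a rule-based translator handles the mathematical expressions. The sketch below is not the authors' implementation; it only illustrates one plausible way to route the two kinds of content. The names `split_latex` and `translate_document`, and the two translator callables, are hypothetical, and the regular expression is a simplification that assumes well-formed math delimiters and does not handle escaped dollar signs (`\$`) or math environments such as `equation`.

```python
import re
from typing import Callable, List, Tuple

# Simplified pattern for math spans: $$...$$, \[...\], \(...\), and $...$.
# The $$ alternative comes first so display math is not read as two inline spans.
MATH_PATTERN = re.compile(
    r"(\$\$.+?\$\$|\\\[.+?\\\]|\\\(.+?\\\)|\$.+?\$)",
    re.DOTALL,
)


def split_latex(source: str) -> List[Tuple[str, str]]:
    """Split LaTeX source into ("text", s) and ("math", s) segments, in order."""
    segments: List[Tuple[str, str]] = []
    # With one capturing group, re.split alternates: text, math, text, ...
    for index, piece in enumerate(MATH_PATTERN.split(source)):
        if not piece:
            continue
        kind = "math" if index % 2 == 1 else "text"
        segments.append((kind, piece))
    return segments


def translate_document(
    source: str,
    translate_text: Callable[[str], str],  # hypothetical: e.g. a neural model
    translate_math: Callable[[str], str],  # hypothetical: e.g. rewrite rules
) -> str:
    """Send text segments to one translator and math segments to another."""
    translated = []
    for kind, piece in split_latex(source):
        if kind == "math":
            translated.append(translate_math(piece))
        else:
            translated.append(translate_text(piece))
    return "".join(translated)


if __name__ == "__main__":
    sample = r"Let $x$ be real. Then $$x^2 \ge 0$$."
    # Stand-in translators: upper-casing marks the text that a real
    # translation model would receive; math is passed through unchanged.
    print(translate_document(sample, str.upper, lambda m: m))
```

Keeping the segmentation separate from both translators means either one can be swapped out (for example, replacing the stand-in text function with a trained model) without touching the routing logic.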
Related papers
- LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation [67.24113079928668]
We present LexMatcher, a method for data curation driven by the coverage of senses found in bilingual dictionaries.
Our approach outperforms the established baselines on the WMT2022 test sets.
arXiv Detail & Related papers (2024-06-03T15:30:36Z)
- Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math [52.66190891388847]
We introduce MathPile, a diverse and high-quality math-centric corpus comprising about 9.5 billion tokens.
Our meticulous data collection and processing efforts included a complex suite of preprocessing.
We hope our MathPile can help to enhance the mathematical reasoning abilities of language models.
arXiv Detail & Related papers (2023-12-28T16:55:40Z)
- Document-Level Language Models for Machine Translation [37.106125892770315]
We build context-aware translation systems utilizing document-level monolingual data instead.
We improve existing approaches by leveraging recent advancements in model combination.
In most scenarios, back-translation gives even better results, at the cost of having to re-train the translation system.
arXiv Detail & Related papers (2023-10-18T20:10:07Z)
- Neural Machine Translation for Mathematical Formulae [8.608288231153304]
We tackle the problem of neural machine translation of mathematical formulae between ambiguous presentation languages and unambiguous content languages.
We find that convolutional sequence-to-sequence networks achieve 95.1% and 90.7% exact matches, respectively.
arXiv Detail & Related papers (2023-05-25T19:15:06Z)
- Machine Translation for Accessible Multi-Language Text Analysis [1.5484595752241124]
We show that English-trained measures computed after translation to English have adequate-to-excellent accuracy.
We show this for three major analytics -- sentiment analysis, topic analysis, and word embeddings -- over 16 languages.
arXiv Detail & Related papers (2023-01-20T04:11:38Z)
- The Effect of Normalization for Bi-directional Amharic-English Neural Machine Translation [53.907805815477126]
This paper presents the first relatively large-scale Amharic-English parallel sentence dataset.
We build bi-directional Amharic-English translation models by fine-tuning the existing Facebook M2M100 pre-trained model.
The results show that the normalization of Amharic homophone characters increases the performance of Amharic-English machine translation in both directions.
arXiv Detail & Related papers (2022-10-27T07:18:53Z)
- A Bilingual Parallel Corpus with Discourse Annotations [82.07304301996562]
This paper describes BWB, a large parallel corpus first introduced in Jiang et al. (2022), along with an annotated test set.
The BWB corpus consists of Chinese novels translated by experts into English, and the annotated test set is designed to probe the ability of machine translation systems to model various discourse phenomena.
arXiv Detail & Related papers (2022-10-26T12:33:53Z)
- JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding [74.12405417718054]
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM).
Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols and formulas in the problem statement.
We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
arXiv Detail & Related papers (2022-06-13T17:03:52Z)
- On the Influence of Machine Translation on Language Origin Obfuscation [0.3437656066916039]
We analyze the ability to detect the source language from the translated output of two widely used commercial machine translation systems.
Evaluations show that the source language can be reconstructed with high accuracy for documents that contain a sufficient amount of translated text.
arXiv Detail & Related papers (2021-06-24T08:33:24Z)
- Machine Translation of Mathematical Text [0.0]
We have implemented a machine translation system, the PolyMath Translator, for documents containing mathematical text.
The current implementation translates English to French, attaining a BLEU score of 53.5 on a held-out test corpus of mathematical sentences.
It produces documents that can be compiled to PDF without further editing.
arXiv Detail & Related papers (2020-10-11T11:59:40Z)
- A High-Quality Multilingual Dataset for Structured Documentation Translation [101.41835967142521]
This paper presents a high-quality multilingual dataset for the documentation domain.
We collect XML-structured parallel text segments from the online documentation for an enterprise software platform.
arXiv Detail & Related papers (2020-06-24T02:08:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.