Towards Terminology Management Automation for Arabic
- URL: http://arxiv.org/abs/2503.19211v1
- Date: Mon, 24 Mar 2025 23:35:00 GMT
- Title: Towards Terminology Management Automation for Arabic
- Authors: Mahdi Nasser, Laura Sayyah, Fadi A. Zaraket,
- Abstract summary: This paper presents a method and supporting tools for automation of terminology management for Arabic.<n>The tools extract lists of parallel terminology matching terms in foreign languages to their Arabic counterparts from field specific texts.<n>This has significant implications as it can be used to improve consistent translation and use of terms in specialized Arabic academic books.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents a method and supporting tools for automation of terminology management for Arabic. The tools extract lists of parallel terminology matching terms in foreign languages to their Arabic counterparts from field specific texts. This has significant implications as it can be used to improve consistent translation and use of terms in specialized Arabic academic books, and provides automated aid for enhancing cross lingual text processing. This automation of terminology management aims to reduce processing time, and ensure use of consistent and correct terminology. The extraction takes advantage of naturally occurring term translations. It considers several candidate phrases of varying lengths that co-occur next to the foreign terms. Then it computes several similarity metrics, including lexicographic, phonetic, morphological, and semantic ones to decide the problem. We experiment with heuristic, machine learning, and ML with post processing approaches. This paper reports on a novel curated dataset for the task, an existing expert reviewed industry parallel corpora, and on the performance of the three approaches. The best approach achieved 94.9% precision and 92.4% recall.
Related papers
- Efficient Terminology Integration for LLM-based Translation in Specialized Domains [0.0]
In specialized fields such as patent, finance, or biomedical domains, terminology is crucial for translation.
We introduce a methodology that efficiently trains models with a smaller amount of data while preserving the accuracy of terminology translation.
This methodology enhances the model's ability to handle specialized terminology and ensures high-quality translations.
arXiv Detail & Related papers (2024-10-21T07:01:25Z) - Issue Report Validation in an Industrial Context [1.993607565985189]
We work on 1,200 randomly selected issue reports in banking domain, written in Turkish.
We manually label these reports for validity, and extract the relevant patterns indicating that they are invalid.
Using the proposed feature extractors, we utilize a machine learning based approach to predict the issue reports' validity, performing a 0.77 F1-score.
arXiv Detail & Related papers (2023-11-29T14:24:13Z) - NSOAMT -- New Search Only Approach to Machine Translation [0.0]
A "new search only approach to machine translation" was adopted to tackle some of the slowness and inaccuracy of the other technologies.
The idea is to develop a solution that, by indexing an incremental set of words that combine a certain semantic meaning, makes it possible to create a process of correspondence between their native language record and the language of translation.
arXiv Detail & Related papers (2023-09-19T11:12:21Z) - Decomposed Prompting for Machine Translation Between Related Languages
using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel approach of few-shot prompting that decomposes the translation process into a sequence of word chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ scores across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z) - Beyond Contrastive Learning: A Variational Generative Model for
Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z) - Using Document Similarity Methods to create Parallel Datasets for Code
Translation [60.36392618065203]
Translating source code from one programming language to another is a critical, time-consuming task.
We propose to use document similarity methods to create noisy parallel datasets of code.
We show that these models perform comparably to models trained on ground truth for reasonable levels of noise.
arXiv Detail & Related papers (2021-10-11T17:07:58Z) - Fake it Till You Make it: Self-Supervised Semantic Shifts for
Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z) - Facilitating Terminology Translation with Target Lemma Annotations [4.492630871726495]
We train machine translation systems using a source-side data augmentation method that annotates randomly selected source language words with their target language lemmas.
Experiments on terminology translation into the morphologically complex Baltic and Uralic languages show an improvement of up to 7 BLEU points over baseline systems.
Results of the human evaluation indicate a 47.7% absolute improvement over the previous work in term translation accuracy when translating into Latvian.
arXiv Detail & Related papers (2021-01-25T12:07:20Z) - Automatic Extraction of Rules Governing Morphological Agreement [103.78033184221373]
We develop an automated framework for extracting a first-pass grammatical specification from raw text.
We focus on extracting rules describing agreement, a morphosyntactic phenomenon at the core of the grammars of many of the world's languages.
We apply our framework to all languages included in the Universal Dependencies project, with promising results.
arXiv Detail & Related papers (2020-10-02T18:31:45Z) - Generative latent neural models for automatic word alignment [0.0]
Variational autoencoders have been recently used in various of natural language processing to learn in an unsupervised way latent representations that are useful for language generation tasks.
In this paper, we study these models for the task of word alignment and propose and assess several evolutions of a vanilla variational autoencoders.
We demonstrate that these techniques can yield competitive results as compared to Giza++ and to a strong neural network alignment system for two language pairs.
arXiv Detail & Related papers (2020-09-28T07:54:09Z) - Learning Coupled Policies for Simultaneous Machine Translation using
Imitation Learning [85.70547744787]
We present an approach to efficiently learn a simultaneous translation model with coupled programmer-interpreter policies.
Experiments on six language-pairs show our method outperforms strong baselines in terms of translation quality.
arXiv Detail & Related papers (2020-02-11T10:56:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.