Extracting Synonyms from Bilingual Dictionaries
- URL: http://arxiv.org/abs/2012.00600v1
- Date: Tue, 1 Dec 2020 16:09:22 GMT
- Title: Extracting Synonyms from Bilingual Dictionaries
- Authors: Mustafa Jarrar, Eman Karajah, Muhammad Khalifa, Khaled Shaalan
- Abstract summary: We present our progress in developing a novel algorithm to extract synonyms from bilingual dictionaries.
The idea is to construct a translation graph from translation pairs, then to extract and consolidate cyclic paths to form bilingual sets of synonyms.
The initial evaluation of this algorithm illustrates promising results in extracting Arabic-English bilingual synonyms.
- Score: 1.1470070927586016
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present our progress in developing a novel algorithm to extract synonyms
from bilingual dictionaries. Identification and usage of synonyms play a
significant role in improving the performance of information access
applications. The idea is to construct a translation graph from translation
pairs, then to extract and consolidate cyclic paths to form bilingual sets of
synonyms. The initial evaluation of this algorithm illustrates promising
results in extracting Arabic-English bilingual synonyms. In the evaluation, we
first converted the synsets in the Arabic WordNet into translation pairs (i.e.,
losing word-sense memberships). Next, we applied our algorithm to rebuild these
synsets. We compared the original and extracted synsets obtaining an F-Measure
of 82.3% and 82.1% for Arabic and English synsets extraction, respectively.
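The abstract names three steps: build a translation graph from translation pairs, extract cyclic paths, and consolidate them into bilingual synonym sets. The listing does not give the algorithm's details, so the following is only a minimal sketch of that general idea, not the authors' implementation. The function names, the cycle-length bound `max_len`, the merge rule `min_shared` (merge cycles that share at least two words), and the transliterated toy pairs are all assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected bipartite graph; nodes are (language, word) tuples."""
    graph = defaultdict(set)
    for ar_word, en_word in pairs:
        a, e = ("ar", ar_word), ("en", en_word)
        graph[a].add(e)
        graph[e].add(a)
    return graph


def find_cycles(graph, max_len=6):
    """Collect the node sets of simple cycles with at most max_len nodes."""
    cycles = set()

    def dfs(start, node, path, visited):
        for nxt in graph[node]:
            if nxt == start and len(path) >= 4:
                # Closed a cycle; a bipartite graph has no cycle shorter than 4.
                cycles.add(frozenset(path))
            elif nxt not in visited and len(path) < max_len and nxt > start:
                # Only visit nodes ordered after start, so each cycle is
                # explored from its smallest node only.
                visited.add(nxt)
                path.append(nxt)
                dfs(start, nxt, path, visited)
                path.pop()
                visited.discard(nxt)

    for start in sorted(graph):
        dfs(start, start, [start], {start})
    return cycles


def consolidate(cycles, min_shared=2):
    """Merge cycles that share at least min_shared nodes (assumed rule)."""
    groups = [set(c) for c in sorted(cycles, key=sorted)]
    changed = True
    while changed:
        changed = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if len(groups[i] & groups[j]) >= min_shared:
                    groups[i] |= groups.pop(j)
                    changed = True
                    break
            if changed:
                break
    return groups


def extract_synsets(pairs, max_len=6, min_shared=2):
    """Return one {"ar": [...], "en": [...]} dict per consolidated group."""
    graph = build_graph(pairs)
    groups = consolidate(find_cycles(graph, max_len), min_shared)
    return [
        {
            "ar": sorted(w for lang, w in group if lang == "ar"),
            "en": sorted(w for lang, w in group if lang == "en"),
        }
        for group in groups
    ]


if __name__ == "__main__":
    # Toy (Arabic transliteration, English) pairs invented for illustration.
    toy_pairs = [
        ("sayyara", "car"), ("sayyara", "automobile"),
        ("markaba", "car"), ("markaba", "automobile"),
        ("markaba", "vehicle"), ("nahr", "river"),
    ]
    print(extract_synsets(toy_pairs))
```

In this toy input, "vehicle" and the "nahr"/"river" pair lie on no cycle, so they are left out of the result. Requiring a cycle is what filters out translations that are supported by only a single dictionary entry.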
Related papers
- ParaAMR: A Large-Scale Syntactically Diverse Paraphrase Dataset by AMR
Back-Translation [59.91139600152296]
ParaAMR is a large-scale syntactically diverse paraphrase dataset created by abstract meaning representation back-translation.
We show that ParaAMR can be used to improve on three NLP tasks: learning sentence embeddings, syntactically controlled paraphrase generation, and data augmentation for few-shot learning.
(2023-05-26T02:27:33Z)
- CompoundPiece: Evaluating and Improving Decompounding Performance of
Language Models [77.45934004406283]
We systematically study decompounding, the task of splitting compound words into their constituents.
We introduce a dataset of 255k compound and non-compound words across 56 diverse languages obtained from Wiktionary.
We introduce a novel methodology to train dedicated models for decompounding.
(2023-05-23T16:32:27Z)
- Biomedical Named Entity Recognition via Dictionary-based Synonym
Generalization [51.89486520806639]
We propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions.
We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins.
(2023-05-22T14:36:32Z)
- A Benchmark and Scoring Algorithm for Enriching Arabic Synonyms [0.0]
Given a mono/multilingual synset and a threshold (a fuzzy value [0-1]), our goal is to extract new synonyms above this threshold from existing lexicons.
The dataset consists of 3K candidate synonyms for 500 synsets.
Our evaluations show that the algorithm behaves like a linguist and its fuzzy values are close to those proposed by linguists.
(2023-02-04T20:30:32Z)
- Current Trends and Approaches in Synonyms Extraction: Potential
Adaptation to Arabic [0.0]
The paper surveys the approaches and trends used to extract synonyms automatically.
The first approach is to find synonyms using a translation graph.
The second approach is to discover new translation pairs, for example deriving (Arabic-French) from (Arabic-English) and (English-French).
(2022-05-20T19:05:10Z)
- Extracting and filtering paraphrases by bridging natural language
inference and paraphrasing [0.0]
We propose a novel methodology for the extraction of paraphrasing datasets from NLI datasets and cleaning existing paraphrasing datasets.
The results show high quality of extracted paraphrasing datasets and surprisingly high noise levels in two existing paraphrasing datasets.
(2021-11-13T14:06:37Z)
- Interval Probabilistic Fuzzy WordNet [8.396691008449704]
We present an algorithm for constructing the Interval Probabilistic Fuzzy (IPF) synsets in any language.
We constructed and published the IPF synsets of WordNet for the English language.
(2021-04-04T17:28:37Z)
- Enhanced word embeddings using multi-semantic representation through
lexical chains [1.8199326045904998]
We propose two novel algorithms, called Flexible Lexical Chain II and Fixed Lexical Chain II.
These algorithms combine the semantic relations derived from lexical chains, prior knowledge from lexical databases, and the robustness of the distributional hypothesis in word embeddings as building blocks forming a single system.
Our results show that integrating lexical chains with word embedding representations sustains state-of-the-art results, even against more complex systems.
(2021-01-22T09:43:33Z)
- Syntactic representation learning for neural network based TTS with
syntactic parse tree traversal [49.05471750563229]
We propose a syntactic representation learning method based on syntactic parse tree to automatically utilize the syntactic structure information.
Experimental results demonstrate the effectiveness of our proposed approach.
For sentences with multiple syntactic parse trees, prosodic differences can be clearly perceived from the synthesized speeches.
(2020-12-13T05:52:07Z)
- Syntactic Structure Distillation Pretraining For Bidirectional Encoders [49.483357228441434]
We introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining.
We distill the approximate marginal distribution over words in context from the syntactic LM.
Our findings demonstrate the benefits of syntactic biases, even in representation learners that exploit large amounts of data.
(2020-05-27T16:44:01Z)
- Learning Coupled Policies for Simultaneous Machine Translation using
Imitation Learning [85.70547744787]
We present an approach to efficiently learn a simultaneous translation model with coupled programmer-interpreter policies.
Experiments on six language-pairs show our method outperforms strong baselines in terms of translation quality.
(2020-02-11T10:56:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.