PENELOPIE: Enabling Open Information Extraction for the Greek Language through Machine Translation
- URL: http://arxiv.org/abs/2103.15075v1
- Date: Sun, 28 Mar 2021 08:01:58 GMT
- Authors: Dimitris Papadopoulos, Nikolaos Papadakis and Nikolaos Matsatsinis
- Abstract summary: We present our submission for the EACL 2021 SRW: a methodology that aims at bridging the gap between high- and low-resource languages.
We build Neural Machine Translation (NMT) models for English-to-Greek and Greek-to-English based on the Transformer architecture.
We leverage these NMT models to produce English translations of Greek text as input for our NLP pipeline, which applies a series of pre-processing and triple-extraction steps.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present our submission for the EACL 2021 Student Research Workshop (SRW): a methodology that aims at bridging the gap between high- and low-resource languages in the context of Open Information Extraction (OIE), showcased on the Greek language. The goals of this paper are twofold. First, we build Neural Machine Translation (NMT) models for English-to-Greek and Greek-to-English based on the Transformer architecture. Second, we leverage these NMT models to produce English translations of Greek text as input for our NLP pipeline, which applies a series of pre-processing and triple-extraction steps. Finally, we back-translate the extracted triples to Greek. We evaluate both our NMT and OIE methods on benchmark datasets and demonstrate that our approach outperforms the current state of the art for the Greek language.
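
In outline, the method reduces to three steps: translate the Greek input into English, run pre-processing and triple extraction on the English text, and back-translate the resulting triples into Greek. The sketch below illustrates that flow; it is a minimal illustration, not the paper's implementation: the public Helsinki-NLP OPUS-MT checkpoints stand in for the authors' own Transformer NMT models, and extract_triples() is a hypothetical placeholder for the English OIE stage.

```python
# Minimal sketch of a translation-based OIE pipeline in the spirit of
# PENELOPIE. Assumptions: the public Helsinki-NLP OPUS-MT checkpoints
# stand in for the paper's own Transformer NMT models, and
# extract_triples() is a hypothetical placeholder for the English
# pre-processing and triple-extraction stage.
from transformers import pipeline

el_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-el-en")
en_to_el = pipeline("translation", model="Helsinki-NLP/opus-mt-en-el")


def extract_triples(english_text: str) -> list[tuple[str, str, str]]:
    """Placeholder for an English OIE system returning (subject, relation, object) triples."""
    raise NotImplementedError("plug in an English OIE tool here")


def greek_oie(greek_text: str) -> list[tuple[str, str, str]]:
    # Step 1: translate the Greek input into English.
    english = el_to_en(greek_text)[0]["translation_text"]
    # Step 2: extract (subject, relation, object) triples from the English text.
    triples = extract_triples(english)
    # Step 3: back-translate each triple element into Greek.
    return [
        tuple(en_to_el(part)[0]["translation_text"] for part in triple)
        for triple in triples
    ]
```

Extracting triples in English lets the pipeline reuse mature English OIE tooling; the trade-off is that translation errors propagate into the triples, which is why the NMT models and the OIE output are evaluated separately on benchmark datasets.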
Related papers
- TasTe: Teaching Large Language Models to Translate through Self-Reflection (arXiv 2024-06-12)
  Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks. We propose the TasTe framework, which stands for translating through self-reflection. Evaluation results in four language directions on the WMT22 benchmark show the effectiveness of this approach compared to existing methods.
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus (arXiv 2024-05-20)
  Natural language inference is a proxy for natural language understanding. Until now, no NLI corpus has been publicly available for Romanian. We introduce the first Romanian NLI corpus (RoNLI), comprising 58K training sentence pairs.
- GreekBART: The First Pretrained Greek Sequence-to-Sequence Model (arXiv 2023-04-03)
  We introduce GreekBART, the first Seq2Seq model based on the BART-base architecture and pretrained on a large-scale Greek corpus. We evaluate and compare GreekBART against BART-random, Greek-BERT, and XLM-R on a variety of discriminative tasks.
- The Effect of Normalization for Bi-directional Amharic-English Neural Machine Translation (arXiv 2022-10-27)
  This paper presents the first relatively large-scale Amharic-English parallel sentence dataset. We build bi-directional Amharic-English translation models by fine-tuning the existing Facebook M2M100 pre-trained model. The results show that normalizing Amharic homophone characters improves Amharic-English machine translation in both directions.
- DivEMT: Neural Machine Translation Post-Editing Effort Across Typologically Diverse Languages (arXiv 2022-05-24)
  DivEMT is the first publicly available post-editing study of Neural Machine Translation (NMT) over a typologically diverse set of target languages. We assess the impact on translation productivity of two state-of-the-art NMT systems: Google Translate and the open-source multilingual model mBART50.
- Multi-granular Legal Topic Classification on Greek Legislation (arXiv 2021-09-30)
  We study the task of classifying legal texts written in the Greek language. This is the first time the task of Greek legal text classification has been considered in an open research project.
- Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation (arXiv 2021-04-13)
  We focus on sequence-level knowledge distillation (SeqKD) from external text-based NMT models. We train a bilingual end-to-end speech translation (E2E-ST) model to predict paraphrased transcriptions as an auxiliary task with a single decoder.
- SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task (arXiv 2020-10-11)
  We participated in four translation directions across three language pairs: English-Chinese, English-Polish, and German-Upper Sorbian. Depending on the conditions of each language pair, we experimented with diverse neural machine translation (NMT) techniques. Our primary systems won first place in the English-to-Chinese, Polish-to-English, and German-to-Upper-Sorbian directions.
- GREEK-BERT: The Greeks visiting Sesame Street (arXiv 2020-08-27)
  Transformer-based language models, such as BERT, have achieved state-of-the-art performance on several downstream natural language processing tasks. We present GREEK-BERT, a monolingual BERT-based language model for modern Greek.
- Data and Representation for Turkish Natural Language Inference (arXiv 2020-04-30)
  We offer a positive response for natural language inference (NLI) in Turkish. We translate two large English NLI datasets into Turkish and have a team of experts validate their translation quality and fidelity to the original labels. We find that in-language embeddings are essential and that morphological parsing can be avoided when the training set is large.
- Learning Contextualized Sentence Representations for Document-Level Neural Machine Translation (arXiv 2020-03-30)
  Document-level machine translation incorporates inter-sentential dependencies into the translation of a source sentence. We propose a new framework that models cross-sentence dependencies by training neural machine translation (NMT) to predict both the target translation and the surrounding sentences of a source sentence.