Multi-granular Legal Topic Classification on Greek Legislation
- URL: http://arxiv.org/abs/2109.15298v1
- Date: Thu, 30 Sep 2021 17:43:00 GMT
- Title: Multi-granular Legal Topic Classification on Greek Legislation
- Authors: Christos Papaloukas, Ilias Chalkidis, Konstantinos Athinaios,
Despina-Athanasia Pantazi, Manolis Koubarakis
- Abstract summary: We study the task of classifying legal texts written in the Greek language.
This is the first time the task of Greek legal text classification is considered in an open research project.
- Score: 4.09134848993518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we study the task of classifying legal texts written in the
Greek language. We introduce and make publicly available a novel dataset based
on Greek legislation, consisting of more than 47 thousand official, categorized
Greek legislation resources. We experiment with this dataset and evaluate a
battery of advanced methods and classifiers, ranging from traditional machine
learning and RNN-based methods to state-of-the-art Transformer-based methods.
We show that recurrent architectures with domain-specific word embeddings offer
improved overall performance while remaining competitive with transformer-based
models. Finally, we show that cutting-edge multilingual and monolingual
transformer-based models compete at the top of the classifiers' ranking, making
us question the necessity of training monolingual transfer learning models as a
rule of thumb. To the best of our knowledge, this is the first time the task of
Greek legal text classification has been considered in an open research project,
while Greek is also a language with very limited NLP resources in general.
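The abstract contrasts traditional machine-learning baselines with transformer-based classifiers. A minimal, self-contained sketch of such a traditional baseline is shown below: TF-IDF features with a nearest-centroid classifier over topic labels. The corpus, labels, and tokens here are toy placeholders, not the actual Greek legislation dataset or the paper's exact method.

```python
# Sketch of a traditional text-classification baseline:
# TF-IDF weighting plus a nearest-centroid classifier (cosine similarity).
import math
from collections import Counter, defaultdict

def tfidf_vectors(docs):
    """Compute TF-IDF vectors (as sparse dicts) for tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: one count per doc
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vecs, idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def train_centroids(docs, labels):
    """Average the TF-IDF vectors of each class into a centroid."""
    vecs, idf = tfidf_vectors(docs)
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, lab in zip(vecs, labels):
        for t, w in vec.items():
            sums[lab][t] += w
    centroids = {lab: {t: w / counts[lab] for t, w in s.items()}
                 for lab, s in sums.items()}
    return centroids, idf

def classify(tokens, centroids, idf):
    """Assign the label whose centroid is most similar to the query."""
    tf = Counter(tokens)
    vec = {t: (c / len(tokens)) * idf.get(t, 0.0) for t, c in tf.items()}
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))

# Toy corpus with two hypothetical coarse legal topics.
train = [
    (["tax", "income", "rate", "deduction"], "FINANCE"),
    (["tax", "vat", "revenue"], "FINANCE"),
    (["court", "appeal", "judge", "ruling"], "JUSTICE"),
    (["court", "trial", "sentence"], "JUSTICE"),
]
docs, labels = zip(*train)
centroids, idf = train_centroids(list(docs), list(labels))
print(classify(["tax", "revenue", "rate"], centroids, idf))  # FINANCE
```

In the paper's setting this kind of bag-of-words baseline is the point of comparison for RNNs with domain-specific embeddings and for transformer-based models.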
Related papers
- The Large Language Model GreekLegalRoBERTa [2.4797200957733576]
We develop four versions of GreekLegalRoBERTa, which are four large language models trained on Greek legal and nonlegal text.
We show that our models surpass the performance of GreekLegalBERT, GreekLegalBERT-v2, and GreekBERT in two tasks involving Greek legal documents.
arXiv Detail & Related papers (2024-10-10T20:54:39Z) - A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
Until now, no NLI corpus has been publicly available for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI), comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z) - GreekT5: A Series of Greek Sequence-to-Sequence Models for News
Summarization [0.0]
This paper proposes a series of novel T5 models for Greek news articles.
The proposed models were thoroughly evaluated on the same dataset against GreekBART.
Our evaluation results reveal that most of the proposed models significantly outperform GreekBART on various evaluation metrics.
arXiv Detail & Related papers (2023-11-13T21:33:12Z) - OYXOY: A Modern NLP Test Suite for Modern Greek [2.059776592203642]
This paper serves as a foundational step towards the development of a linguistically motivated evaluation suite for Greek NLP.
We introduce four expert-verified evaluation tasks, specifically targeted at natural language inference, word sense disambiguation and metaphor detection.
Beyond language-specific replicas of existing tasks, we contribute two innovations that will resonate with the broader resource and evaluation community.
arXiv Detail & Related papers (2023-09-13T15:00:56Z) - T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text
Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
arXiv Detail & Related papers (2023-06-08T07:33:22Z) - GreekBART: The First Pretrained Greek Sequence-to-Sequence Model [13.429669368275318]
We introduce GreekBART, the first Seq2Seq model based on BART-base architecture and pretrained on a large-scale Greek corpus.
We evaluate and compare GreekBART against BART-random, Greek-BERT, and XLM-R on a variety of discriminative tasks.
arXiv Detail & Related papers (2023-04-03T10:48:51Z) - Beyond Contrastive Learning: A Variational Generative Model for
Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z) - Detecting Text Formality: A Study of Text Classification Approaches [78.11745751651708]
This work proposes the first, to our knowledge, systematic study of formality detection methods based on statistical, neural, and Transformer-based machine learning methods.
We conducted three types of experiments -- monolingual, multilingual, and cross-lingual.
The study shows that a character-level BiLSTM model outperforms Transformer-based ones on the monolingual and multilingual formality classification tasks.
arXiv Detail & Related papers (2022-04-19T16:23:07Z) - PENELOPIE: Enabling Open Information Extraction for the Greek Language
through Machine Translation [0.30938904602244344]
We present our submission for the EACL 2021 SRW; a methodology that aims at bridging the gap between high and low-resource languages.
We build Neural Machine Translation (NMT) models for English-to-Greek and Greek-to-English based on the Transformer architecture.
We leverage these NMT models to produce English translations of Greek text as input for our NLP pipeline, to which we apply a series of pre-processing and triple extraction tasks.
arXiv Detail & Related papers (2021-03-28T08:01:58Z) - Lexical semantic change for Ancient Greek and Latin [61.69697586178796]
Associating a word's correct meaning in its historical context is a central challenge in diachronic research.
We build on a recent computational approach to semantic change based on a dynamic Bayesian mixture model.
We provide a systematic comparison of dynamic Bayesian mixture models for semantic change with state-of-the-art embedding-based models.
arXiv Detail & Related papers (2021-01-22T12:04:08Z) - Exploring the Limits of Transfer Learning with a Unified Text-to-Text
Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of introducing transfer learning techniques for NLP by a unified framework that converts all text-based language problems into a text-to-text format.
arXiv Detail & Related papers (2019-10-23T17:37:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.