GR-NLP-TOOLKIT: An Open-Source NLP Toolkit for Modern Greek
- URL: http://arxiv.org/abs/2412.08520v1
- Date: Wed, 11 Dec 2024 16:34:23 GMT
- Title: GR-NLP-TOOLKIT: An Open-Source NLP Toolkit for Modern Greek
- Authors: Lefteris Loukas, Nikolaos Smyrnioudis, Chrysa Dikonomaki, Spyros Barbakos, Anastasios Toumazatos, John Koutsikakis, Manolis Kyriakakis, Mary Georgiou, Stavros Vassos, John Pavlopoulos, Ion Androutsopoulos
- Abstract summary: We present GR-NLP-TOOLKIT, an open-source natural language processing (NLP) toolkit developed specifically for modern Greek.
The toolkit provides state-of-the-art performance in five core NLP tasks, namely part-of-speech tagging, morphological tagging, dependency parsing, named entity recognition, and Greeklish-to-Greek transliteration.
- Score: 10.595573163276102
- Abstract: We present GR-NLP-TOOLKIT, an open-source natural language processing (NLP) toolkit developed specifically for modern Greek. The toolkit provides state-of-the-art performance in five core NLP tasks, namely part-of-speech tagging, morphological tagging, dependency parsing, named entity recognition, and Greeklish-to-Greek transliteration. The toolkit is based on pre-trained Transformers, is freely available, and can be easily installed in Python (pip install gr-nlp-toolkit). It is also accessible through a demonstration platform on HuggingFace, along with a publicly available API for non-commercial use. We discuss the functionality provided for each task, the underlying methods, experiments against comparable open-source toolkits, and possible future enhancements. The toolkit is available at: https://github.com/nlpaueb/gr-nlp-toolkit
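To make the transliteration task in the abstract concrete, here is a minimal toy sketch of Greeklish-to-Greek conversion (Greek written in Latin characters, mapped back to the Greek alphabet). It is not the toolkit's method, which the abstract describes as based on pre-trained Transformers; the mapping table and the `transliterate` function are assumptions made purely for illustration.

```python
import re

# Hypothetical, deliberately small mapping table (an assumption for this sketch).
# Two-letter keys are tried first so that "th" becomes one Greek letter.
GREEKLISH_TO_GREEK = {
    "th": "θ", "ps": "ψ", "ks": "ξ", "ch": "χ", "ou": "ου",
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "i": "ι", "k": "κ", "l": "λ", "m": "μ", "n": "ν",
    "o": "ο", "p": "π", "r": "ρ", "s": "σ", "t": "τ", "u": "υ",
    "f": "φ", "x": "χ", "w": "ω", "v": "β", "y": "υ",
}


def transliterate(text: str) -> str:
    """Greedy longest-match transliteration of lowercase Greeklish text."""
    lowered = text.lower()
    out = []
    i = 0
    while i < len(lowered):
        for size in (2, 1):  # prefer digraphs over single letters
            chunk = lowered[i:i + size]
            if len(chunk) == size and chunk in GREEKLISH_TO_GREEK:
                out.append(GREEKLISH_TO_GREEK[chunk])
                i += size
                break
        else:
            # Characters outside the table (spaces, digits, punctuation) pass through.
            out.append(lowered[i])
            i += 1
    # A sigma at the end of a word takes its final form.
    return re.sub(r"σ\b", "ς", "".join(out))


if __name__ == "__main__":
    print(transliterate("kosmos"))
    print(transliterate("thalassa"))
```

A fixed table like this cannot choose between letters that Greeklish writers conflate (for example "i" may stand for ι, η, or υ) and it never restores accents, which is one reason a learned, context-aware model is a natural fit for the task.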
Related papers
- PyPulse: A Python Library for Biosignal Imputation [58.35269251730328]
We introduce PyPulse, a Python package for imputation of biosignals in both clinical and wearable sensor settings.
PyPulse provides a modular, extensible, and easy-to-use framework for a broad user base, including bioresearchers without a machine-learning background.
We released PyPulse under the MIT License on Github and PyPI.
arXiv Detail & Related papers (2024-12-09T11:00:55Z) - CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models [59.91221728187576]
This paper introduces the CMU Linguistic Annotation Backend, an open-source framework that simplifies model deployment and continuous human-in-the-loop fine-tuning of NLP models.
CMULAB enables users to leverage the power of multilingual models to quickly adapt and extend existing tools for speech recognition, OCR, translation, and syntactic analysis to new languages.
arXiv Detail & Related papers (2024-04-03T02:21:46Z) - VNLP: Turkish NLP Package [0.0]
VNLP is a state-of-the-art Natural Language Processing (NLP) package for the Turkish language.
It contains a wide variety of tools, ranging from the simplest tasks, such as sentence splitting and text normalization, to the more advanced ones, such as text and token classification models.
VNLP has an open-source GitHub repository, Read the Docs documentation, a PyPI package for convenient installation, and Python and command-line APIs.
arXiv Detail & Related papers (2024-03-02T20:46:56Z) - PyThaiNLP: Thai Natural Language Processing in Python [4.61731352666614]
PyThaiNLP is a free and open-source natural language processing (NLP) library for the Thai language, implemented in Python.
It provides a wide range of software, models, and datasets for the Thai language.
arXiv Detail & Related papers (2023-12-07T19:19:43Z) - HugNLP: A Unified and Comprehensive Library for Natural Language
Processing [14.305751154503133]
We introduce HugNLP, a library for natural language processing (NLP) with the prevalent backend of HuggingFace Transformers.
HugNLP consists of a hierarchical structure including models, processors and applications that unifies the learning process of pre-trained language models (PLMs) on different NLP tasks.
arXiv Detail & Related papers (2023-02-28T03:38:26Z) - Binding Language Models in Symbolic Languages [146.3027328556881]
Binder is a training-free neural-symbolic framework that maps the task input to a program.
In the parsing stage, Codex is able to identify the part of the task input that cannot be answered by the original programming language.
In the execution stage, Codex can perform versatile functionalities given proper prompts in the API calls.
arXiv Detail & Related papers (2022-10-06T12:55:17Z) - ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet [95.39817519115394]
ESPnet-SLU is a project inside the end-to-end speech processing toolkit ESPnet.
It is designed for quick development of spoken language understanding in a single framework.
arXiv Detail & Related papers (2021-11-29T17:05:49Z) - Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing [22.38792093462942]
Trankit is a light-weight Transformer-based toolkit for multilingual Natural Language Processing (NLP).
It provides a trainable pipeline for fundamental NLP tasks over 100 languages, and 90 pretrained pipelines for 56 languages.
Trankit significantly outperforms prior multilingual NLP pipelines over sentence segmentation, part-of-speech tagging, morphological feature tagging, and dependency parsing.
arXiv Detail & Related papers (2021-01-09T04:55:52Z) - N-LTP: An Open-source Neural Language Technology Platform for Chinese [68.58732970171747]
N-LTP is an open-source neural language technology platform supporting six fundamental Chinese NLP tasks.
N-LTP adopts the multi-task framework by using a shared pre-trained model, which has the advantage of capturing the shared knowledge across relevant Chinese tasks.
arXiv Detail & Related papers (2020-09-24T11:45:39Z) - ESPnet-ST: All-in-One Speech Translation Toolkit [57.76342114226599]
ESPnet-ST is a new project inside the end-to-end speech processing toolkit ESPnet.
It implements automatic speech recognition, machine translation, and text-to-speech functions for speech translation.
We provide all-in-one recipes including data pre-processing, feature extraction, training, and decoding pipelines.
arXiv Detail & Related papers (2020-04-21T18:38:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.