VNLP: Turkish NLP Package
- URL: http://arxiv.org/abs/2403.01309v1
- Date: Sat, 2 Mar 2024 20:46:56 GMT
- Title: VNLP: Turkish NLP Package
- Authors: Meliksah Turker, Mehmet Erdi Ari, Aydin Han
- Abstract summary: VNLP is a state-of-the-art Natural Language Processing (NLP) package for the Turkish language.
It contains a wide variety of tools, ranging from the simplest tasks, such as sentence splitting and text normalization, to the more advanced ones, such as text and token classification models.
VNLP has an open-source GitHub repository, Read the Docs documentation, a PyPI package for convenient installation, and Python and command-line APIs.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work, we present VNLP: the first dedicated, complete, open-source,
well-documented, lightweight, production-ready, state-of-the-art Natural
Language Processing (NLP) package for the Turkish language. It contains a wide
variety of tools, ranging from the simplest tasks, such as sentence splitting
and text normalization, to the more advanced ones, such as text and token
classification models. Its token classification models are based on "Context
Model", a novel architecture that is both an encoder and an auto-regressive
model. NLP tasks solved by VNLP models include but are not limited to Sentiment
Analysis, Named Entity Recognition, Morphological Analysis & Disambiguation
and Part-of-Speech Tagging. Moreover, it comes with pre-trained word embeddings
and corresponding SentencePiece Unigram tokenizers. VNLP has an open-source
GitHub repository, Read the Docs documentation, a PyPI package for convenient
installation, Python and command-line APIs, and a demo page to test all the
functionality. Consequently, our main contribution is a complete, compact,
easy-to-install and easy-to-use NLP package for Turkish.
Related papers
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z)
- pyvene: A Library for Understanding and Improving PyTorch Models via Interventions [79.72930339711478]
pyvene is an open-source library that supports customizable interventions on a range of different PyTorch modules.
We show how pyvene provides a unified framework for performing interventions on neural models and sharing the intervened-upon models with others.
arXiv Detail & Related papers (2024-03-12T16:46:54Z)
- HugNLP: A Unified and Comprehensive Library for Natural Language Processing [14.305751154503133]
We introduce HugNLP, a library for natural language processing (NLP) with the prevalent backend of HuggingFace Transformers.
HugNLP consists of a hierarchical structure including models, processors and applications that unifies the learning process of pre-trained language models (PLMs) on different NLP tasks.
arXiv Detail & Related papers (2023-02-28T03:38:26Z)
- Binding Language Models in Symbolic Languages [146.3027328556881]
Binder is a training-free neural-symbolic framework that maps the task input to a program.
In the parsing stage, Codex is able to identify the part of the task input that cannot be answered by the original programming language.
In the execution stage, Codex can perform versatile functionalities given proper prompts in the API calls.
arXiv Detail & Related papers (2022-10-06T12:55:17Z)
- HuSpaCy: an industrial-strength Hungarian natural language processing toolkit [0.0]
A language processing pipeline should consist of close to state-of-the-art lemmatization, morphosyntactic analysis, entity recognition and word embeddings.
This paper introduces HuSpaCy, an industry-ready Hungarian language processing pipeline.
arXiv Detail & Related papers (2022-01-06T07:49:45Z)
- A Data-Centric Framework for Composable NLP Workflows [109.51144493023533]
Empirical natural language processing systems in application domains (e.g., healthcare, finance, education) involve interoperation among multiple components.
We establish a unified open-source framework to support fast development of such sophisticated NLP in a composable manner.
arXiv Detail & Related papers (2021-03-02T16:19:44Z)
- Meta-Embeddings for Natural Language Inference and Semantic Similarity tasks [0.0]
Word Representations form the core component for almost all advanced Natural Language Processing (NLP) applications.
In this paper, we propose to use Meta Embedding derived from a few State-of-the-Art (SOTA) models to efficiently tackle mainstream NLP tasks.
arXiv Detail & Related papers (2020-12-01T16:58:01Z)
- N-LTP: An Open-source Neural Language Technology Platform for Chinese [68.58732970171747]
N-LTP is an open-source neural language technology platform supporting six fundamental Chinese NLP tasks.
N-LTP adopts the multi-task framework by using a shared pre-trained model, which has the advantage of capturing the shared knowledge across relevant Chinese tasks.
arXiv Detail & Related papers (2020-09-24T11:45:39Z)
- Coreferential Reasoning Learning for Language Representation [88.14248323659267]
We present CorefBERT, a novel language representation model that can capture the coreferential relations in context.
The experimental results show that, compared with existing baseline models, CorefBERT can achieve significant improvements consistently on various downstream NLP tasks.
arXiv Detail & Related papers (2020-04-15T03:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.