PyThaiNLP: Thai Natural Language Processing in Python
- URL: http://arxiv.org/abs/2312.04649v1
- Date: Thu, 7 Dec 2023 19:19:43 GMT
- Title: PyThaiNLP: Thai Natural Language Processing in Python
- Authors: Wannaphong Phatthiyaphaibun, Korakot Chaovavanich, Charin Polpanumas,
Arthit Suriyawongkul, Lalita Lowphansirikul, Pattarawat Chormai, Peerat
Limkonchotiwat, Thanathip Suntorntip, Can Udomcharoenchaikit
- Abstract summary: PyThaiNLP is a free and open-source natural language processing (NLP) library for the Thai language implemented in Python.
It provides a wide range of software, models, and datasets for the Thai language.
- Score: 4.61731352666614
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present PyThaiNLP, a free and open-source natural language processing
(NLP) library for the Thai language implemented in Python. It provides a wide range
of software, models, and datasets for the Thai language. We first provide a brief
historical context of tools for the Thai language prior to the development of
PyThaiNLP. We then outline the functionalities it provides as well as the datasets
and pre-trained language models. We later summarize its development milestones
and discuss our experience during its development. We conclude by demonstrating
how industrial and research communities utilize PyThaiNLP in their work. The
library is freely available at https://github.com/pythainlp/pythainlp.
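A minimal usage sketch of the library's word segmentation and part-of-speech tagging modules (pythainlp.tokenize.word_tokenize and pythainlp.tag.pos_tag); the example sentence and output are illustrative only:
```python
# Minimal PyThaiNLP sketch: Thai word segmentation and POS tagging.
from pythainlp.tokenize import word_tokenize
from pythainlp.tag import pos_tag

text = "ผมรักภาษาไทย"           # "I love the Thai language"
tokens = word_tokenize(text)     # dictionary-based segmentation by default
print(tokens)                    # e.g. ['ผม', 'รัก', 'ภาษาไทย']
print(pos_tag(tokens))           # (token, POS tag) pairs
```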
Related papers
- CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models [59.91221728187576]
This paper introduces the CMU Linguistic Annotation Backend, an open-source framework that simplifies model deployment and continuous human-in-the-loop fine-tuning of NLP models.
CMULAB enables users to leverage the power of multilingual models to quickly adapt and extend existing tools for speech recognition, OCR, translation, and syntactic analysis to new languages.
arXiv Detail & Related papers (2024-04-03T02:21:46Z) - pyvene: A Library for Understanding and Improving PyTorch Models via
Interventions [79.72930339711478]
pyvene is an open-source library that supports customizable interventions on a range of different PyTorch modules.
We show how pyvene provides a unified framework for performing interventions on neural models and for sharing the intervened-upon models with others.
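For readers unfamiliar with the idea of intervening on a module's activations, the sketch below uses a plain PyTorch forward hook; it does not reproduce pyvene's own API, only the underlying concept:
```python
# Generic activation intervention via a PyTorch forward hook (not pyvene's API).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

def zero_first_unit(module, inputs, output):
    # Intervention: clamp the first hidden unit to zero before it propagates.
    patched = output.clone()
    patched[:, 0] = 0.0
    return patched                       # returned tensor replaces the module's output

handle = model[0].register_forward_hook(zero_first_unit)
print(model(torch.randn(3, 4)))          # forward pass with the intervention applied
handle.remove()
```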
arXiv Detail & Related papers (2024-03-12T16:46:54Z) - VNLP: Turkish NLP Package [0.0]
VNLP is a state-of-the-art Natural Language Processing (NLP) package for the Turkish language.
It contains a wide variety of tools, ranging from the simplest tasks, such as sentence splitting and text normalization, to the more advanced ones, such as text and token classification models.
VNLP has an open-source GitHub repository, ReadtheDocs documentation, PyPi package for convenient installation, Python and command-line API.
arXiv Detail & Related papers (2024-03-02T20:46:56Z) - Python is Not Always the Best Choice: Embracing Multilingual Program of Thoughts [51.49688654641581]
We propose a task- and model-agnostic approach called MultiPoT, which harnesses the strength and diversity of multiple programming languages.
Experimental results reveal that it significantly outperforms Python Self-Consistency.
In particular, MultiPoT achieves an average improvement of more than 4.6% on ChatGPT (gpt-3.5-turbo-0701).
arXiv Detail & Related papers (2024-02-16T13:48:06Z) - Causal-learn: Causal Discovery in Python [53.17423883919072]
Causal discovery aims at revealing causal relations from observational data.
causal-learn is an open-source Python library for causal discovery.
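A short sketch of running the PC algorithm with causal-learn on synthetic data, assuming the import path from the library's documentation (causallearn.search.ConstraintBased.PC):
```python
# Sketch: constraint-based causal discovery (PC algorithm) on toy data.
import numpy as np
from causallearn.search.ConstraintBased.PC import pc

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(size=1000)      # X -> Y
z = -1.0 * y + rng.normal(size=1000)     # Y -> Z
data = np.column_stack([x, y, z])

cg = pc(data, alpha=0.05, indep_test="fisherz")
print(cg.G)                               # learned graph over the three variables
```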
arXiv Detail & Related papers (2023-07-31T05:00:35Z) - Recent Advances in Natural Language Processing via Large Pre-Trained
Language Models: A Survey [67.82942975834924]
Large, pre-trained language models such as BERT have drastically changed the Natural Language Processing (NLP) field.
We present a survey of recent work that uses these large language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches.
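As a tiny illustration of the prompting usage mode the survey covers, here is a masked-LM query with the Hugging Face transformers pipeline (an example of the technique, not part of the survey itself):
```python
# Prompting a pre-trained masked language model for a fill-in-the-blank query.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill("Paris is the [MASK] of France.")[:3]:
    print(candidate["token_str"], round(candidate["score"], 3))
```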
arXiv Detail & Related papers (2021-11-01T20:08:05Z) - LaoPLM: Pre-trained Language Models for Lao [3.2146309563776416]
Pre-trained language models (PLMs) can capture different levels of concepts in context and hence generate universal language representations.
Although PLMs have been widely used in most NLP applications, they are under-represented in Lao NLP research.
We construct a text classification dataset to alleviate the resource-scarce situation of the Lao language.
We present the first transformer-based PLMs for Lao in four versions: BERT-small, BERT-base, ELECTRA-small, and ELECTRA-base, and evaluate them on two downstream tasks: part-of-speech tagging and text classification.
arXiv Detail & Related papers (2021-10-12T11:13:07Z) - pysentimiento: A Python Toolkit for Opinion Mining and Social NLP tasks [0.2826977330147589]
pysentimiento is a Python toolkit designed for opinion mining and other Social NLP tasks.
This open-source library brings state-of-the-art models for Spanish, English, Italian, and Portuguese in an easy-to-use Python package.
We present a comprehensive assessment of performance for several pre-trained language models across a variety of tasks, languages, and datasets.
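A brief sketch of the toolkit's analyzer interface for Spanish sentiment, assuming the create_analyzer entry point; the predicted label and scores shown are illustrative:
```python
# pysentimiento sketch: sentiment analysis for Spanish text.
from pysentimiento import create_analyzer

analyzer = create_analyzer(task="sentiment", lang="es")
result = analyzer.predict("Qué gran día!")
print(result.output)    # e.g. "POS"
print(result.probas)    # class probabilities for NEG/NEU/POS
```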
arXiv Detail & Related papers (2021-06-17T13:15:07Z) - Text Normalization for Low-Resource Languages of Africa [1.5766133856827325]
In this study, we examine the effects of text normalization and data set quality for a set of low-resource languages of Africa.
We describe our text normalizer which we built in the Pynini framework, a Python library for finite state transducers, and our experiments in training language models for African languages.
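Below is a minimal text-normalization rewrite rule in Pynini, illustrating the kind of finite-state rule such a normalizer composes; the toy alphabet and the rule itself are simplified assumptions, not the paper's actual grammar:
```python
# Sketch: a single context-independent rewrite rule ("&" -> "and") in Pynini.
import pynini

chars = "abcdefghijklmnopqrstuvwxyz &"
sigma_star = pynini.closure(pynini.union(*chars)).optimize()   # toy alphabet

rule = pynini.cdrewrite(pynini.cross("&", "and"), "", "", sigma_star)

normalized = pynini.shortestpath(pynini.accep("cats & dogs") @ rule).string()
print(normalized)   # cats and dogs
```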
arXiv Detail & Related papers (2021-03-29T18:00:26Z) - N-LTP: An Open-source Neural Language Technology Platform for Chinese [68.58732970171747]
N-LTP is an open-source neural language technology platform supporting six fundamental Chinese NLP tasks.
N-LTP adopts a multi-task framework built on a shared pre-trained model, which has the advantage of capturing the knowledge shared across relevant Chinese tasks.
arXiv Detail & Related papers (2020-09-24T11:45:39Z) - Stanza: A Python Natural Language Processing Toolkit for Many Human
Languages [44.8226642800919]
We introduce Stanza, an open-source Python natural language processing toolkit supporting 66 human languages.
Stanza features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging.
We have trained Stanza on a total of 112 datasets, including the Universal Dependencies treebanks and other multilingual corpora.
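A short example of Stanza's pipeline interface for English (any of the 66 supported languages can be substituted for the language code):
```python
# Stanza sketch: tokenization, POS tagging, and lemmatization for English.
import stanza

stanza.download("en")                                    # fetch pretrained models once
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma")
doc = nlp("Stanza was trained on Universal Dependencies treebanks.")
for sentence in doc.sentences:
    for word in sentence.words:
        print(word.text, word.upos, word.lemma)
```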
arXiv Detail & Related papers (2020-03-16T09:05:53Z)