Related papers: pysentimiento: A Python Toolkit for Opinion Mining and Social NLP tasks

URL: http://arxiv.org/abs/2106.09462v3
Date: Sat, 13 Jul 2024 16:21:45 GMT
Title: pysentimiento: A Python Toolkit for Opinion Mining and Social NLP tasks
Authors: Juan Manuel Pérez, Mariela Rajngewerc, Juan Carlos Giudici, Damián A. Furman, Franco Luque, Laura Alonso Alemany, María Vanina Martínez
Abstract summary: pysentimiento is a Python toolkit designed for opinion mining and other Social NLP tasks. This open-source library brings state-of-the-art models for Spanish, English, Italian, and Portuguese in an easy-to-use Python library. We present a comprehensive assessment of performance for several pre-trained language models across a variety of tasks, languages, and datasets.
Score: 0.2826977330147589
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In recent years, the extraction of opinions and information from user-generated text has attracted a lot of interest, largely due to the unprecedented volume of content in Social Media. However, social researchers face some issues in adopting cutting-edge tools for these tasks, as they are usually behind commercial APIs, unavailable for other languages than English, or very complex to use for non-experts. To address these issues, we present pysentimiento, a comprehensive multilingual Python toolkit designed for opinion mining and other Social NLP tasks. This open-source library brings state-of-the-art models for Spanish, English, Italian, and Portuguese in an easy-to-use Python library, allowing researchers to leverage these techniques. We present a comprehensive assessment of performance for several pre-trained language models across a variety of tasks, languages, and datasets, including an evaluation of fairness in the results.

Related papers

PyGress: Tool for Analyzing the Progression of Code Proficiency in Python OSS Projects [2.3253691531523533]
PyGress is a web-based tool designed to automatically evaluate and visualize Python code proficiency. Given a GitHub repository link, the system extracts commit histories and analyzes source code proficiency across CEFR-aligned levels (A1 to C2). The PyGress tool visualizes per-contributor proficiency distribution and tracks project code proficiency progression over time.
arXiv Detail & Related papers (2025-11-08T03:11:24Z)
Molly: Making Large Language Model Agents Solve Python Problem More Logically [11.317420065020173]
The Molly agent parses the learner's questioning intent through a scenario-based interaction. At the generation stage, the agent reflects on the generated responses to ensure that they not only align with factual content but also effectively answer the user's queries.
arXiv Detail & Related papers (2024-12-24T02:08:38Z)
SocialED: A Python Library for Social Event Detection [53.928241775629566]
SocialED is a comprehensive, open-source Python library designed to support social event detection (SED) tasks. It provides a unified API with detailed documentation, offering researchers and practitioners a complete solution for event detection in social media. SocialED supports a wide range of preprocessing techniques, such as graph construction and tokenization, and includes standardized interfaces for training models and making predictions.
arXiv Detail & Related papers (2024-12-18T03:37:47Z)
Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets. This survey delves into an important attribute of these datasets: the dialect of a language. Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
PyThaiNLP: Thai Natural Language Processing in Python [4.61731352666614]
PyThaiNLP is a free and open-source natural language processing (NLP) library for the Thai language, implemented in Python. It provides a wide range of software, models, and datasets for the Thai language.
arXiv Detail & Related papers (2023-12-07T19:19:43Z)
PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series [0.0]
PyPOTS is an open-source Python library dedicated to data mining and analysis on partially-observed time series. It provides easy access to diverse algorithms categorized into four tasks: imputation, classification, clustering, and forecasting.
arXiv Detail & Related papers (2023-05-30T07:57:05Z)
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages [105.54207724678767]
Data scarcity is a crucial issue for the development of highly multilingual NLP systems. We propose XTREME-UP, a benchmark defined by its focus on the scarce-data scenario rather than zero-shot. XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies.
arXiv Detail & Related papers (2023-05-19T18:00:03Z)
Evaluating Embedding APIs for Information Retrieval [51.24236853841468]
We evaluate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval. We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English. For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost.
arXiv Detail & Related papers (2023-05-10T16:40:52Z)
ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as the most important breakthroughs in natural language processing (NLP). This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources. Compared to the performance of previous models, our extensive experimental results demonstrate a worse performance of ChatGPT for different NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z)
TweetNLP: Cutting-Edge Natural Language Processing for Social Media [22.6980150693332]
TweetNLP is an integrated platform for Natural Language Processing (NLP) in social media. It supports a diverse set of NLP tasks, including generic focus areas such as sentiment analysis and named entity recognition. The system is powered by reasonably-sized Transformer-based language models specialized on social media text.
arXiv Detail & Related papers (2022-06-29T17:16:58Z)
BERTuit: Understanding Spanish language in Twitter through a native transformer [70.77033762320572]
We present BERTuit, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets. Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used in applications focused on this social network.
arXiv Detail & Related papers (2022-04-07T14:28:51Z)
Python for Smarter Cities: Comparison of Python libraries for static and interactive visualisations of large vector data [0.0]
Python, with its concise and natural syntax, presents a low barrier to entry for municipal staff without computer science backgrounds. This study assesses prominent, actively-developed visualisation libraries in the Python ecosystem with respect to producing visualisations of large vector datasets. All short-listed libraries were able to generate the sample map products for both a small and larger dataset.
arXiv Detail & Related papers (2022-02-26T10:23:29Z)
NLPGym -- A toolkit for evaluating RL agents on Natural Language Processing Tasks [2.5760935151452067]
We release NLPGym, an open-source Python toolkit that provides interactive textual environments for standard NLP tasks. We present experimental results for 6 tasks using different RL algorithms which serve as baselines for further research.
arXiv Detail & Related papers (2020-11-16T20:58:35Z)
ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension [53.037401638264235]
We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets. The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning.
arXiv Detail & Related papers (2019-12-29T07:27:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.