TweetNLP: Cutting-Edge Natural Language Processing for Social Media
- URL: http://arxiv.org/abs/2206.14774v1
- Date: Wed, 29 Jun 2022 17:16:58 GMT
- Title: TweetNLP: Cutting-Edge Natural Language Processing for Social Media
- Authors: Jose Camacho-Collados and Kiamehr Rezaee and Talayeh Riahi and Asahi
Ushio and Daniel Loureiro and Dimosthenis Antypas and Joanne Boisson and Luis
Espinosa-Anke and Fangyu Liu and Eugenio Martínez-Cámara and Gonzalo
Medina and Thomas Buhrmann and Leonardo Neves and Francesco Barbieri
- Abstract summary: TweetNLP is an integrated platform for Natural Language Processing (NLP) in social media.
It supports a diverse set of NLP tasks, including generic focus areas such as sentiment analysis and named entity recognition.
Task-specific systems are powered by reasonably-sized Transformer-based language models specialized for social media text.
- Score: 22.6980150693332
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we present TweetNLP, an integrated platform for Natural
Language Processing (NLP) in social media. TweetNLP supports a diverse set of
NLP tasks, including generic focus areas such as sentiment analysis and named
entity recognition, as well as social media-specific tasks such as emoji
prediction and offensive language identification. Task-specific systems are
powered by reasonably-sized Transformer-based language models specialized on
social media text (in particular, Twitter) which can be run without the need
for dedicated hardware or cloud services. The main contributions of TweetNLP
are: (1) an integrated Python library for a modern toolkit supporting social
media analysis using our various task-specific models adapted to the social
domain; (2) an interactive online demo for codeless experimentation using our
models; and (3) a tutorial covering a wide variety of typical social media
applications.
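The abstract notes that the task-specific models are specialized for Twitter text. The following minimal sketch shows one input normalization commonly applied before passing tweets to such models. It assumes the convention from the preprocessing snippet on Cardiff NLP's Twitter-RoBERTa model cards on Hugging Face: user mentions become `@user` and links become `http`. The function name `normalize_tweet` is hypothetical, and it is not TweetNLP's own API. Whether TweetNLP applies exactly this step internally is also an assumption.

```python
def normalize_tweet(text: str) -> str:
    """Replace user mentions and URLs with generic placeholder tokens.

    Hypothetical helper; the placeholder convention ("@user", "http") is
    assumed from Cardiff NLP's Twitter-RoBERTa model cards, not taken from
    the TweetNLP paper.
    """
    tokens = []
    for tok in text.split(" "):
        if tok.startswith("@") and len(tok) > 1:
            tok = "@user"  # anonymize any user mention (a bare "@" is kept)
        elif tok.startswith("http"):
            tok = "http"   # collapse any link to a single token
        tokens.append(tok)
    return " ".join(tokens)


if __name__ == "__main__":
    # Example tweet with a mention and a link
    print(normalize_tweet("@jane_doe loving the new demo! https://example.com"))
```

Running the script prints `@user loving the new demo! http`. Placeholders keep the vocabulary small and strip identifying handles, so the model does not have to learn a representation for every individual user name or URL.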
Related papers
- NarrationDep: Narratives on Social Media For Automatic Depression Detection [24.11420537250414]
We have developed a novel model called NarrationDep, which focuses on detecting narratives associated with depression.
NarrationDep is a deep learning framework that jointly models individual user tweet representations and clusters of users' tweets.
Date: 2024-07-24T11:24:25Z
- SoMeLVLM: A Large Vision Language Model for Social Media Processing [78.47310657638567]
We introduce a Large Vision Language Model for Social Media Processing (SoMeLVLM).
SoMeLVLM is a cognitive framework equipped with five key capabilities including knowledge & comprehension, application, analysis, evaluation, and creation.
Our experiments demonstrate that SoMeLVLM achieves state-of-the-art performance in multiple social media tasks.
Date: 2024-02-20T14:02:45Z
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms remains largely untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
Date: 2023-07-05T20:16:20Z
- TwHIN-BERT: A Socially-Enriched Pre-trained Language Model for Multilingual Tweet Representations at Twitter [31.698196219228024]
We present TwHIN-BERT, a multilingual language model productionized at Twitter.
Our model is trained on 7 billion tweets covering over 100 distinct languages.
We evaluate our model on various multilingual social recommendation and semantic understanding tasks.
Date: 2022-09-15T19:01:21Z
- BERTuit: Understanding Spanish language in Twitter through a native transformer [70.77033762320572]
We present BERTuit, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets.
Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used on applications focused on this social network.
Date: 2022-04-07T14:28:51Z
- Annotating the Tweebank Corpus on Named Entity Recognition and Building NLP Models for Social Media Analysis [12.871968485402084]
Social media data such as Twitter messages ("tweets") pose a particular challenge to NLP systems because of their short, noisy, and colloquial nature.
We aim to create Tweebank-NER, an NER corpus based on Tweebank V2 (TB2), and we use these to train state-of-the-art NLP models.
We release the dataset and make the models available to use in an "off-the-shelf" manner for future Tweet NLP research.
Date: 2022-01-18T19:34:23Z
- FBERT: A Neural Transformer for Identifying Offensive Content [67.12838911384024]
fBERT is a BERT model retrained on SOLID, the largest English offensive language identification corpus available, with over 1.4 million offensive instances.
We evaluate fBERT's performance on identifying offensive content on multiple English datasets and we test several thresholds for selecting instances from SOLID.
The fBERT model will be made freely available to the community.
Date: 2021-09-10T19:19:26Z
- pysentimiento: A Python Toolkit for Opinion Mining and Social NLP tasks [0.2826977330147589]
pysentimiento is a Python toolkit designed for opinion mining and other Social NLP tasks.
This open-source library brings state-of-the-art models for Spanish, English, Italian, and Portuguese in an easy-to-use Python library.
We present a comprehensive assessment of performance for several pre-trained language models across a variety of tasks, languages, and datasets.
Date: 2021-06-17T13:15:07Z
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as their informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study assesses how well existing language models distinguish the sentiment expressed in tweets, using a rich collection of 22 datasets.
Date: 2021-05-29T21:05:28Z
- Semi-automatic Generation of Multilingual Datasets for Stance Detection in Twitter [9.359018642178917]
This paper presents a method to obtain multilingual datasets for stance detection in Twitter.
We leverage user-based information to semi-automatically label large amounts of tweets.
Date: 2021-01-28T13:05:09Z
- N-LTP: An Open-source Neural Language Technology Platform for Chinese [68.58732970171747]
N-LTP is an open-source neural language technology platform supporting six fundamental Chinese NLP tasks.
N-LTP adopts the multi-task framework by using a shared pre-trained model, which has the advantage of capturing the shared knowledge across relevant Chinese tasks.
Date: 2020-09-24T11:45:39Z