Related papers: TweetNLP: Cutting-Edge Natural Language Processing for Social Media

URL: http://arxiv.org/abs/2206.14774v1
Date: Wed, 29 Jun 2022 17:16:58 GMT
Title: TweetNLP: Cutting-Edge Natural Language Processing for Social Media
Authors: Jose Camacho-Collados and Kiamehr Rezaee and Talayeh Riahi and Asahi Ushio and Daniel Loureiro and Dimosthenis Antypas and Joanne Boisson and Luis Espinosa-Anke and Fangyu Liu and Eugenio Martínez-Cámara and Gonzalo Medina and Thomas Buhrmann and Leonardo Neves and Francesco Barbieri
Abstract summary: TweetNLP is an integrated platform for Natural Language Processing (NLP) in social media. It supports a diverse set of NLP tasks, including generic focus areas such as sentiment analysis and named entity recognition. The system is powered by reasonably-sized Transformer-based language models specialized on social media text.
Score: 22.6980150693332
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper we present TweetNLP, an integrated platform for Natural Language Processing (NLP) in social media. TweetNLP supports a diverse set of NLP tasks, including generic focus areas such as sentiment analysis and named entity recognition, as well as social media-specific tasks such as emoji prediction and offensive language identification. Task-specific systems are powered by reasonably-sized Transformer-based language models specialized on social media text (in particular, Twitter) which can be run without the need for dedicated hardware or cloud services. The main contributions of TweetNLP are: (1) an integrated Python library for a modern toolkit supporting social media analysis using our various task-specific models adapted to the social domain; (2) an interactive online demo for codeless experimentation using our models; and (3) a tutorial covering a wide variety of typical social media applications.

Related papers

NarrationDep: Narratives on Social Media For Automatic Depression Detection [24.11420537250414]
We have developed a novel model called NarrationDep, which focuses on detecting narratives associated with depression. NarrationDep is a deep learning framework that jointly models individual user tweet representations and clusters of users' tweets.
arXiv Detail & Related papers (2024-07-24T11:24:25Z)
SocialQuotes: Learning Contextual Roles of Social Media Quotes on the Web [9.130915550141337]
We liken social media embeddings to quotes, formalize the page context as structured natural language signals, and identify a taxonomy of roles for quotes within the page context. We release SocialQuotes, a new data set built from the Common Crawl of over 32 million social quotes, 8.3k of them with crowdsourced quote annotations.
arXiv Detail & Related papers (2024-07-22T19:21:01Z)
SoMeLVLM: A Large Vision Language Model for Social Media Processing [78.47310657638567]
We introduce a Large Vision Language Model for Social Media Processing (SoMeLVLM). SoMeLVLM is a cognitive framework equipped with five key capabilities including knowledge & comprehension, application, analysis, evaluation, and creation. Our experiments demonstrate that SoMeLVLM achieves state-of-the-art performance in multiple social media tasks.
arXiv Detail & Related papers (2024-02-20T14:02:45Z)
Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases. Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding. This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
TwHIN-BERT: A Socially-Enriched Pre-trained Language Model for Multilingual Tweet Representations at Twitter [31.698196219228024]
We present TwHIN-BERT, a multilingual language model productionized at Twitter. Our model is trained on 7 billion tweets covering over 100 distinct languages. We evaluate our model on various multilingual social recommendation and semantic understanding tasks.
arXiv Detail & Related papers (2022-09-15T19:01:21Z)
BERTuit: Understanding Spanish language in Twitter through a native transformer [70.77033762320572]
We present BERTuit, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets. Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used in applications focused on this social network.
arXiv Detail & Related papers (2022-04-07T14:28:51Z)
Annotating the Tweebank Corpus on Named Entity Recognition and Building NLP Models for Social Media Analysis [12.871968485402084]
Social media data such as Twitter messages ("tweets") pose a particular challenge to NLP systems because of their short, noisy, and colloquial nature. We aim to create Tweebank-NER, an NER corpus based on Tweebank V2 (TB2), and we use these to train state-of-the-art NLP models. We release the dataset and make the models available to use in an "off-the-shelf" manner for future Tweet NLP research.
arXiv Detail & Related papers (2022-01-18T19:34:23Z)
FBERT: A Neural Transformer for Identifying Offensive Content [67.12838911384024]
fBERT is a BERT model retrained on SOLID, the largest English offensive language identification corpus available with over 1.4 million offensive instances. We evaluate fBERT's performance on identifying offensive content on multiple English datasets and we test several thresholds for selecting instances from SOLID. The fBERT model will be made freely available to the community.
arXiv Detail & Related papers (2021-09-10T19:19:26Z)
pysentimiento: A Python Toolkit for Opinion Mining and Social NLP tasks [0.2826977330147589]
pysentimiento is a Python toolkit designed for opinion mining and other Social NLP tasks. This open-source library brings state-of-the-art models for Spanish, English, Italian, and Portuguese in an easy-to-use Python library. We present a comprehensive assessment of performance for several pre-trained language models across a variety of tasks, languages, and datasets.
arXiv Detail & Related papers (2021-06-17T13:15:07Z)
Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information. Their inherent characteristics, such as the informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks. This study assesses how well existing language models distinguish the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
Semi-automatic Generation of Multilingual Datasets for Stance Detection in Twitter [9.359018642178917]
This paper presents a method to obtain multilingual datasets for stance detection in Twitter. We leverage user-based information to semi-automatically label large amounts of tweets.
arXiv Detail & Related papers (2021-01-28T13:05:09Z)
N-LTP: An Open-source Neural Language Technology Platform for Chinese [68.58732970171747]
N-LTP is an open-source neural language technology platform supporting six fundamental Chinese NLP tasks. N-LTP adopts the multi-task framework by using a shared pre-trained model, which has the advantage of capturing the shared knowledge across relevant Chinese tasks.
arXiv Detail & Related papers (2020-09-24T11:45:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.