FinnSentiment -- A Finnish Social Media Corpus for Sentiment Polarity Annotation
- URL: http://arxiv.org/abs/2012.02613v1
- Date: Fri, 4 Dec 2020 14:17:46 GMT
- Title: FinnSentiment -- A Finnish Social Media Corpus for Sentiment Polarity Annotation
- Authors: Krister Lindén and Tommi Jauhiainen and Sam Hardwick
- Abstract summary: There is no large-scale social media data set with sentiment polarity annotations for Finnish.
We introduce a 27,000-sentence data set annotated independently with sentiment polarity by three native annotators.
We analyse their inter-annotator agreement and provide two baselines to validate the usefulness of the data set.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sentiment analysis and opinion mining are important tasks with obvious
application areas in social media, e.g. in detecting hate speech and fake
news. In our survey of previous work, we note that there is no large-scale
social media data set with sentiment polarity annotations for Finnish. This
publication aims to remedy this shortcoming by introducing a 27,000-sentence
data set annotated independently with sentiment polarity by three native
annotators. The same three annotators labelled the whole data set, which
provides a unique opportunity for further studies of annotator behaviour over
time. We analyse their inter-annotator agreement and provide two baselines to
validate the usefulness of the data set.
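Because the same three annotators labelled every sentence, agreement can be summarised with a single multi-rater statistic. The sketch below is illustrative only: the abstract does not say which agreement measure or label encoding the authors use, so it assumes Fleiss' kappa and a hypothetical -1/0/+1 (negative/neutral/positive) encoding. It also shows strict majority voting, one common way (not necessarily the authors') to collapse three independent annotations into one label.

```python
import math
from collections import Counter
from typing import Optional, Sequence

# Hypothetical label encoding (an assumption, not the corpus's documented format):
# -1 = negative, 0 = neutral, 1 = positive.
CATEGORIES = (-1, 0, 1)


def fleiss_kappa(ratings: Sequence[Sequence[int]],
                 categories: Sequence[int] = CATEGORIES) -> float:
    """Fleiss' kappa for items that all received the same number of ratings."""
    if not ratings:
        raise ValueError("no items to score")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("need at least two ratings per item, equal for all items")
    if any(label not in categories for item in ratings for label in item):
        raise ValueError("found a label outside the category set")

    # counts[i][j]: number of annotators who put item i in category j
    counts = [[tally[c] for c in categories] for tally in map(Counter, ratings)]

    # Observed agreement: per item, the share of annotator pairs that agree
    p_items = [(sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
               for row in counts]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall proportion of each category
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_e = sum(p * p for p in p_cat)

    if math.isclose(p_e, 1.0):
        return math.nan  # every rating in one category: kappa is undefined
    return (p_bar - p_e) / (1 - p_e)


def majority_label(item: Sequence[int]) -> Optional[int]:
    """Return the label chosen by a strict majority of annotators, else None."""
    label, freq = Counter(item).most_common(1)[0]
    return label if freq > len(item) / 2 else None


if __name__ == "__main__":
    # Toy ratings from three annotators for four sentences (invented for illustration)
    ratings = [(1, 1, 1), (0, 0, 1), (-1, -1, -1), (0, 1, -1)]
    print(round(fleiss_kappa(ratings), 3))        # 0.362
    print([majority_label(r) for r in ratings])  # [1, 0, -1, None]
```

With three annotators and three classes, majority voting leaves sentences on which all three disagree without a label (None); whether such items are dropped, kept as a separate class, or resolved otherwise is a design choice this sketch does not settle.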
Related papers
- Sampled Datasets Risk Substantial Bias in the Identification of Political Polarization on Social Media
We study the structural polarization of the Polish political debate on Twitter over a 24-hour period.
Large samples can be representative of the whole political discussion on a platform, but small samples consistently fail to accurately reflect the true structure of polarization online.
Date: 2024-06-28
- When a Language Question Is at Stake. A Revisited Approach to Label Sensitive Content
This article revisits an approach to pseudo-labelling sensitive data, using Ukrainian tweets about the Russian-Ukrainian war as an example.
We provide a fundamental statistical analysis of the obtained data and an evaluation of the models used for pseudo-labelling, and set out guidelines on how researchers can leverage the corpus.
Date: 2023-11-17
- Measuring the Effect of Influential Messages on Varying Personas
We present a new task, Response Forecasting on Personas for News Media, to estimate the response a persona might have upon seeing a news message.
The proposed task not only introduces personalization in the modeling but also predicts the sentiment polarity and intensity of each response.
This enables more accurate and comprehensive inference on the mental state of the persona.
Date: 2023-05-25
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we propose a data collection schema and curate a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
Date: 2023-05-23
- Depression detection in social media posts using affective and social norm features
We propose a deep architecture for depression detection from social media posts.
We incorporate profanity and morality features of posts and words in our architecture using a late fusion scheme.
The inclusion of the proposed features yields state-of-the-art results in both settings.
Date: 2023-03-24
- Unveiling the Hidden Agenda: Biases in News Reporting and Consumption
We build a six-year dataset on the Italian vaccine debate and adopt a Bayesian latent space model to identify narrative and selection biases.
We found a nonlinear relationship between biases and engagement, with higher engagement for extreme positions.
Analysis of news consumption on Twitter reveals common audiences among news outlets with similar ideological positions.
Date: 2023-01-14
- How Many Tweets Do We Need?: Efficient Mining of Short-Term Polarized Topics on Twitter: A Case Study From Japan
We develop a method to identify polarized topics on Twitter within a short period of 12 hours.
We also develop a prediction method using machine learning techniques to estimate the polarization level using randomly collected tweets.
Our work is the first to predict the polarization level of the topics with low-resource tweets.
Date: 2022-11-29
- Investigating User Radicalization: A Novel Dataset for Identifying Fine-Grained Temporal Shifts in Opinion
We introduce an innovative annotated dataset for modeling subtle opinion fluctuations and detecting fine-grained stances.
The dataset includes a sufficient amount of stance polarity and intensity labels per user over time and within entire conversational threads.
All posts are annotated by non-experts and a significant portion of the data is also annotated by experts.
Date: 2022-04-16
- Exploring Polarization of Users Behavior on Twitter During the 2019 South American Protests
We explore polarization on Twitter in a different context: the protests that paralyzed several South American countries in 2019.
By leveraging users' endorsement of politicians' tweets and hashtag campaigns with defined stances towards the protest (for or against), we construct a weakly labeled stance dataset with millions of users.
We find empirical evidence of the "filter bubble" phenomenon during the event: not only are the user bases homogeneous in terms of stance, but the probability that a user transitions between media from different clusters is also low.
Date: 2021-04-05
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
Date: 2020-10-13
- Can x2vec Save Lives? Integrating Graph and Language Embeddings for Automatic Mental Health Classification
I show how merging graph and language embedding models (metapath2vec and doc2vec) avoids resource limits.
When integrated, the two data sources produce highly accurate predictions (90%, with 10% false positives and 12% false negatives).
These results extend research on the importance of simultaneously analyzing behavior and language in massive networks.
Date: 2020-01-04
This list is automatically generated from the titles and abstracts of the papers on this site.