Investigating User Radicalization: A Novel Dataset for Identifying
Fine-Grained Temporal Shifts in Opinion
- URL: http://arxiv.org/abs/2204.10190v1
- Date: Sat, 16 Apr 2022 09:31:25 GMT
- Title: Investigating User Radicalization: A Novel Dataset for Identifying
Fine-Grained Temporal Shifts in Opinion
- Authors: Flora Sakketou, Allison Lahnala, Liane Vogel, Lucie Flek
- Abstract summary: We introduce an innovative annotated dataset for modeling subtle opinion fluctuations and detecting fine-grained stances.
The dataset includes a sufficient number of stance polarity and intensity labels per user over time and within entire conversational threads.
All posts are annotated by non-experts and a significant portion of the data is also annotated by experts.
- Score: 7.028604573959653
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: There is an increasing need for the ability to model fine-grained opinion
shifts of social media users, as concerns about the potential polarizing social
effects increase. However, the lack of publicly available datasets that are
suitable for the task presents a major challenge. In this paper, we introduce
an innovative annotated dataset for modeling subtle opinion fluctuations and
detecting fine-grained stances. The dataset includes a sufficient number of
stance polarity and intensity labels per user over time and within entire
conversational threads, making subtle opinion fluctuations detectable in both
the long term and the short term. All posts are annotated by non-experts and a
significant portion of the data is also annotated by experts. We provide a
strategy for recruiting suitable non-experts. Our analysis of the
inter-annotator agreements shows that the resulting annotations obtained from
the majority vote of the non-experts are of comparable quality to the
annotations of the experts. We provide analyses of stance evolution at the
short-term and long-term levels, a comparison of language usage between users
with vacillating and resolute attitudes, and fine-grained stance detection
baselines.
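The abstract's aggregation of non-expert labels can be sketched as a simple majority vote (a minimal illustration only; the stance label set, the tie handling, and all names here are assumptions, not details taken from the paper):

```python
from collections import Counter

def majority_vote(annotations):
    """Aggregate one post's stance labels by majority vote."""
    counts = Counter(annotations)
    label, top = counts.most_common(1)[0]
    # A tie means no clear majority; such posts could be routed to experts.
    if sum(1 for c in counts.values() if c == top) > 1:
        return None
    return label

# Hypothetical stance annotations from three non-experts per post.
posts = {
    "p1": ["support", "support", "oppose"],
    "p2": ["support", "neutral", "oppose"],
}
gold = {pid: majority_vote(labels) for pid, labels in posts.items()}
# gold["p1"] == "support"; "p2" has no majority, so it maps to None.
```

In practice the paper compares these majority-vote labels against expert annotations via inter-annotator agreement; the sketch above only shows the aggregation step itself.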
Related papers
- Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations [63.52709761339949]
We first contribute a dedicated dataset called the Fair Forgery Detection (FairFD) dataset, where we prove the racial bias of public state-of-the-art (SOTA) methods.
We design novel metrics including Approach Averaged Metric and Utility Regularized Metric, which can avoid deceptive results.
We also present an effective and robust post-processing technique, Bias Pruning with Fair Activations (BPFA), which improves fairness without requiring retraining or weight updates.
arXiv Detail & Related papers (2024-07-19T14:53:18Z)
- Geospatial Disparities: A Case Study on Real Estate Prices in Paris [0.3495246564946556]
We propose a toolkit for identifying and mitigating biases arising from geospatial data.
We incorporate an ordinal regression case with spatial attributes, deviating from the binary classification focus.
Illustrating our methodology, we showcase practical applications and scrutinize the implications of choosing geographical aggregation levels for fairness and calibration measures.
arXiv Detail & Related papers (2024-01-29T14:53:14Z)
- When a Language Question Is at Stake. A Revisited Approach to Label Sensitive Content [0.0]
The article revisits an approach to pseudo-labeling sensitive data, using the example of Ukrainian tweets covering the Russian-Ukrainian war.
We provide a fundamental statistical analysis of the obtained data, an evaluation of the models used for pseudo-labeling, and further guidelines on how researchers can leverage the corpus.
arXiv Detail & Related papers (2023-11-17T13:35:10Z)
- Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets.
We define a uniform evaluation setup including a new formalization of the annotation error detection task.
We release our datasets and implementations in an easy-to-use and open source software package.
arXiv Detail & Related papers (2022-06-05T22:31:45Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Are You Smarter Than a Random Expert? The Robust Aggregation of Substitutable Signals [14.03122229316614]
This paper initiates the study of forecast aggregation in a context where experts' knowledge is chosen adversarially from a broad class of information structures.
Under the projective substitutes condition, taking the average of the experts' forecasts improves substantially upon the strategy of trusting a random expert.
We show that by averaging the experts' forecasts and then extremizing the average by moving it away from the prior by a constant factor, the aggregator's performance guarantee is substantially better than is possible without knowledge of the prior.
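The extremizing rule described above can be written as a one-line update (a hedged sketch: the factor `c = 2.0` and the clipping to [0, 1] are illustrative choices, not the constant derived in the paper):

```python
def extremized_average(forecasts, prior, c=2.0):
    # Step 1: average the experts' probability forecasts.
    avg = sum(forecasts) / len(forecasts)
    # Step 2: extremize by moving the average away from the prior
    # by a constant factor c, then clip to a valid probability.
    return min(1.0, max(0.0, prior + c * (avg - prior)))

# Three experts lean toward the event; the prior probability is 0.5.
agg = extremized_average([0.7, 0.8, 0.6], prior=0.5)
# avg = 0.7, so the aggregate is pushed out to 0.5 + 2 * 0.2 = 0.9
```

The intuition is that each expert's forecast is only partially informed, so when their independent signals all point the same way, the honest aggregate should sit further from the prior than their simple average does.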
arXiv Detail & Related papers (2021-11-04T20:50:30Z)
- Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features.
We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors.
Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
arXiv Detail & Related papers (2021-10-27T12:19:56Z)
- On Releasing Annotator-Level Labels and Information in Datasets [6.546195629698355]
We show that label aggregation may introduce representational biases of individual and group perspectives.
We propose recommendations for increased utility and transparency of datasets for downstream use cases.
arXiv Detail & Related papers (2021-10-12T02:35:45Z)
- Agreeing to Disagree: Annotating Offensive Language Datasets with Annotators' Disagreement [7.288480094345606]
We focus on the level of agreement among annotators while selecting data to create offensive language datasets.
Our study comprises the creation of three novel datasets of English tweets covering different topics.
We show that such hard cases, where low agreement is present, are not necessarily due to poor-quality annotation.
arXiv Detail & Related papers (2021-09-28T08:55:04Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z)