Predicting the Popularity of Reddit Posts with AI
- URL: http://arxiv.org/abs/2106.07380v2
- Date: Thu, 17 Jun 2021 02:29:06 GMT
- Title: Predicting the Popularity of Reddit Posts with AI
- Authors: Juno Kim
- Abstract summary: This study aims to develop a machine learning model capable of accurately predicting the popularity of a Reddit post.
Specifically, the model predicts the number of upvotes a post will receive based on its textual content.
I collected Reddit post data from an online data set and analyzed the model's performance when trained on a single subreddit and a collection of subreddits.
- Score: 0.30458514384586405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social media drives crucial changes in mass behavior, as popular posts and
opinions exert a significant influence on users' decisions and thought
processes. For example, the recent Reddit uprising inspired by
r/wallstreetbets, which had a remarkable economic impact, started with a series
of posts on that thread. Predicting which posts may have a notable impact makes
it possible to prepare for the trends that follow. This study aims to develop a
machine learning model capable of accurately predicting the popularity of a
Reddit post. Specifically, the model predicts the number of upvotes a post will
receive based on its textual content. I experimented with three different
models: a baseline linear regression model, a random forest regression model,
and a neural network. I collected Reddit post data from an online data set and
analyzed the model's performance when trained on a single subreddit and a
collection of subreddits. The results showed that the neural network model
performed best when the models' losses were compared. By using a machine
learning model to predict social trends from the reactions users have to posts,
a clearer picture of the near future can be formed.
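As a rough illustration of the pipeline the abstract describes, the sketch below trains the three models on TF-IDF features of post text and compares a common test loss. The feature representation, the log-transformed upvote target, the hypothetical input file, and all hyperparameters are assumptions; the abstract does not specify them.

```python
# Minimal sketch (not the paper's exact setup): predict Reddit upvotes from post text
# with a linear regression baseline, a random forest, and a small neural network.
# Feature choice (TF-IDF), the log1p target transform, and hyperparameters are assumptions.
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical CSV with one row per post: columns "text" (title/body) and "ups" (upvotes).
posts = pd.read_csv("reddit_posts.csv")
X_text = posts["text"].fillna("")
y = np.log1p(posts["ups"])  # log-scale the heavy-tailed upvote counts

X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    X_text, y, test_size=0.2, random_state=0
)

vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
X_train = vectorizer.fit_transform(X_train_txt)
X_test = vectorizer.transform(X_test_txt)

models = {
    "linear regression (baseline)": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "neural network": MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=300, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    loss = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: test MSE (log-upvotes) = {loss:.3f}")
```

Mean squared error on log-transformed upvote counts stands in here for the unspecified loss; the log transform is a common choice because upvote counts are heavily skewed.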
Related papers
- Limits to Predicting Online Speech Using Large Language Models [20.215414802169967]
Recent theoretical results suggest that posts from a user's social circle are as predictive of the user's future posts as the user's own past posts.
We define predictability as a measure of the model's uncertainty, i.e., its negative log-likelihood on future tokens given context (a minimal sketch of this measure appears after this list).
Across four large language models ranging in size from 1.5 billion to 70 billion parameters, we find that predicting a user's posts from their peers' posts performs poorly.
arXiv Detail & Related papers (2024-07-08T09:50:49Z) - Humanoid Locomotion as Next Token Prediction [84.21335675130021]
Our model is a causal transformer trained via autoregressive prediction of sensorimotor trajectories.
We show that our model enables a full-sized humanoid to walk in San Francisco zero-shot.
Our model can transfer to the real world even when trained on only 27 hours of walking data, and can generalize to commands not seen during training, such as walking backward.
arXiv Detail & Related papers (2024-02-29T18:57:37Z) - Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z) - Wrapper Boxes: Faithful Attribution of Model Predictions to Training Data [40.7542543934205]
We propose a "wrapper box'' pipeline: training a neural model as usual and then using its learned feature representation in classic, interpretable models to perform prediction.
Across seven language models of varying sizes, we first show that the predictive performance of wrapper classic models is largely comparable to the original neural models.
Our pipeline thus preserves the predictive performance of neural language models while faithfully attributing classic model decisions to training data.
arXiv Detail & Related papers (2023-11-15T01:50:53Z) - Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase.
We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output.
Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
arXiv Detail & Related papers (2023-07-05T15:32:21Z) - Variation of Gender Biases in Visual Recognition Models Before and After
Finetuning [29.55318393877906]
We introduce a framework to measure how biases change before and after fine-tuning a large scale visual recognition model for a downstream task.
We find that supervised models trained on datasets such as ImageNet-21k are more likely to retain their pretraining biases.
We also find that models finetuned on larger scale datasets are more likely to introduce new biased associations.
arXiv Detail & Related papers (2023-03-14T03:42:47Z) - Forecasting COVID-19 spreading trough an ensemble of classical and
machine learning models: Spain's case study [0.0]
We evaluate the applicability of an ensemble of population models and machine learning models to predict the near future evolution of the COVID-19 pandemic.
We rely solely on open and public datasets, fusing incidence, vaccination, human mobility, and weather data.
arXiv Detail & Related papers (2022-07-12T08:16:44Z) - Explain, Edit, and Understand: Rethinking User Study Design for
Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z) - Are socially-aware trajectory prediction models really socially-aware? [75.36961426916639]
We introduce a socially-attended attack to assess the social understanding of prediction models.
An attack is a small yet carefully crafted perturbation designed to make predictors fail.
We show that our attack can be employed to increase the social understanding of state-of-the-art models.
arXiv Detail & Related papers (2021-08-24T17:59:09Z) - FitVid: Overfitting in Pixel-Level Video Prediction [117.59339756506142]
We introduce a new architecture, named FitVid, which is capable of severe overfitting on the common benchmarks.
FitVid outperforms the current state-of-the-art models across four different video prediction benchmarks on four different metrics.
arXiv Detail & Related papers (2021-06-24T17:20:21Z) - What do we expect from Multiple-choice QA Systems? [70.86513724662302]
We consider a top performing model on several Multiple Choice Question Answering (MCQA) datasets.
We evaluate it against a set of expectations one might have from such a model, using a series of zero-information perturbations of the model's inputs.
arXiv Detail & Related papers (2020-11-20T21:27:10Z) - PsychFM: Predicting your next gamble [0.0]
Most human behavior can be modeled as a choice prediction problem.
Since behavior is person-dependent, there is a need to build a model that predicts choices on a per-person basis.
A novel hybrid model, the psychological factorisation machine (PsychFM), is proposed that combines concepts from machine learning with psychological theories.
arXiv Detail & Related papers (2020-07-03T17:41:14Z) - Learning Opinion Dynamics From Social Traces [25.161493874783584]
We propose an inference mechanism for fitting a generative, agent-like model of opinion dynamics to real-world social traces.
We showcase our proposal by translating a classical agent-based model of opinion dynamics into its generative counterpart.
We apply our model to real-world data from Reddit to explore the long-standing question about the impact of the backfire effect.
arXiv Detail & Related papers (2020-06-02T14:48:17Z)