Synerise at RecSys 2021: Twitter user engagement prediction with a fast
neural model
- URL: http://arxiv.org/abs/2109.12985v1
- Date: Thu, 23 Sep 2021 13:51:09 GMT
- Authors: Michał Daniluk, Jacek Dąbrowski, Barbara Rychalska, Konrad
Gołuchowski
- Abstract summary: We present our 2nd place solution to ACM RecSys 2021 Challenge organized by Twitter.
The challenge aims to predict user engagement for a set of tweets, offering an exceptionally large data set of 1 billion data points.
Average inference time for single tweet engagement prediction is limited to 6ms on a single CPU core with 64GB memory.
- Score: 0.745554610293091
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we present our 2nd place solution to ACM RecSys 2021 Challenge
organized by Twitter. The challenge aims to predict user engagement for a set
of tweets, offering an exceptionally large data set of 1 billion data points
sampled from over four weeks of real Twitter interactions. Each data point
contains multiple sources of information, such as tweet text along with
engagement features, user features, and tweet features. The challenge brings
the problem close to a real production environment by introducing strict
latency constraints in the model evaluation phase: the average inference time
for single tweet engagement prediction is limited to 6ms on a single CPU core
with 64GB memory. Our proposed model relies on extensive feature engineering
performed with methods such as the Efficient Manifold Density Estimator (EMDE),
our previously introduced algorithm based on the Locality Sensitive Hashing
method, and a novel Fourier Feature Encoding, among others. In total, we create
numerous features describing a user's Twitter account status and the content of
a tweet. In order to adhere to the strict latency constraints, the underlying
model is a simple residual feed-forward neural network. The system is a
variation of our previous methods which proved successful in KDD Cup 2021, WSDM
Challenge 2021, and SIGIR eCom Challenge 2020. We release the source code at:
https://github.com/Synerise/recsys-challenge-2021
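
The abstract names a Fourier Feature Encoding and a simple residual feed-forward network but does not define either. The sketch below is only an illustration under stated assumptions: it uses a common sin/cos form of Fourier features with a hypothetical power-of-two frequency schedule, and a single hypothetical residual layer of the form h + relu(W h + b). The function names, the frequency schedule, and the layer shape are all assumptions, not the authors' implementation, which is in the linked repository.

```python
import math


def fourier_encode(x, n_freqs=4):
    """Map a scalar to sin/cos pairs at several frequencies.

    Assumed form: frequencies are 2**k * pi for k = 0..n_freqs-1.
    This schedule is hypothetical and not taken from the paper.
    """
    feats = []
    for k in range(n_freqs):
        angle = (2 ** k) * math.pi * x
        feats.append(math.sin(angle))
        feats.append(math.cos(angle))
    return feats


def residual_layer(h, weights, bias):
    """One residual feed-forward layer: h + relu(W h + b).

    `weights` is a square list-of-rows matrix so the output keeps the
    dimensionality of `h`, which the skip connection requires.
    """
    out = []
    for i, h_i in enumerate(h):
        pre = sum(w * h_j for w, h_j in zip(weights[i], h)) + bias[i]
        out.append(h_i + max(pre, 0.0))  # skip connection plus ReLU branch
    return out


if __name__ == "__main__":
    encoded = fourier_encode(0.25, n_freqs=2)  # 4 features for one scalar
    identity_like = residual_layer(encoded, [[0.0] * 4 for _ in range(4)], [0.0] * 4)
    print(len(encoded), identity_like == encoded)
```

With all-zero weights and biases the residual layer returns its input unchanged, which is why such layers can be stacked cheaply; a small per-layer cost matters under a latency budget like the 6ms limit described above.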
Related papers
- Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints.
We empirically find that this training paradigm limits the one-step generation performance of consistency models.
We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z)
- Lightweight Boosting Models for User Response Prediction Using Adversarial Validation [2.4040470282119983]
The ACM RecSys Challenge 2023, organized by ShareChat, aims to predict the probability of the app being installed.
This paper describes the lightweight solution to this challenge.
arXiv Detail & Related papers (2023-10-05T13:57:05Z)
- Context-Based Tweet Engagement Prediction [0.0]
This thesis investigates how well context alone may be used to predict tweet engagement likelihood.
We employed the Spark engine on TU Wien's Little Big Data Cluster to create scalable data preprocessing, feature engineering, feature selection, and machine learning pipelines.
We also found that factors such as the prediction algorithm, training dataset size, training dataset sampling method, and feature selection significantly affect the results.
arXiv Detail & Related papers (2023-09-28T08:36:57Z)
- BotArtist: Generic approach for bot detection in Twitter via semi-automatic machine learning pipeline [47.61306219245444]
Twitter has become a target for bots and fake accounts, resulting in the spread of false information and manipulation.
This paper introduces a semi-automatic machine learning pipeline (SAMLP) designed to address the challenges correlated with machine learning model development.
We develop a comprehensive bot detection model named BotArtist, based on user profile features.
arXiv Detail & Related papers (2023-05-31T09:12:35Z)
- Predicting the Geolocation of Tweets Using transformer models on Customized Data [17.55660062746406]
This research aims to solve the tweet/user geolocation prediction task.
The suggested approach implements neural networks for natural language processing to estimate the location.
The scope of proposed models has been finetuned on a Twitter dataset.
arXiv Detail & Related papers (2023-03-14T12:56:47Z)
- Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which in contrast optimizes task-specific decoder networks on the output side.
By gradient-based optimization, DecT can be trained within several seconds and requires only one P query per sample.
We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200\times$ speed-up.
arXiv Detail & Related papers (2022-12-16T11:15:39Z)
- Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z)
- Covid-Transformer: Detecting COVID-19 Trending Topics on Twitter Using Universal Sentence Encoder [7.305019142196582]
The coronavirus disease (also known as COVID-19) has led to a pandemic, impacting more than 200 countries across the globe.
With its global impact, COVID-19 has become a major concern of people almost everywhere.
We try to analyze the tweets and detect the trending topics and major concerns of people on Twitter.
arXiv Detail & Related papers (2020-09-08T19:00:38Z)
- 2nd Place Scheme on Action Recognition Track of ECCV 2020 VIPriors Challenges: An Efficient Optical Flow Stream Guided Framework [57.847010327319964]
We propose a data-efficient framework that can train the model from scratch on small datasets.
Specifically, by introducing a 3D central difference convolution operation, we proposed a novel C3D neural network-based two-stream framework.
It is proved that our method can achieve a promising result even without a pre-trained model on large scale datasets.
arXiv Detail & Related papers (2020-08-10T09:50:28Z)
- Superiority of Simplicity: A Lightweight Model for Network Device Workload Prediction [58.98112070128482]
We propose a lightweight solution for series prediction based on historic observations.
It consists of a heterogeneous ensemble method composed of two models - a neural network and a mean predictor.
It achieves an overall $R^2$ score of 0.10 on the available FedCSIS 2020 challenge dataset.
arXiv Detail & Related papers (2020-07-07T15:44:16Z)