Real-Time Summarization of Twitter
- URL: http://arxiv.org/abs/2407.08125v2
- Date: Wed, 23 Oct 2024 22:01:13 GMT
- Title: Real-Time Summarization of Twitter
- Authors: Yixin Jin, Meiqi Wang, Meng Li, Wenjing Zhou, Yi Shen, Hao Liu,
- Abstract summary: We focus on the real-time push notification scenario, which requires a system to monitor the stream of sampled tweets and return the tweets relevant to given interest profiles.
We employ the Dirichlet score, with smoothing and with very little smoothing (the baseline), to classify whether a tweet is relevant to a given interest profile.
It is also desirable to remove redundant tweets from the push queue.
- Score: 9.034423337410274
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we describe our approaches to TREC Real-Time Summarization of Twitter. We focus on the real-time push notification scenario, which requires a system to monitor the stream of sampled tweets and return the tweets that are relevant and novel with respect to given interest profiles. The Dirichlet score, with smoothing and with very little smoothing (the baseline), is employed to classify whether a tweet is relevant to a given interest profile. Measured by Mean Average Precision (MAP), cumulative gain (CG), and discounted cumulative gain (DCG), the experiment indicates that our approach performs well. It is also desirable to remove redundant tweets from the push queue. Due to the precision limit, we only describe the algorithm in this paper.
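The abstract names a Dirichlet score as the relevance classifier but gives no formula. The sketch below is one common reading, not the authors' implementation: the log query likelihood of the profile terms under a Dirichlet-smoothed tweet language model. The function names, the toy collection statistics, the half-count floor for unseen terms, the `mu` values, and the decision threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def dirichlet_score(profile_terms, tweet_tokens, collection_tf, collection_len, mu):
    """Log-likelihood of the profile terms under a Dirichlet-smoothed tweet model.

    p(term | tweet) = (tf(term, tweet) + mu * p(term | collection)) / (|tweet| + mu)
    A large mu means heavy smoothing; a mu near zero means very little smoothing.
    """
    tf = Counter(tweet_tokens)
    tweet_len = len(tweet_tokens)
    score = 0.0
    for term in profile_terms:
        # Assumption: unseen terms get a half-count floor so the log stays finite.
        p_background = max(collection_tf.get(term, 0), 0.5) / collection_len
        p_term = (tf[term] + mu * p_background) / (tweet_len + mu)
        score += math.log(p_term)
    return score


def is_relevant(score, threshold):
    """Hypothetical decision rule: push the tweet when its score clears a threshold."""
    return score >= threshold


# Toy background statistics (made up for this example).
collection_tf = {"earthquake": 5, "japan": 10, "coffee": 50}
collection_len = 1000

profile = ["earthquake", "japan"]
on_topic = ["earthquake", "hits", "japan", "coast"]
off_topic = ["morning", "coffee", "in", "tokyo"]

for mu in (100.0, 0.01):  # smoothed run vs. near-unsmoothed baseline
    s_on = dirichlet_score(profile, on_topic, collection_tf, collection_len, mu)
    s_off = dirichlet_score(profile, off_topic, collection_tf, collection_len, mu)
    print(f"mu={mu}: on-topic {s_on:.3f}, off-topic {s_off:.3f}")
```

In this toy setup the on-topic tweet scores higher than the off-topic one under both settings of `mu`. In practice the threshold passed to `is_relevant` would have to be tuned on held-out judgments.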
Related papers
- Less is More: One-shot Subgraph Reasoning on Large-scale Knowledge Graphs [49.547988001231424]
We propose the one-shot-subgraph link prediction to achieve efficient and adaptive prediction.
The design principle is that, instead of directly acting on the whole KG, the prediction procedure is decoupled into two steps.
We achieve improved efficiency and leading performance on five large-scale benchmarks.
arXiv Detail & Related papers (2024-03-15T12:00:12Z)
- Context-Based Tweet Engagement Prediction [0.0]
This thesis investigates how well context alone may be used to predict tweet engagement likelihood.
We employed the Spark engine on TU Wien's Little Big Data Cluster to create scalable data preprocessing, feature engineering, feature selection, and machine learning pipelines.
We also found that factors such as the prediction algorithm, training dataset size, training dataset sampling method, and feature selection significantly affect the results.
arXiv Detail & Related papers (2023-09-28T08:36:57Z)
- Towards Detecting Inauthentic Coordination in Twitter Likes Data [0.0]
Users customarily take engagement metrics such as likes as a neutral proxy for quality and authority.
This incentivizes like manipulation to influence public opinion through *coordinated inauthentic behavior* (CIB).
CIB targeted at likes is largely unstudied as collecting suitable data about users' liking behavior is non-trivial.
This paper contributes a scripted algorithm to collect suitable liking data from Twitter and a 30-day dataset of liking data collected from the Danish political Twittersphere #dkpol.
arXiv Detail & Related papers (2023-05-12T11:24:26Z)
- LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval [55.097573036580066]
Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models.
Compared to re-ranking, our lexicon-enhanced approach can be run in milliseconds (22.5x faster) while achieving superior performance.
arXiv Detail & Related papers (2022-03-11T18:53:12Z)
- Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
- Smart Crawling: A New Approach toward Focus Crawling from Twitter [0.10312968200748115]
Twitter data can be accessed using a REST API.
"SmartTwitter Crawling" (STiC) retrieves a set of tweets related to a target topic.
arXiv Detail & Related papers (2021-10-08T11:04:49Z)
- A Case Study to Reveal if an Area of Interest has a Trend in Ongoing Tweets Using Word and Sentence Embeddings [0.0]
We have proposed an easily applicable automated methodology in which the Daily Mean Similarity Scores show the similarity between the daily tweet corpus and the target words.
The Daily Mean Similarity Scores are mainly based on cosine similarity and word/sentence embeddings.
We have also compared the effectiveness of using word versus sentence embeddings while applying our methodology and realized that both give almost the same results.
arXiv Detail & Related papers (2021-10-02T18:44:55Z)
- How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies [3.5263924621989196]
We propose a new task: predicting the predominant sentiment among (first-order) replies to a given tweet.
We create RETWEET, a large dataset of tweets and replies manually annotated with sentiment labels.
We use the automatically labeled data for supervised training of a neural network to predict reply sentiment from the original tweets.
arXiv Detail & Related papers (2021-04-21T13:08:45Z)
- A Closer Look at Temporal Sentence Grounding in Videos: Datasets and Metrics [70.45937234489044]
We re-organize two widely-used TSGV datasets (Charades-STA and ActivityNet Captions) to make it different from the training split.
We introduce a new evaluation metric "dR@$n$,IoU@$m$" to calibrate the basic IoU scores.
All the results demonstrate that the re-organized datasets and new metric can better monitor the progress in TSGV.
arXiv Detail & Related papers (2021-01-22T09:59:30Z)
- Road Network Metric Learning for Estimated Time of Arrival [93.0759529610483]
In this paper, we propose the Road Network Metric Learning framework for Estimated Time of Arrival (ETA).
It consists of two components: (1) a main regression task to predict the travel time, and (2) an auxiliary metric learning task to improve the quality of link embedding vectors.
We show that our method outperforms the state-of-the-art model and that the improvement is concentrated on cold links with little data.
arXiv Detail & Related papers (2020-06-24T04:45:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.