pytopicgram: A library for data extraction and topic modeling from Telegram channels
- URL: http://arxiv.org/abs/2502.04882v1
- Date: Fri, 07 Feb 2025 12:41:47 GMT
- Authors: J. Gómez-Romero, J. Cantón Correa, R. Pérez Mercado, F. Prados Abad, M. Molina-Solana, W. Fajardo
- Abstract summary: pytopicgram is a Python library that helps researchers collect, organize, and analyze these Telegram messages.
pytopicgram allows users to understand how content spreads and how audiences interact on Telegram.
- Abstract: Telegram is a popular platform for public communication, generating large amounts of messages through its channels. pytopicgram is a Python library that helps researchers collect, organize, and analyze these Telegram messages. The library offers key features such as easy message retrieval, detailed channel information, engagement metrics, and topic identification using advanced modeling techniques. By simplifying data extraction and analysis, pytopicgram allows users to understand how content spreads and how audiences interact on Telegram. This paper describes the design, main features, and practical uses of pytopicgram, showcasing its effectiveness for studying public conversations on Telegram.
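The abstract mentions engagement metrics without defining them. As a rough illustration only (this is not pytopicgram's actual API, which the abstract does not show), a simple views-plus-forwards engagement rate over hypothetical message records could look like:

```python
from statistics import mean

def engagement_rate(messages, subscribers):
    """Average (views + forwards) per message, normalized by subscriber
    count. An illustrative metric only; pytopicgram's own definitions
    may differ."""
    if not messages or subscribers <= 0:
        return 0.0
    per_message = [m.get("views", 0) + m.get("forwards", 0) for m in messages]
    return mean(per_message) / subscribers

# Hypothetical channel data: two messages, 5,000 subscribers
msgs = [{"views": 1200, "forwards": 30}, {"views": 800, "forwards": 10}]
print(engagement_rate(msgs, subscribers=5000))  # 0.204
```

Normalizing by subscriber count makes the metric comparable across channels of very different sizes, which matters when studying how content spreads.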
Related papers
- TelegramScrap: A comprehensive tool for scraping Telegram data [0.0]
TelegramScrap is a tool for extracting and analyzing data from Telegram channels and groups.
This white paper outlines the tool's development, capabilities, and applications in academic and scientific research.
arXiv Detail & Related papers (2024-12-21T21:46:56Z) - Bridging Nodes and Narrative Flows: Identifying Intervention Targets for Disinformation on Telegram [1.124958340749622]
We examine the structural mechanisms that facilitate the propagation of debunked misinformation on Telegram.
We introduce a multi-dimensional 'bridging' metric to quantify the influence of nodal Telegram channels.
We uncover a small subset of influential nodes and identify patterns emblematic of information 'flows' on this platform.
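The paper's multi-dimensional bridging metric is not specified in this summary. One simple structural proxy for "bridging" is the fraction of a channel's neighbor pairs that have no direct edge between them; the graph and names below are purely illustrative:

```python
from itertools import combinations

def local_bridging(adj, node):
    """Fraction of the node's neighbor pairs that are not directly linked.
    High values suggest the channel bridges otherwise-separate communities.
    A structural proxy only, not the paper's multi-dimensional metric."""
    nbrs = adj.get(node, set())
    if len(nbrs) < 2:
        return 0.0
    pairs = list(combinations(nbrs, 2))
    unlinked = sum(1 for a, b in pairs if b not in adj.get(a, set()))
    return unlinked / len(pairs)

# Toy undirected forwarding graph between channels
adj = {
    "hub": {"a", "b", "c"},
    "a": {"hub", "c"},
    "b": {"hub"},
    "c": {"hub", "a"},
}
print(round(local_bridging(adj, "hub"), 3))  # 0.667: only the a-c pair is linked
```

A channel scoring high on such a measure is a natural intervention target, since removing it disconnects neighbors that have no other direct link.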
arXiv Detail & Related papers (2024-11-08T19:10:42Z) - Integrating Large Language Models with Graph-based Reasoning for Conversational Question Answering [58.17090503446995]
We focus on a conversational question answering task which combines the challenges of understanding questions in context and reasoning over evidence gathered from heterogeneous sources like text, knowledge graphs, tables, and infoboxes.
Our method utilizes a graph structured representation to aggregate information about a question and its context.
arXiv Detail & Related papers (2024-06-14T13:28:03Z) - Enhancing Chat Language Models by Scaling High-quality Instructional Conversations [91.98516412612739]
We first provide a systematically designed, diverse, informative, large-scale dataset of instructional conversations, UltraChat.
Our objective is to capture the breadth of interactions that a human might have with an AI assistant.
We fine-tune a LLaMA model to create a powerful conversational model, UltraLLaMA.
arXiv Detail & Related papers (2023-05-23T16:49:14Z) - TGDataset: Collecting and Exploring the Largest Telegram Channels Dataset [57.2282378772772]
This paper presents the TGDataset, a new dataset that includes 120,979 Telegram channels and over 400 million messages.
We analyze the languages spoken within our dataset and the topics covered by English channels.
In addition to the raw dataset, we released the scripts we used to analyze the dataset and the list of channels belonging to the network of a new conspiracy theory called Sabmyk.
arXiv Detail & Related papers (2023-03-09T15:42:38Z) - Using Large Language Models to Generate Engaging Captions for Data Visualizations [51.98253121636079]
Large language models (LLMs) use sophisticated deep learning technology to produce human-like prose.
A key challenge lies in designing the most effective prompt for the LLM, a task called prompt engineering.
We report on first experiments using the popular LLM GPT-3 and present some promising results.
arXiv Detail & Related papers (2022-12-27T23:56:57Z) - Introducing an Abusive Language Classification Framework for Telegram to Investigate the German Hater Community [0.6459215652021234]
We develop a framework that consists of (i) an abusive language classification model for German Telegram messages and (ii) a classification model for the hatefulness of Telegram channels.
For the channel classification model, we develop a method that combines channel-specific content information derived from a topic model with a social graph to predict the hatefulness of channels.
As an additional output of the study, we release an annotated abusive language dataset containing 1,149 annotated Telegram messages.
arXiv Detail & Related papers (2021-09-15T14:58:46Z) - EmailSum: Abstractive Email Thread Summarization [105.46012304024312]
We develop an abstractive Email Thread Summarization (EmailSum) dataset.
This dataset contains human-annotated short (30 words) and long (100 words) summaries of 2,549 email threads.
Our results reveal the key challenges of current abstractive summarization models in this task.
arXiv Detail & Related papers (2020-12-14T07:31:17Z) - Unsupervised Summarization for Chat Logs with Topic-Oriented Ranking and Context-Aware Auto-Encoders [59.038157066874255]
We propose a novel framework called RankAE to perform chat summarization without employing manually labeled data.
RankAE consists of a topic-oriented ranking strategy that selects topic utterances according to centrality and diversity simultaneously.
A denoising auto-encoder is designed to generate succinct but context-informative summaries based on the selected utterances.
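As a sketch of the centrality-plus-diversity idea described above (using token overlap in place of RankAE's learned representations; all names and data here are illustrative):

```python
def jaccard(a, b):
    """Token-overlap similarity between two utterances."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def select_utterances(utterances, k):
    """Greedily pick utterances that are central (similar to the chat
    overall) yet diverse (dissimilar to those already picked). A toy
    stand-in for RankAE's topic-oriented ranking."""
    centrality = {u: sum(jaccard(u, v) for v in utterances) / len(utterances)
                  for u in utterances}
    selected = []
    while len(selected) < min(k, len(utterances)):
        best = max((u for u in utterances if u not in selected),
                   key=lambda u: centrality[u]
                   - max((jaccard(u, s) for s in selected), default=0.0))
        selected.append(best)
    return selected

chat = ["the match tonight", "the match today", "dinner plans later"]
print(select_utterances(chat, 2))  # ['the match tonight', 'dinner plans later']
```

Note how the near-duplicate "the match today" is skipped: its similarity to the first pick cancels out its centrality, which is exactly the redundancy-avoidance behavior the ranking strategy targets.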
arXiv Detail & Related papers (2020-12-14T07:31:17Z) - The Pushshift Telegram Dataset [1.7109522466982476]
We present a dataset from the mobile messaging platform Telegram.
Our dataset is made up of over 27.8K channels and 317M messages from 2.2M unique users.
arXiv Detail & Related papers (2020-01-23T10:37:33Z)