Related papers: Online Digital Investigative Journalism using SociaLens

URL: http://arxiv.org/abs/2410.11890v1
Date: Sun, 13 Oct 2024 07:20:47 GMT
Title: Online Digital Investigative Journalism using SociaLens
Authors: Hasan M. Jamil, Sajratul Y. Rubaiat
Abstract summary: We introduce a versatile and autonomous investigative journalism tool, called SociaLens, for identifying and extracting query-specific data from online sources. We envision its use in investigative journalism, law enforcement and social policy planning. We illustrate the functionality of SociaLens using a focused case study on rape incidents in a developing country.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Media companies witnessed a significant transformation with the rise of the internet, big data, machine learning (ML) and AI. The recent emergence of large language models (LLMs) has added another aspect to this transformation. Researchers believe that with the help of these technologies, investigative digital journalism will enter a new era. Using a smart set of data gathering and analysis tools, journalists will be able to create data-driven content and insights in unprecedented ways. In this paper, we introduce a versatile and autonomous investigative journalism tool, called SociaLens, for identifying and extracting query-specific data from online sources, responding to probing queries and drawing conclusions entailed by large volumes of data using ML analytics fully autonomously. We envision its use in investigative journalism, law enforcement and social policy planning. The proposed system capitalizes on the integration of ML technology with LLMs and advanced big data search techniques. We illustrate the functionality of SociaLens using a focused case study on rape incidents in a developing country and demonstrate that journalists can gain nuanced insights without requiring coding expertise they might lack. SociaLens is designed as a chatbot that is capable of contextual conversation and can find and collect data relevant to queries, initiate ML tasks to respond to queries, and generate textual and visual reports, all fully autonomously within the chatbot environment.

Related papers

NEWSAGENT: Benchmarking Multimodal Agents as Journalists with Real-World Newswriting Tasks [21.577527868033343]
NEWSAGENT is a benchmark for evaluating how agents can automatically search available raw contents, select desired information, and edit and rephrase to form a news article. NEWSAGENT includes 6k human-verified examples derived from real news, with multimodal contents converted to text for broad model compatibility. We believe NEWSAGENT serves as a realistic testbed for iterating and evaluating agent capabilities in terms of multimodal web data manipulation for real-world productivity.
arXiv Detail & Related papers (2025-08-30T10:31:34Z)
How can AI agents support journalists' work? An experiment with designing an LLM-driven intelligent reporting system [0.0]
The integration of artificial intelligence into journalistic practices represents a transformative shift in how news is gathered, analyzed, and disseminated. Large language models (LLMs), particularly those with agentic capabilities, offer unprecedented opportunities for enhancing journalistic practices. This research explores how agentic LLMs can support journalists' filtering, based on insights from journalist interviews.
arXiv Detail & Related papers (2025-08-25T14:56:59Z)
Web-Browsing LLMs Can Access Social Media Profiles and Infer User Demographics [7.849709311008473]
Large language models (LLMs) have traditionally relied on static training data, limiting their knowledge to fixed snapshots. Recent advancements have equipped LLMs with web-browsing capabilities, enabling real-time information retrieval and multi-step reasoning over live web content. Here, we evaluate whether web-browsing LLMs can infer demographic attributes of social media users given only their usernames. We show that these models can access social media content and predict user demographics with reasonable accuracy.
arXiv Detail & Related papers (2025-07-16T16:21:01Z)
A Python Tool for Reconstructing Full News Text from GDELT [0.0]
This paper presents a novel approach to obtaining full-text newspaper articles at near-zero cost. We focus on the GDELT Web News NGrams 3.0 dataset, which provides high-frequency updates of n-grams extracted from global online news sources. We provide Python code to reconstruct full-text articles from these n-grams by identifying overlapping textual fragments and intelligently merging them.
arXiv Detail & Related papers (2025-04-22T17:40:42Z)
A Complete Survey on LLM-based AI Chatbots [46.18523139094807]
The past few decades have witnessed an upsurge in data, forming the foundation for data-hungry, learning-based AI technology. Conversational agents, often referred to as AI chatbots, rely heavily on such data to train large language models (LLMs) and generate new content (knowledge) in response to user prompts. This paper presents a complete survey of the evolution and deployment of LLM-based chatbots in various sectors.
arXiv Detail & Related papers (2024-06-17T09:39:34Z)
Cross-Data Knowledge Graph Construction for LLM-enabled Educational Question-Answering System: A Case Study at HCMUT [2.8000537365271367]
Large language models (LLMs) have emerged as a vibrant research topic. LLMs face challenges in remembering events, incorporating new information, and addressing domain-specific issues or hallucinations. This article proposes a method for automatically constructing a Knowledge Graph from multiple data sources.
arXiv Detail & Related papers (2024-04-14T16:34:31Z)
The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements. LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information. Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
AutoConv: Automatically Generating Information-seeking Conversations with Large Language Models [74.10293412011455]
We propose AutoConv for synthetic conversation generation. Specifically, we formulate the conversation generation problem as a language modeling task. We finetune an LLM with a few human conversations to capture the characteristics of the information-seeking process.
arXiv Detail & Related papers (2023-08-12T08:52:40Z)
Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases. Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding. This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information. To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles. Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
ChatGPT as your Personal Data Scientist [0.9689893038619583]
This paper introduces a ChatGPT-based conversational data-science framework to act as a "personal data scientist". Our model pivots around four dialogue states: Data Visualization, Task Formulation, Prediction Engineering, and Result Summary and Recommendation. In summary, we developed an end-to-end system that not only proves the viability of the novel concept of conversational data science but also underscores the potency of LLMs in solving complex tasks.
arXiv Detail & Related papers (2023-05-23T04:00:16Z)
A Vision for Semantically Enriched Data Science [19.604667287258724]
Key areas such as utilizing domain knowledge and data semantics have seen little automation. We envision how leveraging "semantic" understanding and reasoning on data in combination with novel tools for data science automation can help with consistent and explainable data augmentation and transformation.
arXiv Detail & Related papers (2023-03-02T16:03:12Z)
A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems. ML models often "remember" the old data. Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z)
Text Mining for Processing Interview Data in Computational Social Science [0.6820436130599382]
We use commercially available text analysis technology to process interview text data from a computational social science study. We find that topical clustering and terminological enrichment provide for convenient exploration and quantification of the responses. We encourage studies in social science to use text analysis, especially for exploratory open-ended studies.
arXiv Detail & Related papers (2020-11-28T00:44:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.