SocialQuotes: Learning Contextual Roles of Social Media Quotes on the Web
- URL: http://arxiv.org/abs/2407.16007v1
- Date: Mon, 22 Jul 2024 19:21:01 GMT
- Title: SocialQuotes: Learning Contextual Roles of Social Media Quotes on the Web
- Authors: John Palowitch, Hamidreza Alvari, Mehran Kazemi, Tanvir Amin, Filip Radlinski,
- Abstract summary: We liken social media embeddings to quotes, formalize the page context as structured natural language signals, and identify a taxonomy of roles for quotes within the page context.
We release SocialQuotes, a new data set built from the Common Crawl of over 32 million social quotes, 8.3k of them with crowdsourced quote annotations.
- Score: 9.130915550141337
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Web authors frequently embed social media to support and enrich their content, creating the potential to derive web-based, cross-platform social media representations that can enable more effective social media retrieval systems and richer scientific analyses. As a step toward such capabilities, we introduce a novel language modeling framework that enables automatic annotation of roles that social media entities play in their embedded web context. Using related communication theory, we liken social media embeddings to quotes, formalize the page context as structured natural language signals, and identify a taxonomy of roles for quotes within the page context. We release SocialQuotes, a new data set built from the Common Crawl of over 32 million social quotes, 8.3k of them with crowdsourced quote annotations. Using SocialQuotes and the accompanying annotations, we provide a role classification case study, showing reasonable performance with modern-day LLMs, and exposing explainable aspects of our framework via page content ablations. We also classify a large batch of un-annotated quotes, revealing interesting cross-domain, cross-platform role distributions on the web.
Related papers
- FLASH: Federated Learning-Based LLMs for Advanced Query Processing in Social Networks through RAG [5.5997926295092295]
The system is designed to seamlessly aggregate and curate diverse social media data sources.
The GPT model is trained on decentralized data sources to ensure privacy and security.
arXiv Detail & Related papers (2024-08-06T22:28:13Z) - From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition [59.57095498284501]
We propose a novel approach that recognizes Contextual Social Relationships (ConSoR) from a social cognitive perspective.
We construct social-aware descriptive language prompts with social relationships for each image.
Impressively, ConSoR outperforms previous methods with a 12.2% gain on the People-in-Social-Context (PISC) dataset and a 9.8% increase on the People-in-Photo-Album (PIPA) benchmark.
arXiv Detail & Related papers (2024-06-12T16:02:28Z) - SoMeLVLM: A Large Vision Language Model for Social Media Processing [78.47310657638567]
We introduce a Large Vision Language Model for Social Media Processing (SoMeLVLM).
SoMeLVLM is a cognitive framework equipped with five key capabilities including knowledge & comprehension, application, analysis, evaluation, and creation.
Our experiments demonstrate that SoMeLVLM achieves state-of-the-art performance in multiple social media tasks.
arXiv Detail & Related papers (2024-02-20T14:02:45Z) - URL-BERT: Training Webpage Representations via Social Media Engagements [31.6455614291821]
We introduce a new pre-training objective that can be used to adapt LMs to understand URLs and webpages.
Our proposed framework consists of two steps: (1) scalable graph embeddings to learn shallow representations of URLs based on user engagement on social media.
We experimentally demonstrate that our continued pre-training approach improves webpage understanding on a variety of tasks and Twitter internal and external benchmarks.
arXiv Detail & Related papers (2023-10-25T02:22:50Z) - Exploring Embeddings for Measuring Text Relatedness: Unveiling
Sentiments and Relationships in Online Comments [1.7230140898679147]
This paper investigates sentiment and semantic relationships among comments across various social media platforms.
It uses word embeddings to analyze components in sentences and documents.
Our analysis will enable a deeper understanding of the interconnectedness of online comments and will investigate the notion of the internet functioning as a large interconnected brain.
arXiv Detail & Related papers (2023-09-15T04:57:23Z) - TweetNLP: Cutting-Edge Natural Language Processing for Social Media [22.6980150693332]
TweetNLP is an integrated platform for Natural Language Processing (NLP) in social media.
It supports a diverse set of NLP tasks, including generic focus areas such as sentiment analysis and named entity recognition.
The system is powered by reasonably-sized Transformer-based language models specialized on social media text.
arXiv Detail & Related papers (2022-06-29T17:16:58Z) - Can You be More Social? Injecting Politeness and Positivity into Task-Oriented Conversational Agents [60.27066549589362]
Social language used by human agents is associated with greater users' responsiveness and task completion.
The model uses a sequence-to-sequence deep learning architecture, extended with a social language understanding element.
Evaluation in terms of content preservation and social language level using both human judgment and automatic linguistic measures shows that the model can generate responses that enable agents to address users' issues in a more socially appropriate way.
arXiv Detail & Related papers (2020-12-29T08:22:48Z) - SoMin.ai: Personality-Driven Content Generation Platform [60.49416044866648]
We showcase the world's first personality-driven marketing content generation platform, called SoMin.ai.
The platform combines a deep multi-view personality profiling framework with style generative adversarial networks.
It can be used for the enhancement of the social networking user experience as well as for content marketing routines.
arXiv Detail & Related papers (2020-11-30T08:33:39Z) - Named Entity Recognition for Social Media Texts with Semantic Augmentation [70.44281443975554]
Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts.
We propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account.
arXiv Detail & Related papers (2020-10-29T10:06:46Z) - AMUSED: An Annotation Framework of Multi-modal Social Media Data [0.0]
The framework is designed to mitigate the issues of collecting and annotating social media data.
AMUSED can be applied in multiple application domains; as a use case, we have implemented the framework for collecting COVID-19 misinformation data.
arXiv Detail & Related papers (2020-10-01T15:50:41Z) - I Know Where You Are Coming From: On the Impact of Social Media Sources on AI Model Performance [79.05613148641018]
We study the performance of different machine learning models when trained on multi-modal data from different social networks.
Our initial experimental results reveal that social network choice impacts the performance.
arXiv Detail & Related papers (2020-02-05T11:10:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.