Explicit and Implicit Data Augmentation for Social Event Detection
- URL: http://arxiv.org/abs/2509.04202v1
- Date: Thu, 04 Sep 2025 13:26:24 GMT
- Title: Explicit and Implicit Data Augmentation for Social Event Detection
- Authors: Congbo Ma, Yuxia Wang, Jia Wu, Jian Yang, Jing Du, Zitai Qiu, Qing Li, Hu Wang, Preslav Nakov
- Abstract summary: Social event detection involves identifying and categorizing important events from social media. We propose Augmentation framework for Social Event Detection (SED-Aug). SED-Aug combines explicit text-based and implicit feature-space augmentation to enhance data diversity and model robustness.
- Score: 61.929049997741735
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social event detection involves identifying and categorizing important events from social media, which relies on labeled data, but annotation is costly and labor-intensive. To address this problem, we propose Augmentation framework for Social Event Detection (SED-Aug), a plug-and-play dual augmentation framework, which combines explicit text-based and implicit feature-space augmentation to enhance data diversity and model robustness. The explicit augmentation utilizes large language models to enhance textual information through five diverse generation strategies. For implicit augmentation, we design five novel perturbation techniques that operate in the feature space on structural fused embeddings. These perturbations are crafted to keep the semantic and relational properties of the embeddings and make them more diverse. Specifically, SED-Aug outperforms the best baseline model by approximately 17.67% on the Twitter2012 dataset and by about 15.57% on the Twitter2018 dataset in terms of the average F1 score. The code is available at GitHub: https://github.com/congboma/SED-Aug.
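As a rough illustration of what "implicit" feature-space augmentation means, the sketch below perturbs embedding vectors directly instead of rewriting text. It is not taken from the paper: the abstract does not name the five perturbation techniques, so Gaussian noise with norm rescaling is used here as a generic stand-in, and the function names (`gaussian_perturb`, `augment`) and the `sigma` and `copies` parameters are hypothetical.

```python
import math
import random


def l2_norm(vec):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    denom = l2_norm(a) * l2_norm(b)
    return sum(x * y for x, y in zip(a, b)) / denom if denom else 0.0


def gaussian_perturb(embedding, sigma, rng):
    """Add zero-mean Gaussian noise, then rescale to the original L2 norm.

    Assumption for illustration only: a small sigma keeps the perturbed
    vector close in direction to the original, so its label is reused.
    """
    noisy = [x + rng.gauss(0.0, sigma) for x in embedding]
    new_norm = l2_norm(noisy)
    if new_norm == 0.0:
        return list(embedding)
    scale = l2_norm(embedding) / new_norm
    return [x * scale for x in noisy]


def augment(embeddings, labels, copies=2, sigma=0.05, seed=0):
    """Return the originals followed by `copies` perturbed versions of each."""
    rng = random.Random(seed)
    aug_x = [list(e) for e in embeddings]
    aug_y = list(labels)
    for emb, lab in zip(embeddings, labels):
        for _ in range(copies):
            aug_x.append(gaussian_perturb(emb, sigma, rng))
            aug_y.append(lab)
    return aug_x, aug_y
```

Calling `augment(embeddings, labels)` on a small labeled set returns a larger training set whose added vectors stay close in direction to their sources. The actual SED-Aug perturbations operate on structural fused embeddings and are described in the paper and its repository.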
Related papers
- PromptAug: Fine-grained Conflict Classification Using Data Augmentation [5.053303126748248]
Augmenting conflict-related data poses unique challenges due to Large Language Model guardrails. This paper introduces PromptAug, an innovative LLM-based data augmentation method. PromptAug achieves statistically significant improvements of 2% in both accuracy and F1-score on conflict and emotion datasets.
arXiv Detail & Related papers (2025-06-24T15:33:18Z)
- Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation. We introduce novel methodologies and datasets to overcome these challenges. We propose MhBART, an encoder-decoder model designed to emulate human writing style. We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z)
- Towards Effective, Efficient and Unsupervised Social Event Detection in the Hyperbolic Space [54.936897625837474]
This work introduces an unsupervised framework, HyperSED (Hyperbolic SED). Specifically, the framework first models social messages into semantic-based message anchors, and then leverages the structure of the anchor graph. Experiments on public datasets demonstrate HyperSED's competitive performance, along with a substantial improvement in efficiency.
arXiv Detail & Related papers (2024-12-14T06:55:27Z)
- Exploring ChatGPT-based Augmentation Strategies for Contrastive Aspect-based Sentiment Analysis [10.69498984286374]
Aspect-based sentiment analysis (ABSA) involves identifying sentiment towards specific aspect terms in a sentence.
We explore the potential of data augmentation using ChatGPT to enhance the sentiment classification performance towards aspect terms.
arXiv Detail & Related papers (2024-09-17T14:12:08Z)
- Genetic Learning for Designing Sim-to-Real Data Augmentations [1.03590082373586]
Data augmentations are useful in closing the sim-to-real domain gap when training on synthetic data.
Many image augmentation techniques exist, parametrized by different settings, such as strength and probability.
This paper presents two different interpretable metrics that can be combined to predict how well a certain augmentation policy will work for a specific sim-to-real setting.
arXiv Detail & Related papers (2024-03-11T15:00:56Z)
- Hierarchical Knowledge Distillation on Text Graph for Data-limited Attribute Inference [5.618638372635474]
We develop a text-graph-based few-shot learning model for attribute inferences on social media text data.
Our model first constructs and refines a text graph using manifold learning and message passing.
To further use cross-domain texts and unlabeled texts to improve few-shot performance, a hierarchical knowledge distillation is devised over text graph.
arXiv Detail & Related papers (2024-01-10T05:50:34Z)
- InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions [23.296139146133573]
We present a large-scale dataset, InViG, for interactive visual grounding under language ambiguity.
Our dataset comprises over 520K images accompanied by open-ended goal-oriented disambiguation dialogues.
To the best of our knowledge, the InViG dataset is the first large-scale dataset for resolving open-ended interactive visual grounding.
arXiv Detail & Related papers (2023-10-18T17:57:05Z)
- AugGPT: Leveraging ChatGPT for Text Data Augmentation [59.76140039943385]
We propose a text data augmentation approach based on ChatGPT (named AugGPT).
AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples.
Experiment results on few-shot learning text classification tasks show the superior performance of the proposed AugGPT approach.
arXiv Detail & Related papers (2023-02-25T06:58:16Z)
- Virtual Data Augmentation: A Robust and General Framework for Fine-tuning Pre-trained Models [51.46732511844122]
Powerful pre-trained language models (PLM) can be fooled by small perturbations or intentional attacks.
We present Virtual Data Augmentation (VDA), a general framework for robustly fine-tuning PLMs.
Our approach is able to improve the robustness of PLMs and alleviate the performance degradation under adversarial attacks.
arXiv Detail & Related papers (2021-09-13T09:15:28Z)
- CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding [67.61357003974153]
We propose a novel data augmentation framework dubbed CoDA.
CoDA synthesizes diverse and informative augmented examples by integrating multiple transformations organically.
A contrastive regularization objective is introduced to capture the global relationship among all the data samples.
arXiv Detail & Related papers (2020-10-16T23:57:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.