A Semi-Supervised Framework for Misinformation Detection
- URL: http://arxiv.org/abs/2304.11318v1
- Date: Sat, 22 Apr 2023 05:20:58 GMT
- Title: A Semi-Supervised Framework for Misinformation Detection
- Authors: Yueyang Liu, Zois Boukouvalas, and Nathalie Japkowicz
- Abstract summary: The spread of misinformation in social media outlets has become a prevalent societal problem.
We propose a semi-supervised learning framework to deal with extreme class imbalances.
- Score: 6.029433950934382
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The spread of misinformation in social media outlets has become a prevalent
societal problem and is the cause of many kinds of social unrest. Curtailing
its prevalence is of great importance and machine learning has shown
significant promise. However, there are two main challenges when applying
machine learning to this problem. First, although far too prevalent in
absolute terms, misinformation represents only a small proportion of all the
postings seen on social media. Second, labeling the massive amount of data
necessary to train a useful classifier becomes impractical. Considering these
challenges, we propose a simple semi-supervised learning framework for dealing
with extreme class imbalance; its advantage over other approaches is that it
uses actual rather than simulated data to inflate the minority
class. We tested our framework on two sets of Covid-related Twitter data and
obtained significant improvement in F1-measure on extremely imbalanced
scenarios, compared with classical and deep-learning data generation methods
such as SMOTE, ADASYN, and GAN-based generators.
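The framework's core idea can be sketched as a pseudo-labeling loop: train on the small labeled set, label the unlabeled pool, and move confident minority-class predictions (which are real posts, not synthetic points) into the training set. The sketch below is illustrative only, assuming a toy 1-D nearest-centroid classifier and made-up thresholds; it is not the authors' actual model.

```python
# Hedged sketch of the paper's idea: inflate the minority class with *actual*
# unlabeled examples via confident pseudo-labels, instead of synthetic points
# (SMOTE/ADASYN/GAN). Toy nearest-centroid classifier on 1-D features; all
# names, data, and thresholds below are illustrative assumptions.

def centroid(xs):
    return sum(xs) / len(xs)

def train(labeled):
    """Return per-class centroids from (x, y) pairs."""
    by_cls = {}
    for x, y in labeled:
        by_cls.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_cls.items()}

def predict_with_margin(model, x):
    """Predicted class plus a crude confidence margin (runner-up gap)."""
    dists = sorted((abs(x - c), y) for y, c in model.items())
    (d0, y0), (d1, _) = dists[0], dists[1]
    return y0, d1 - d0

def inflate_minority(labeled, unlabeled, minority=1, margin=1.0, rounds=3):
    """Move confidently-predicted minority examples from the pool into training."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        keep = []
        for x in pool:
            y, m = predict_with_margin(model, x)
            if y == minority and m >= margin:
                labeled.append((x, minority))   # a real datum, pseudo-labeled
            else:
                keep.append(x)
        pool = keep
    return labeled

# Tiny imbalanced toy set: majority class 0 near 0.0, minority class 1 near 5.0.
seed = [(0.1, 0), (0.2, 0), (0.0, 0), (5.1, 1)]
augmented = inflate_minority(seed, [0.15, 4.9, 5.3, 0.05, 5.0])
minority_count = sum(1 for _, y in augmented if y == 1)
```

On this toy data, three real minority examples are pulled out of the unlabeled pool, growing the minority class from one example to four without generating any synthetic data.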
Related papers
- GENIU: A Restricted Data Access Unlearning for Imbalanced Data [7.854651232997996]
Class unlearning enables a trained model to forget the data belonging to a specific class it learned before.
GENIU is the first practical framework for class unlearning in imbalanced data settings with restricted data access.
arXiv Detail & Related papers (2024-06-12T05:24:53Z)
- CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification [51.58605842457186]
We present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting.
Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data.
arXiv Detail & Related papers (2023-10-23T07:01:09Z)
- Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consequently, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
arXiv Detail & Related papers (2023-03-30T17:30:42Z)
- A Survey on Class Imbalance in Federated Learning [6.632451878730774]
Federated learning allows multiple client devices in a network to jointly train a machine learning model without direct exposure of clients' data.
It has been found that models trained with federated learning usually have worse performance than their counterparts trained in the standard centralized learning mode.
arXiv Detail & Related papers (2023-03-21T08:34:23Z)
- Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem and can in fact be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
- GAN based Data Augmentation to Resolve Class Imbalance [0.0]
In many fraud-detection tasks, the dataset contains very few observed fraud cases.
This imbalance can bias a learning model toward predicting every label as the majority class.
We trained a Generative Adversarial Network (GAN) to generate a large number of convincing (and reliable) synthetic examples of the minority class.
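The degenerate behavior that motivates this line of work is easy to check numerically: on a 99:1 class split, a predictor that always outputs the majority class reaches 99% accuracy yet scores an F1 of zero on the minority class. A self-contained toy check, with made-up labels (no real model is trained):

```python
# Why accuracy misleads under class imbalance: a degenerate "always predict
# the majority class" rule gets 99% accuracy but F1 = 0 on the minority class.
# Labels below are illustrative, not from any dataset in the papers above.

y_true = [0] * 99 + [1]      # 99 majority examples, 1 minority example
y_pred = [0] * 100           # degenerate always-majority predictor

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

This is why the parent paper reports F1-measure rather than accuracy when evaluating on extremely imbalanced data.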
arXiv Detail & Related papers (2022-06-12T21:21:55Z)
- CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
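For context, the classical static baseline that such learned weighting schemes generalize is inverse-frequency class weighting. The sketch below shows only that baseline, not CMW-Net's meta-model (which learns the weighting from data); names and the label counts are illustrative.

```python
# Minimal static class re-weighting sketch: each class is weighted by the
# inverse of its frequency, so rare-class mistakes cost more. This is the
# hand-crafted baseline that meta-learned schemes like CMW-Net replace.
import math
from collections import Counter

labels = [0] * 90 + [1] * 10                          # imbalanced toy label set
counts = Counter(labels)
n, k = len(labels), len(counts)
weights = {c: n / (k * counts[c]) for c in counts}    # inverse-frequency weights

def weighted_nll(p_correct, y):
    """Class-weighted negative log-likelihood for one example."""
    return weights[y] * -math.log(p_correct)
```

With a 90:10 split the minority class gets weight 5.0 versus about 0.56 for the majority class, so each minority error contributes roughly nine times as much to the loss.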
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
- Active Learning for Skewed Data Sets [25.866341631677688]
We focus on problems with two distinguishing characteristics: severe class imbalance (skew) and small amounts of initial training data.
We propose a hybrid active learning algorithm (HAL) that balances exploiting the knowledge available through the currently labeled training examples with exploring the large amount of unlabeled data.
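The exploitation half of such an active learner is typically uncertainty sampling: query the unlabeled example the current model is least sure about. The sketch below shows only that generic component, not HAL itself (which additionally mixes in exploration); the pool and the model scores are hard-coded assumptions.

```python
# Generic uncertainty-sampling sketch: pick the unlabeled example whose
# predicted positive probability is closest to the 0.5 decision boundary.
# Scores would normally come from the current model; here they are made up.

unlabeled = ["a", "b", "c", "d"]
p_positive = {"a": 0.97, "b": 0.52, "c": 0.10, "d": 0.61}  # illustrative scores

def most_uncertain(pool, probs):
    """Query the example with the smallest margin to the decision boundary."""
    return min(pool, key=lambda x: abs(probs[x] - 0.5))

query = most_uncertain(unlabeled, p_positive)
```

Here "b" (score 0.52) is queried for labeling; under severe skew, pure uncertainty sampling can starve the minority class, which is the gap HAL's exploration component targets.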
arXiv Detail & Related papers (2020-05-23T01:50:19Z)
- Contrastive Examples for Addressing the Tyranny of the Majority [83.93825214500131]
We propose to create a balanced training dataset, consisting of the original dataset plus new data points in which the group memberships are intervened.
We show that current generative adversarial networks are a powerful tool for learning these data points, called contrastive examples.
arXiv Detail & Related papers (2020-04-14T14:06:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.