ReINTEL Challenge 2020: A Comparative Study of Hybrid Deep Neural
Network for Reliable Intelligence Identification on Vietnamese SNSs
- URL: http://arxiv.org/abs/2109.12777v1
- Date: Mon, 27 Sep 2021 03:40:28 GMT
- Title: ReINTEL Challenge 2020: A Comparative Study of Hybrid Deep Neural
Network for Reliable Intelligence Identification on Vietnamese SNSs
- Authors: Hoang Viet Trinh, Tung Tien Bui, Tam Minh Nguyen, Huy Quang Dao, Quang
Huu Pham, Ngoc N. Tran, Ta Minh Thanh
- Abstract summary: The overwhelming abundance of data has created a misinformation crisis.
We propose a multi-input model that can effectively leverage both tabular metadata and post content for the task.
Applying state-of-the-art finetuning techniques for the pretrained component and training strategies for our complete model, we have achieved a 0.9462 ROC-score on the VLSP private test set.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The overwhelming abundance of data has created a misinformation crisis.
Unverified sensationalism that is designed to grab the readers' short attention
span, when crafted with malice, has caused irreparable damage to our society's
structure. As a result, determining the reliability of an article has become a
crucial task. After various ablation studies, we propose a multi-input model
that can effectively leverage both tabular metadata and post content for the
task. Applying state-of-the-art finetuning techniques for the pretrained
component and training strategies for our complete model, we have achieved a
0.9462 ROC-score on the VLSP private test set.
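The multi-input idea in the abstract can be illustrated with a minimal sketch: an embedding from a pretrained text encoder is concatenated with normalized tabular metadata, and a logistic layer scores reliability. All names, dimensions, and weights below are hypothetical, not the authors' actual architecture.

```python
import math

def fuse_and_score(text_emb, metadata, w_text, w_meta, bias):
    """Concatenate a text embedding with tabular metadata features
    and apply a logistic layer to score reliability (closer to 1 =
    more reliable). Weights here stand in for a trained layer."""
    z = sum(e * w for e, w in zip(text_emb, w_text))
    z += sum(m * w for m, w in zip(metadata, w_meta))
    z += bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid

# Illustrative inputs: a 4-dim "encoder" embedding and 3 metadata
# features (e.g. log-scaled likes, comments, shares) - all made up.
emb = [0.2, -0.1, 0.4, 0.05]
meta = [1.2, 0.8, 0.3]
score = fuse_and_score(emb, meta,
                       w_text=[0.5, -0.3, 0.2, 0.1],
                       w_meta=[0.1, 0.05, 0.2],
                       bias=-0.2)
```

The point of the sketch is only the fusion step: both branches contribute terms to the same pre-activation before the final classifier, which is what lets the model exploit metadata and content jointly.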
Related papers
- The BRAVO Semantic Segmentation Challenge Results in UNCV2024 [68.20197719071436]
We define two categories of reliability: (1) semantic reliability, which reflects the model's accuracy and calibration when exposed to various perturbations; and (2) OOD reliability, which measures the model's ability to detect object classes that are unknown during training.
The results reveal interesting insights into the importance of large-scale pre-training and minimal architectural design in developing robust and reliable semantic segmentation models.
arXiv Detail & Related papers (2024-09-23T15:17:30Z)
- Evaluating ML-Based Anomaly Detection Across Datasets of Varied Integrity: A Case Study [0.0]
We introduce two refined versions of the CICIDS-2017 dataset, processed using NFStream to ensure methodologically sound flow expiration and labeling.
Our research contrasts the performance of the Random Forest (RF) algorithm across the original CICIDS-2017, its refined counterparts WTMC-2021 and CRiSIS-2022, and our NFStream-generated datasets.
We observe that the RF model exhibits exceptional robustness, achieving consistent high-performance metrics irrespective of the underlying dataset quality.
arXiv Detail & Related papers (2024-01-30T09:34:15Z)
- The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model.
We introduce three robustness indicators and conduct experiments across diverse robust datasets.
Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z)
- Alleviating the Effect of Data Imbalance on Adversarial Training [26.36714114672729]
We study adversarial training on datasets that obey the long-tailed distribution.
We propose a new adversarial training framework, Re-balancing Adversarial Training (REAT).
arXiv Detail & Related papers (2023-07-14T07:01:48Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [49.15931834209624]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
Elaborating on the robustness metric, a model is judged to be robust only if its performance is consistently accurate across all examples in each clique.
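A clique-level robustness score of this kind can be sketched as follows: a clique of knowledge-invariant paraphrases counts toward the score only if the model answers every member correctly. This is a minimal illustration of the idea, not the benchmark's exact metric.

```python
def clique_robust_accuracy(predictions):
    """predictions: list of cliques, each a list of booleans where
    True means the model was correct on that paraphrase.
    A clique counts only if every member is answered correctly."""
    if not predictions:
        return 0.0
    robust = sum(1 for clique in predictions if all(clique))
    return robust / len(predictions)

# Three hypothetical cliques; the second has one failed paraphrase.
cliques = [[True, True, True], [True, False, True], [True, True]]
score = clique_robust_accuracy(cliques)  # 2 of 3 cliques fully correct
```

Under this metric a single failure anywhere in a clique forfeits the whole clique, which is what makes it stricter than per-example accuracy.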
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can stem from biases in data acquisition rather than from the underlying task.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based fine-tuning framework, Two-WIng NormaliSation (TWINS).
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Bayesian Learning with Information Gain Provably Bounds Risk for a Robust Adversarial Defense [27.545466364906773]
We present a new algorithm to learn a deep neural network model robust against adversarial attacks.
Our model demonstrates significantly improved robustness, by up to 20%, compared with adversarial training and Adv-BNN under PGD attacks.
arXiv Detail & Related papers (2022-12-05T03:26:08Z)
- Federated Learning with Unreliable Clients: Performance Analysis and Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients.
However, low quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training.
We model these unreliable behaviors of clients and propose a defensive mechanism to mitigate such a security risk.
arXiv Detail & Related papers (2021-05-10T08:02:27Z)
- NLPBK at VLSP-2020 shared task: Compose transformer pretrained models for Reliable Intelligence Identification on Social network [0.0]
This paper describes our method for tuning a transformer-based pretrained model to adapt it to the Reliable Intelligence Identification problem on Vietnamese SNSs.
We also propose a model that combines bert-base pretrained models with metadata features such as the number of comments, number of likes, and images of SNS documents.
With appropriate training techniques, our model achieves 0.9392 ROC-AUC on the public test set, and the final version settles at top-2 ROC-AUC (0.9513) on the private test set.
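Both this system and the main paper report ROC-AUC. For reference, the metric can be computed directly from scores via the Mann-Whitney formulation: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting half. The toy scores below are illustrative only.

```python
def roc_auc(scores, labels):
    """ROC-AUC via the rank (Mann-Whitney) formulation: fraction of
    positive/negative pairs where the positive is scored higher,
    counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores and ground-truth reliability labels.
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])  # 3 of 4 pairs ranked correctly
```

This pairwise view makes clear why ROC-AUC is insensitive to the choice of classification threshold: only the relative ordering of scores matters.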
arXiv Detail & Related papers (2021-01-29T16:19:28Z)
- On the Accuracy of CRNNs for Line-Based OCR: A Multi-Parameter Evaluation [0.0]
We train a high quality optical character recognition (OCR) model for difficult historical typefaces on degraded paper.
We are able to obtain a 0.44% character error rate (CER) model from only 10,000 lines of training data.
We show ablations for all components of our training pipeline, which relies on the open source framework Calamari.
arXiv Detail & Related papers (2020-08-06T17:20:56Z)
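The character error rate (CER) reported in the OCR entry above is conventionally the character-level edit distance between the model's output and the reference transcription, divided by the reference length. A minimal sketch, assuming the standard Levenshtein definition (the paper may normalize differently):

```python
def cer(reference, hypothesis):
    """Character error rate: Levenshtein edit distance from the OCR
    hypothesis to the reference, divided by the reference length.
    Uses a rolling two-row dynamic-programming table."""
    m, n = len(reference), len(hypothesis)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / m

# Hypothetical OCR output with two character substitutions.
rate = cer("historical typeface", "histor1cal typefoce")
```

At a reported 0.44% CER, roughly one character in every 227 is wrong, which conveys how strict the metric is at that level.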
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.