Boost AI Power: Data Augmentation Strategies with unlabelled Data and
Conformal Prediction, a Case in Alternative Herbal Medicine Discrimination
with Electronic Nose
- URL: http://arxiv.org/abs/2102.03088v1
- Date: Fri, 5 Feb 2021 10:25:36 GMT
- Title: Boost AI Power: Data Augmentation Strategies with unlabelled Data and
Conformal Prediction, a Case in Alternative Herbal Medicine Discrimination
with Electronic Nose
- Authors: Li Liu, Xianghao Zhan, Rumeng Wu, Xiaoqing Guan, Zhan Wang, Wei Zhang,
You Wang, Zhiyuan Luo, Guang Li
- Abstract summary: Electronic nose proves its effectiveness in alternative herbal medicine classification, but due to the supervised learning nature, previous research relies on labelled training data, which are time-costly and labor-intensive to collect.
This study aims to improve classification accuracy via data augmentation strategies.
- Score: 12.31253329379136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Electronic nose proves its effectiveness in alternative herbal medicine
classification, but due to the supervised learning nature, previous research
relies on labelled training data, which are time-costly and labor-intensive
to collect. Considering the training-data inadequacy in real-world applications,
this study aims to improve classification accuracy via data augmentation
strategies. We simulated two scenarios to investigate the effectiveness of five
data augmentation strategies under different levels of training-data inadequacy:
in the noise-free scenario, different availabilities of unlabelled data were
simulated, and in the noisy scenario, different levels of Gaussian noise and
translational shifts were added to simulate sensor drifts. The five augmentation
strategies, namely noise-adding data augmentation, semi-supervised learning,
classifier-based online learning, inductive conformal prediction (ICP) online
learning, and the novel ensemble ICP online learning proposed in this study,
were compared against a supervised learning baseline, with Linear Discriminant
Analysis (LDA) and Support Vector Machine (SVM) as the classifiers. We found
that at least one strategy significantly improved the classification accuracy
with LDA (p<=0.05) and showed non-decreasing classification accuracy with SVM
in each task. Moreover, our novel strategy, ensemble ICP online learning,
outperformed the others by showing non-decreasing classification accuracy on
all tasks and significant improvement on most tasks (25/36 tasks, p<=0.05).
This study provides a systematic analysis of augmentation strategies, and we
provide users with recommended strategies under specific circumstances.
Furthermore, our newly proposed strategy showed both effectiveness and
robustness in boosting classification model generalizability, and it can also
be further employed in other machine learning applications.
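The two core ingredients of the abstract, noise-adding augmentation (Gaussian noise plus a translational shift to mimic sensor drift) and inductive conformal prediction, can be sketched as follows. This is a minimal illustration on synthetic two-class data, not the paper's method: the e-nose features, the `1 - P(true class)` nonconformity score, the split sizes, and all function names are assumptions for illustration, and the paper's ensemble ICP variant is not reproduced. It uses scikit-learn's LinearDiscriminantAnalysis as the LDA classifier.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

def augment(X, y, noise_std=0.1, shift=0.05, copies=2):
    """Noise-adding augmentation: Gaussian noise plus a constant
    translational shift, loosely mimicking sensor drift."""
    Xs, ys = [X], [y]
    for _ in range(copies):
        Xs.append(X + rng.normal(0.0, noise_std, X.shape) + shift)
        ys.append(y)
    return np.vstack(Xs), np.concatenate(ys)

def icp_predict(clf, X_cal, y_cal, X_test, eps=0.1):
    """Inductive conformal prediction with nonconformity score
    1 - P(correct class); returns p-values and eps-level prediction sets."""
    cal_scores = 1.0 - clf.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]
    test_probs = clf.predict_proba(X_test)
    p_values = np.empty_like(test_probs)
    for k in range(test_probs.shape[1]):
        test_scores = 1.0 - test_probs[:, k]
        # p-value: fraction of calibration scores at least as nonconforming.
        p_values[:, k] = (
            (cal_scores[None, :] >= test_scores[:, None]).sum(axis=1) + 1
        ) / (len(cal_scores) + 1)
    return p_values, p_values > eps  # class k kept if its p-value exceeds eps

# Synthetic two-class data standing in for e-nose sensor features.
X = np.vstack([rng.normal(0.0, 1.0, (60, 4)), rng.normal(1.5, 1.0, (60, 4))])
y = np.repeat([0, 1], 60)
perm = rng.permutation(120)
X, y = X[perm], y[perm]

# Proper train / calibration / test split for inductive CP.
X_aug, y_aug = augment(X[:40], y[:40])           # augmented training split
clf = LinearDiscriminantAnalysis().fit(X_aug, y_aug)
p_vals, pred_sets = icp_predict(clf, X[40:80], y[40:80], X[80:], eps=0.1)
print(pred_sets.shape)  # one boolean prediction set per test sample
```

With a significance level of eps=0.1, the prediction sets are guaranteed (under exchangeability) to contain the true class for about 90% of test samples; the online-learning strategies in the paper go further by feeding confidently predicted unlabelled samples back into training.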
Related papers
- Systematic Evaluation of Synthetic Data Augmentation for Multi-class NetFlow Traffic [2.5182419298876857]
Multi-class classification models can identify specific types of attacks, allowing for more targeted and effective incident responses.
Recent advances suggest that generative models can assist in data augmentation, claiming to offer superior solutions for imbalanced datasets.
Our experiments indicate that resampling methods for balancing training data do not reliably improve classification performance.
arXiv Detail & Related papers (2024-08-28T12:44:07Z)
- Graph Transductive Defense: a Two-Stage Defense for Graph Membership Inference Attacks [50.19590901147213]
Graph neural networks (GNNs) have become instrumental in diverse real-world applications, offering powerful graph learning capabilities.
GNNs are vulnerable to adversarial attacks, including membership inference attacks (MIA)
This paper proposes an effective two-stage defense, Graph Transductive Defense (GTD), tailored to graph transductive learning characteristics.
arXiv Detail & Related papers (2024-06-12T06:36:37Z)
- Which Augmentation Should I Use? An Empirical Investigation of Augmentations for Self-Supervised Phonocardiogram Representation Learning [5.438725298163702]
Contrastive Self-Supervised Learning (SSL) offers a potential solution to labeled data scarcity.
We propose uncovering the optimal augmentations for applying contrastive learning in 1D phonocardiogram (PCG) classification.
We demonstrate that depending on its training distribution, the effectiveness of a fully-supervised model can degrade up to 32%, while SSL models only lose up to 10% or even improve in some cases.
arXiv Detail & Related papers (2023-12-01T11:06:00Z)
- NTKCPL: Active Learning on Top of Self-Supervised Model by Estimating True Coverage [3.4806267677524896]
We propose a novel active learning strategy, neural tangent kernel clustering-pseudo-labels (NTKCPL)
It estimates empirical risk based on pseudo-labels and the model prediction with NTK approximation.
We validate our method on five datasets, empirically demonstrating that it outperforms the baseline methods in most cases.
arXiv Detail & Related papers (2023-06-07T01:43:47Z)
- Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies [50.10277748405355]
Noise-Reuse Evolution Strategies (NRES) is a general class of unbiased online evolution strategies methods.
We show NRES results in faster convergence than existing AD and ES methods in terms of wall-clock time and number of steps across a variety of applications.
arXiv Detail & Related papers (2023-04-21T17:53:05Z)
- Mitigating Forgetting in Online Continual Learning via Contrasting Semantically Distinct Augmentations [22.289830907729705]
Online continual learning (OCL) aims to enable model learning from a non-stationary data stream to continuously acquire new knowledge as well as retain the learnt one.
The main challenge comes from the "catastrophic forgetting" issue: the inability to retain previously learnt knowledge while acquiring new knowledge.
arXiv Detail & Related papers (2022-11-10T05:29:43Z)
- Improving GANs with A Dynamic Discriminator [106.54552336711997]
We argue that a discriminator with an on-the-fly adjustment on its capacity can better accommodate such a time-varying task.
A comprehensive empirical study confirms that the proposed training strategy, termed as DynamicD, improves the synthesis performance without incurring any additional cost or training objectives.
arXiv Detail & Related papers (2022-09-20T17:57:33Z)
- Training Strategies for Improved Lip-reading [61.661446956793604]
We investigate the performance of state-of-the-art data augmentation approaches, temporal models and other training strategies.
A combination of all the methods results in a classification accuracy of 93.4%, which is an absolute improvement of 4.6% over the current state-of-the-art performance.
An error analysis of the various training strategies reveals that the performance improves by increasing the classification accuracy of hard-to-recognise words.
arXiv Detail & Related papers (2022-09-03T09:38:11Z)
- Continual Learning For On-Device Environmental Sound Classification [63.81276321857279]
We propose a simple and efficient continual learning method for on-device environmental sound classification.
Our method selects the historical data for the training by measuring the per-sample classification uncertainty.
arXiv Detail & Related papers (2022-07-15T12:13:04Z)
- An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation [91.62129090006745]
This paper studies the distribution shift problem from the perspective of pre-training and data augmentation.
We provide the first comprehensive empirical study focusing on pre-training and data augmentation.
arXiv Detail & Related papers (2022-05-25T13:04:53Z)
- Ask-n-Learn: Active Learning via Reliable Gradient Representations for Image Classification [29.43017692274488]
Deep predictive models rely on human supervision in the form of labeled training data.
We propose Ask-n-Learn, an active learning approach based on gradient embeddings obtained using the pseudo-labels estimated in each iteration of the algorithm.
arXiv Detail & Related papers (2020-09-30T05:19:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.