Boost AI Power: Data Augmentation Strategies with unlabelled Data and
Conformal Prediction, a Case in Alternative Herbal Medicine Discrimination
with Electronic Nose
- URL: http://arxiv.org/abs/2102.03088v1
- Date: Fri, 5 Feb 2021 10:25:36 GMT
- Title: Boost AI Power: Data Augmentation Strategies with unlabelled Data and
Conformal Prediction, a Case in Alternative Herbal Medicine Discrimination
with Electronic Nose
- Authors: Li Liu, Xianghao Zhan, Rumeng Wu, Xiaoqing Guan, Zhan Wang, Wei Zhang,
You Wang, Zhiyuan Luo, Guang Li
- Abstract summary: Electronic nose proves its effectiveness in alternative herbal medicine classification, but due to the supervised learning nature, previous research relies on labelled training data, which are time-costly and labor-intensive to collect.
This study aims to improve classification accuracy via data augmentation strategies.
- Score: 12.31253329379136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Electronic nose proves its effectiveness in alternative herbal medicine
classification, but due to the supervised learning nature, previous research
relies on labelled training data, which are time-costly and labor-intensive
to collect. Considering the training-data inadequacy in real-world applications,
this study aims to improve classification accuracy via data augmentation
strategies. We simulated two scenarios to investigate the effectiveness of five
data augmentation strategies under different levels of training-data inadequacy:
in the noise-free scenario, different availabilities of unlabelled data were
simulated, and in the noisy scenario, different levels of Gaussian noise and
translational shifts were added to simulate sensor drifts. The five augmentation
strategies, namely noise-adding data augmentation, semi-supervised learning,
classifier-based online learning, inductive conformal prediction (ICP) online
learning, and the novel ensemble ICP online learning proposed in this study,
were compared against a supervised learning baseline, with Linear Discriminant
Analysis (LDA) and Support Vector Machine (SVM) as the classifiers. We found
that at least one strategy significantly improved the classification accuracy
with LDA (p<=0.05) and showed non-decreasing classification accuracy with SVM
in each task. Moreover, our novel strategy, ensemble ICP online learning,
outperformed the others by showing non-decreasing classification accuracy on
all tasks and significant improvement on most tasks (25/36 tasks, p<=0.05).
This study provides a systematic analysis of augmentation strategies, and we
provide users with recommended strategies under specific circumstances.
Furthermore, our newly proposed strategy showed both effectiveness and
robustness in boosting classification model generalizability, and it can also
be further employed in other machine learning applications.
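The two core ingredients of the abstract, noise-adding augmentation (Gaussian noise plus a translational shift to mimic sensor drift) and inductive conformal prediction, can be sketched as follows. This is a minimal illustration on synthetic two-class data, not the paper's method: the e-nose features, the `1 - P(true class)` nonconformity score, the split sizes, and all function names are assumptions for illustration, and the paper's ensemble ICP variant is not reproduced. It uses scikit-learn's LinearDiscriminantAnalysis as the LDA classifier.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

def augment(X, y, noise_std=0.1, shift=0.05, copies=2):
    """Noise-adding augmentation: Gaussian noise plus a constant
    translational shift, loosely mimicking sensor drift."""
    Xs, ys = [X], [y]
    for _ in range(copies):
        Xs.append(X + rng.normal(0.0, noise_std, X.shape) + shift)
        ys.append(y)
    return np.vstack(Xs), np.concatenate(ys)

def icp_predict(clf, X_cal, y_cal, X_test, eps=0.1):
    """Inductive conformal prediction with nonconformity score
    1 - P(correct class); returns p-values and eps-level prediction sets."""
    cal_scores = 1.0 - clf.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]
    test_probs = clf.predict_proba(X_test)
    p_values = np.empty_like(test_probs)
    for k in range(test_probs.shape[1]):
        test_scores = 1.0 - test_probs[:, k]
        # p-value: fraction of calibration scores at least as nonconforming.
        p_values[:, k] = (
            (cal_scores[None, :] >= test_scores[:, None]).sum(axis=1) + 1
        ) / (len(cal_scores) + 1)
    return p_values, p_values > eps  # class k kept if its p-value exceeds eps

# Synthetic two-class data standing in for e-nose sensor features.
X = np.vstack([rng.normal(0.0, 1.0, (60, 4)), rng.normal(1.5, 1.0, (60, 4))])
y = np.repeat([0, 1], 60)
perm = rng.permutation(120)
X, y = X[perm], y[perm]

# Proper train / calibration / test split for inductive CP.
X_aug, y_aug = augment(X[:40], y[:40])           # augmented training split
clf = LinearDiscriminantAnalysis().fit(X_aug, y_aug)
p_vals, pred_sets = icp_predict(clf, X[40:80], y[40:80], X[80:], eps=0.1)
print(pred_sets.shape)  # one boolean prediction set per test sample
```

With a significance level of eps=0.1, the prediction sets are guaranteed (under exchangeability) to contain the true class for about 90% of test samples; the online-learning strategies in the paper go further by feeding confidently predicted unlabelled samples back into training.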
Related papers
- Systematic Evaluation of Synthetic Data Augmentation for Multi-class NetFlow Traffic [2.5182419298876857]
Multi-class classification models can identify specific types of attacks, allowing for more targeted and effective incident responses.
Recent advances suggest that generative models can assist in data augmentation, claiming to offer superior solutions for imbalanced datasets.
Our experiments indicate that resampling methods for balancing training data do not reliably improve classification performance.
arXiv Detail & Related papers (2024-08-28T12:44:07Z)
- Graph Transductive Defense: a Two-Stage Defense for Graph Membership Inference Attacks [50.19590901147213]
Graph neural networks (GNNs) have become instrumental in diverse real-world applications, offering powerful graph learning capabilities.
GNNs are vulnerable to adversarial attacks, including membership inference attacks (MIA)
This paper proposes an effective two-stage defense, Graph Transductive Defense (GTD), tailored to graph transductive learning characteristics.
arXiv Detail & Related papers (2024-06-12T06:36:37Z)
- Which Augmentation Should I Use? An Empirical Investigation of Augmentations for Self-Supervised Phonocardiogram Representation Learning [5.438725298163702]
Contrastive Self-Supervised Learning (SSL) offers a potential solution to labeled data scarcity.
We propose uncovering the optimal augmentations for applying contrastive learning in 1D phonocardiogram (PCG) classification.
We demonstrate that depending on its training distribution, the effectiveness of a fully-supervised model can degrade up to 32%, while SSL models only lose up to 10% or even improve in some cases.
arXiv Detail & Related papers (2023-12-01T11:06:00Z)
- NTKCPL: Active Learning on Top of Self-Supervised Model by Estimating True Coverage [3.4806267677524896]
We propose a novel active learning strategy, neural tangent kernel clustering-pseudo-labels (NTKCPL)
It estimates empirical risk based on pseudo-labels and the model prediction with NTK approximation.
We validate our method on five datasets, empirically demonstrating that it outperforms the baseline methods in most cases.
arXiv Detail & Related papers (2023-06-07T01:43:47Z)
- Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies [50.10277748405355]
Noise-Reuse Evolution Strategies (NRES) is a general class of unbiased online evolution strategies methods.
We show NRES results in faster convergence than existing AD and ES methods in terms of wall-clock time and number of steps across a variety of applications.
arXiv Detail & Related papers (2023-04-21T17:53:05Z)
- Mitigating Forgetting in Online Continual Learning via Contrasting Semantically Distinct Augmentations [22.289830907729705]
Online continual learning (OCL) aims to enable model learning from a non-stationary data stream to continuously acquire new knowledge as well as retain the learnt one.
The main challenge comes from the "catastrophic forgetting" issue: the inability to retain previously learnt knowledge while acquiring new knowledge.
arXiv Detail & Related papers (2022-11-10T05:29:43Z)
- Improving GANs with A Dynamic Discriminator [106.54552336711997]
We argue that a discriminator with an on-the-fly adjustment on its capacity can better accommodate such a time-varying task.
A comprehensive empirical study confirms that the proposed training strategy, termed as DynamicD, improves the synthesis performance without incurring any additional cost or training objectives.
arXiv Detail & Related papers (2022-09-20T17:57:33Z)
- Training Strategies for Improved Lip-reading [61.661446956793604]
We investigate the performance of state-of-the-art data augmentation approaches, temporal models and other training strategies.
A combination of all the methods results in a classification accuracy of 93.4%, which is an absolute improvement of 4.6% over the current state-of-the-art performance.
An error analysis of the various training strategies reveals that the performance improves by increasing the classification accuracy of hard-to-recognise words.
arXiv Detail & Related papers (2022-09-03T09:38:11Z)
- Continual Learning For On-Device Environmental Sound Classification [63.81276321857279]
We propose a simple and efficient continual learning method for on-device environmental sound classification.
Our method selects the historical data for the training by measuring the per-sample classification uncertainty.
arXiv Detail & Related papers (2022-07-15T12:13:04Z)
- An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation [91.62129090006745]
This paper studies the distribution shift problem from the perspective of pre-training and data augmentation.
We provide the first comprehensive empirical study focusing on pre-training and data augmentation.
arXiv Detail & Related papers (2022-05-25T13:04:53Z)
- Ask-n-Learn: Active Learning via Reliable Gradient Representations for Image Classification [29.43017692274488]
Deep predictive models rely on human supervision in the form of labeled training data.
We propose Ask-n-Learn, an active learning approach based on gradient embeddings obtained using the pseudo-labels estimated in each iteration of the algorithm.
arXiv Detail & Related papers (2020-09-30T05:19:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.