Producing Plankton Classifiers that are Robust to Dataset Shift
- URL: http://arxiv.org/abs/2401.14256v1
- Date: Thu, 25 Jan 2024 15:47:18 GMT
- Title: Producing Plankton Classifiers that are Robust to Dataset Shift
- Authors: Cheng Chen, Sreenath Kyathanahally, Marta Reyes, Stefanie Merkli, Ewa
Merz, Emanuele Francazi, Marvin Hoege, Francesco Pomati, Marco Baity-Jesi
- Abstract summary: We integrate ZooLake dataset with manually-annotated images from 10 independent days of deployment to benchmark Out-Of-Dataset (OOD) performances.
We propose a preemptive assessment method to identify potential pitfalls when classifying new data, and pinpoint features in OOD images that adversely impact classification.
We find that ensembles of BEiT vision transformers, with targeted augmentations addressing OOD robustness, geometric ensembling, and rotation-based test-time augmentation, constitute the most robust model, which we call BEsT model.
- Score: 1.716364772047407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern plankton high-throughput monitoring relies on deep learning
classifiers for species recognition in water ecosystems. Despite satisfactory
nominal performances, a significant challenge arises from Dataset Shift, which
causes performances to drop during deployment. In our study, we integrate the
ZooLake dataset with manually-annotated images from 10 independent days of
deployment, serving as test cells to benchmark Out-Of-Dataset (OOD)
performances. Our analysis reveals instances where classifiers, initially
performing well in In-Dataset conditions, encounter notable failures in
practical scenarios. For example, a MobileNet with a 92% nominal test accuracy
shows a 77% OOD accuracy. We systematically investigate conditions leading to
OOD performance drops and propose a preemptive assessment method to identify
potential pitfalls when classifying new data, and pinpoint features in OOD
images that adversely impact classification. We present a three-step pipeline:
(i) identifying OOD degradation compared to nominal test performance, (ii)
conducting a diagnostic analysis of degradation causes, and (iii) providing
solutions. We find that ensembles of BEiT vision transformers, with targeted
augmentations addressing OOD robustness, geometric ensembling, and
rotation-based test-time augmentation, constitute the most robust model, which
we call BEsT model. It achieves an 83% OOD accuracy, with errors concentrated
on container classes. Moreover, it exhibits lower sensitivity to dataset shift,
and reproduces well the plankton abundances. Our proposed pipeline is
applicable to generic plankton classifiers, contingent on the availability of
suitable test cells. By identifying critical shortcomings and offering
practical procedures to fortify models against dataset shift, our study
contributes to the development of more reliable plankton classification
technologies.
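The BEsT recipe above fuses rotation-based test-time augmentation with geometric ensembling. As an illustration of how that fusion step might look, here is a minimal numpy sketch (function and variable names are hypothetical, not taken from the paper's code): the image is classified at four rotations and the resulting class probabilities are combined by their geometric mean.

```python
import numpy as np

def geometric_mean_probs(prob_list):
    """Geometric ensembling: combine per-view (or per-model) class
    probability vectors by their geometric mean, then renormalize."""
    logp = np.mean([np.log(p + 1e-12) for p in prob_list], axis=0)
    p = np.exp(logp)
    return p / p.sum(axis=-1, keepdims=True)

def rotation_tta_predict(model, image):
    """Rotation-based test-time augmentation: classify the image at
    0/90/180/270 degrees and fuse the four probability vectors.
    `model` is any callable mapping an image to class probabilities."""
    views = [np.rot90(image, k) for k in range(4)]
    probs = [model(v) for v in views]
    return geometric_mean_probs(probs)
```

The geometric mean penalizes views on which the predictions disagree more strongly than an arithmetic average would, which is one common motivation for geometric ensembling in robustness-oriented pipelines.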
Related papers
- Open-World Test-Time Adaptation with Hierarchical Feature Aggregation and Attention Affine [17.151364853811128]
Test-time adaptation (TTA) refers to adjusting the model during the testing phase to cope with changes in sample distribution. We propose a Hierarchical Ladder Network that extracts OOD features from class tokens aggregated across all Transformer layers. We also introduce an Attention Affine Network (AAN) that adaptively refines the self-attention mechanism conditioned on the token information to better adapt to domain drift.
arXiv Detail & Related papers (2025-11-16T14:05:23Z) - Redundancy-Aware Test-Time Graph Out-of-Distribution Detection [20.560483914725435]
RedOUT is an unsupervised framework that integrates structural entropy into test-time OOD detection for graph classification. Our method achieves an average improvement of 6.7%, significantly surpassing the best competitor by 17.3% on the ClinTox/LIPO dataset pair.
arXiv Detail & Related papers (2025-10-16T11:14:45Z) - RoHOI: Robustness Benchmark for Human-Object Interaction Detection [84.78366452133514]
Human-Object Interaction (HOI) detection is crucial for robot-human assistance, enabling context-aware support. We introduce the first robustness benchmark for HOI detection, evaluating model resilience under diverse challenges. Our benchmark, RoHOI, includes 20 corruption types based on the HICO-DET and V-COCO datasets and a new robustness-focused metric.
arXiv Detail & Related papers (2025-07-12T01:58:04Z) - Improving Omics-Based Classification: The Role of Feature Selection and Synthetic Data Generation [0.18846515534317262]
This study presents a machine-learning-based classification framework that integrates feature selection with data augmentation techniques. We show that the proposed pipeline yields robust cross-validated performance on small datasets.
arXiv Detail & Related papers (2025-05-06T10:09:50Z) - Out-of-Distribution Detection using Synthetic Data Generation [21.612592503592143]
Distinguishing in-distribution from out-of-distribution (OOD) inputs is crucial for the reliable deployment of classification systems.
We present a method that harnesses the generative capabilities of Large Language Models (LLMs) to create high-quality synthetic OOD proxies.
arXiv Detail & Related papers (2025-02-05T16:22:09Z) - Early-Stage Anomaly Detection: A Study of Model Performance on Complete vs. Partial Flows [0.0]
This study investigates the efficacy of machine learning models in network anomaly detection through the critical lens of partial versus complete flow information.
We demonstrate a significant performance difference when models trained on complete flows are tested against partial flows.
The study reveals that a minimum of 7 packets in the test set is required for maintaining reliable detection rates.
arXiv Detail & Related papers (2024-07-03T07:14:25Z) - Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving [55.93813178692077]
We present RoboBEV, an extensive benchmark suite designed to evaluate the resilience of BEV algorithms.
We assess 33 state-of-the-art BEV-based perception models spanning tasks like detection, map segmentation, depth estimation, and occupancy prediction.
Our experimental results also underline the efficacy of strategies like pre-training and depth-free BEV transformations in enhancing robustness against out-of-distribution data.
arXiv Detail & Related papers (2024-05-27T17:59:39Z) - Which Augmentation Should I Use? An Empirical Investigation of Augmentations for Self-Supervised Phonocardiogram Representation Learning [5.438725298163702]
Contrastive Self-Supervised Learning (SSL) offers a potential solution to labeled data scarcity.
We propose uncovering the optimal augmentations for applying contrastive learning in 1D phonocardiogram (PCG) classification.
We demonstrate that depending on its training distribution, the effectiveness of a fully-supervised model can degrade up to 32%, while SSL models only lose up to 10% or even improve in some cases.
arXiv Detail & Related papers (2023-12-01T11:06:00Z) - Exploring the Physical World Adversarial Robustness of Vehicle Detection [13.588120545886229]
Adversarial attacks can compromise the robustness of real-world detection models.
We propose an innovative instant-level data generation pipeline using the CARLA simulator.
Our findings highlight diverse model performances under adversarial conditions.
arXiv Detail & Related papers (2023-08-07T11:09:12Z) - AUTO: Adaptive Outlier Optimization for Test-Time OOD Detection [79.51071170042972]
Out-of-distribution (OOD) detection aims to detect test samples that do not fall into any training in-distribution (ID) classes. Data safety and privacy make it infeasible to collect task-specific outliers in advance for different scenarios. We present test-time OOD detection, which allows the deployed model to utilize real OOD data from the unlabeled data stream during testing.
arXiv Detail & Related papers (2023-03-22T02:28:54Z) - Energy-based Out-of-Distribution Detection for Graph Neural Networks [76.0242218180483]
We propose a simple, powerful and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe.
GNNSafe achieves up to 17.0% AUROC improvement over the state of the art and can serve as a simple yet strong baseline in this under-developed area.
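The energy score underlying detectors in this family is a simple function of the classifier's logits, E(x) = -T * logsumexp(f(x)/T), with OOD inputs tending toward higher energy. The following is a minimal numpy sketch of that score for generic logits (it does not reproduce the paper's GNN-specific propagation scheme):

```python
import numpy as np

def energy_score(logits, T=1.0):
    """E(x) = -T * logsumexp(f(x)/T), computed stably.
    Confident (in-distribution) logits give low energy; flat or
    low-magnitude (OOD-like) logits give higher energy."""
    z = np.asarray(logits, dtype=float) / T
    m = z.max(axis=-1, keepdims=True)  # shift for numerical stability
    lse = m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1))
    return -T * lse
```

Thresholding this score, with the threshold calibrated on held-out in-distribution data, then flags likely OOD samples.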
arXiv Detail & Related papers (2023-02-06T16:38:43Z) - Temporal Output Discrepancy for Loss Estimation-based Active Learning [65.93767110342502]
We present a novel deep active learning approach that queries the oracle for annotation when an unlabeled sample is believed to incur a high loss.
Our approach achieves superior performances than the state-of-the-art active learning methods on image classification and semantic segmentation tasks.
arXiv Detail & Related papers (2022-12-20T19:29:37Z) - CausalAgents: A Robustness Benchmark for Motion Forecasting using Causal
Relationships [8.679073301435265]
We construct a new benchmark for evaluating and improving model robustness by applying perturbations to existing data.
We use these labels to perturb the data by deleting non-causal agents from the scene.
Under non-causal perturbations, we observe a 25-38% relative change in minADE compared to the original data.
arXiv Detail & Related papers (2022-07-07T21:28:23Z) - Efficient Test-Time Model Adaptation without Forgetting [60.36499845014649]
Test-time adaptation seeks to tackle potential distribution shifts between training and testing data.
We propose an active sample selection criterion to identify reliable and non-redundant samples.
We also introduce a Fisher regularizer to constrain important model parameters from drastic changes.
arXiv Detail & Related papers (2022-04-06T06:39:40Z) - Understanding and Testing Generalization of Deep Networks on
Out-of-Distribution Data [30.471871571256198]
Deep network models perform excellently on In-Distribution data, but can significantly fail on Out-Of-Distribution data.
This study analyzes the problem of experimental ID testing and designs an OOD test paradigm.
arXiv Detail & Related papers (2021-11-17T15:29:07Z) - Towards Reducing Labeling Cost in Deep Object Detection [61.010693873330446]
We propose a unified framework for active learning, that considers both the uncertainty and the robustness of the detector.
Our method is able to pseudo-label the very confident predictions, suppressing a potential distribution drift.
arXiv Detail & Related papers (2021-06-22T16:53:09Z) - Learn what you can't learn: Regularized Ensembles for Transductive
Out-of-distribution Detection [76.39067237772286]
We show that current out-of-distribution (OOD) detection algorithms for neural networks produce unsatisfactory results in a variety of OOD detection scenarios.
This paper studies how such "hard" OOD scenarios can benefit from adjusting the detection method after observing a batch of the test data.
We propose a novel method that uses an artificial labeling scheme for the test data and regularization to obtain ensembles of models that produce contradictory predictions only on the OOD samples in a test batch.
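A generic way to operationalize "contradictory predictions" as an OOD score is to measure how much the ensemble members' probability vectors spread on a given sample. The sketch below is an illustrative baseline in plain numpy (the summed per-class variance across members), not the paper's regularized training scheme:

```python
import numpy as np

def disagreement_score(member_probs):
    """OOD score from ensemble disagreement.
    member_probs: array of shape (n_members, n_classes) holding each
    member's predicted class probabilities for one sample. Members that
    agree give a score near 0; contradictory members give a high
    score, flagging the sample as likely OOD."""
    member_probs = np.asarray(member_probs, dtype=float)
    return member_probs.var(axis=0).sum()
```

For two members that fully agree the score is 0, while two members placing all mass on different classes score 0.5, so a simple threshold between those extremes separates the regimes.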
arXiv Detail & Related papers (2020-12-10T16:55:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.