CLOFAI: A Dataset of Real And Fake Image Classification Tasks for Continual Learning
- URL: http://arxiv.org/abs/2501.11140v1
- Date: Sun, 19 Jan 2025 18:53:30 GMT
- Title: CLOFAI: A Dataset of Real And Fake Image Classification Tasks for Continual Learning
- Authors: William Doherty, Anton Lee, Heitor Murilo Gomes
- Abstract summary: We introduce a new dataset called CLOFAI (Continual Learning On Fake and Authentic Images).
It takes the form of a domain-incremental image classification problem.
We use it to set a baseline with three foundational continual learning methods.
- Score: 1.7256001727746018
- Abstract: The rapid advancement of generative AI models capable of creating realistic media has led to a need for classifiers that can accurately distinguish between genuine and artificially generated images. A significant challenge for these classifiers emerges when they encounter images from generative models that are not represented in their training data, usually resulting in diminished performance. A typical approach is to periodically update the classifier's training data with images from the new generative models and then retrain the classifier on the updated dataset. However, in some real-life scenarios, storage, computational, or privacy constraints render this approach impractical. Additionally, models used in security applications may be required to rapidly adapt. In these circumstances, continual learning provides a promising alternative, as the classifier can be updated without retraining on the entire dataset. In this paper, we introduce a new dataset called CLOFAI (Continual Learning On Fake and Authentic Images), which takes the form of a domain-incremental image classification problem. Moreover, we showcase the applicability of this dataset as a benchmark for evaluating continual learning methodologies. In doing this, we set a baseline on our novel dataset using three foundational continual learning methods -- EWC, GEM, and Experience Replay -- and find that EWC performs poorly, while GEM and Experience Replay show promise, performing significantly better than a Naive baseline. The dataset and code to run the experiments can be accessed from the following GitHub repository: https://github.com/Will-Doherty/CLOFAI.
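For readers unfamiliar with the baselines, the sketch below illustrates the general shape of an Experience Replay training loop over a domain-incremental task sequence like CLOFAI's. It is a minimal illustration, not the authors' implementation (which lives in the linked repository): the ReplayBuffer class, its capacity, and all hyperparameters are assumptions made for exposition.

```python
# Minimal Experience Replay sketch for a domain-incremental task sequence.
# NOT the CLOFAI authors' code: model, loaders, buffer size, and
# hyperparameters here are illustrative assumptions.
import random
import torch
import torch.nn as nn


class ReplayBuffer:
    """Reservoir-sampled memory of past (image, label) examples."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, xs, ys):
        # Reservoir sampling keeps a uniform sample of everything seen so far.
        for x, y in zip(xs, ys):
            self.seen += 1
            if len(self.data) < self.capacity:
                self.data.append((x, y))
            else:
                j = random.randrange(self.seen)
                if j < self.capacity:
                    self.data[j] = (x, y)

    def sample(self, n):
        batch = random.sample(self.data, min(n, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)


def train_with_replay(model, task_loaders, lr=1e-3, replay_bs=32):
    """Train on a sequence of tasks (e.g. one per generative model),
    mixing each new batch with samples replayed from earlier tasks."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    buffer = ReplayBuffer()
    for loader in task_loaders:
        for x, y in loader:
            inputs, targets = x, y
            if buffer.data:  # interleave memories of earlier tasks
                rx, ry = buffer.sample(replay_bs)
                inputs = torch.cat([x, rx])
                targets = torch.cat([y, ry])
            opt.zero_grad()
            loss_fn(model(inputs), targets).backward()
            opt.step()
            buffer.add(x, y)  # only genuinely new samples enter memory
```

The other two baselines slot into the same loop differently: EWC replaces the replay step with a quadratic penalty anchoring parameters deemed important for earlier tasks, while GEM projects each new gradient so it does not increase the loss on stored examples from previous tasks.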
Related papers
- Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
We propose a new adaptation framework called Data Adaptive Traceback.
Specifically, we utilize a zero-shot-based method to extract the most downstream task-related subset of the pre-training data.
We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
arXiv Detail & Related papers (2024-07-11T18:01:58Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models [36.59260354292177]
Recent advancements in text-to-image generation have inspired researchers to generate datasets tailored for perception models using generative models.
We aim to fine-tune vision-language models for a specific classification task without access to any real images.
Despite the high fidelity of generated images, we observed a significant performance degradation when fine-tuning the model using the generated datasets.
arXiv Detail & Related papers (2024-06-08T10:43:49Z)
- Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning [63.850451635362425]
Continual learning requires a model to adapt to ongoing changes in the data distribution.
We show that the combination of a large language model and an image generation model can provide useful premonitions of such changes.
We find that the backbone of our pre-trained networks can learn representations useful for the downstream continual learning problem.
arXiv Detail & Related papers (2024-03-12T06:29:54Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Leaving Reality to Imagination: Robust Classification via Generated Datasets [24.411444438920988]
Recent research on robustness has revealed significant performance gaps between neural image classifiers evaluated on data similar to their training set and on naturally shifted distributions.
We study the question: How do generated datasets influence the natural robustness of image classifiers?
We find that ImageNet classifiers trained on real data augmented with generated data achieve higher accuracy and effective robustness than standard training alone.
arXiv Detail & Related papers (2023-02-05T22:49:33Z)
- Prefix Conditioning Unifies Language and Label Supervision [84.11127588805138]
We show that dataset biases negatively affect pre-training by reducing the generalizability of learned representations.
In experiments, we show that this simple prefix-conditioning technique improves zero-shot image recognition accuracy and robustness to image-level distribution shift.
arXiv Detail & Related papers (2022-06-02T16:12:26Z)
- The CLEAR Benchmark: Continual LEArning on Real-World Imagery [77.98377088698984]
Continual learning (CL) is widely regarded as a crucial challenge for lifelong AI.
We introduce CLEAR, the first continual image classification benchmark dataset with a natural temporal evolution of visual concepts.
We find that a simple unsupervised pre-training step can already boost state-of-the-art CL algorithms.
arXiv Detail & Related papers (2022-01-17T09:09:09Z)
- Move-to-Data: A new Continual Learning approach with Deep CNNs, Application for image-class recognition [0.0]
The model is pre-trained in a "training recording phase" and then adjusted to newly arriving data.
We propose a fast continual learning layer at the end of the neural network.
arXiv Detail & Related papers (2020-06-12T13:04:58Z)