CLIP: Train Faster with Less Data
- URL: http://arxiv.org/abs/2212.01452v2
- Date: Mon, 17 Jul 2023 09:07:25 GMT
- Title: CLIP: Train Faster with Less Data
- Authors: Muhammad Asif Khan, Ridha Hamila, and Hamid Menouar
- Abstract summary: Deep learning models require an enormous amount of data for training.
Recently, there has been a shift in machine learning from model-centric to data-centric approaches.
We propose CLIP, i.e., Curriculum Learning with Iterative data Pruning.
- Score: 3.2575001434344286
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models require an enormous amount of data for training.
However, machine learning has recently been shifting from model-centric to
data-centric approaches. In data-centric approaches, the focus is on refining and
improving the quality of the data to improve the learning performance of the
models, rather than on redesigning model architectures. In this paper, we propose
CLIP, i.e., Curriculum Learning with Iterative data Pruning. CLIP combines two
data-centric approaches, curriculum learning and dataset pruning, to
improve model learning accuracy and convergence speed. The proposed scheme
applies loss-aware dataset pruning to iteratively remove the least significant
samples and progressively reduce the size of the effective dataset during
curriculum learning training. Extensive experiments performed on crowd density
estimation models validate the notion behind combining the two approaches by
reducing the convergence time and improving generalization. To our knowledge,
the idea of data pruning as an embedded process in curriculum learning is
novel.
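A minimal sketch of the training loop the abstract describes, assuming a standard PyTorch setup. It treats the per-sample loss as the significance score and drops the lowest-loss samples between curriculum stages; the names (difficulty_scores, prune_fraction, epochs_per_stage) and the exact pruning criterion are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of curriculum learning with iterative, loss-aware data pruning.
# Assumes loss_fn is constructed with reduction="none" so it returns per-sample losses.
import torch
from torch.utils.data import DataLoader, Subset

def difficulty_scores(model, dataset, loss_fn, device="cpu"):
    """Score each sample by its current loss (assumption: higher loss = more significant)."""
    model.eval()
    scores = []
    loader = DataLoader(dataset, batch_size=64, shuffle=False)
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            per_sample = loss_fn(model(x), y)          # per-sample losses
            scores.extend(per_sample.cpu().tolist())
    return torch.tensor(scores)

def clip_train(model, dataset, loss_fn, optimizer, stages=5, prune_fraction=0.1,
               epochs_per_stage=2, device="cpu"):
    """Curriculum-style training in which the least significant (lowest-loss)
    samples are iteratively pruned, shrinking the effective dataset each stage."""
    indices = torch.arange(len(dataset))
    for stage in range(stages):
        subset = Subset(dataset, indices.tolist())
        loader = DataLoader(subset, batch_size=64, shuffle=True)
        model.train()
        for _ in range(epochs_per_stage):
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                optimizer.zero_grad()
                loss = loss_fn(model(x), y).mean()
                loss.backward()
                optimizer.step()
        if stage < stages - 1:
            # Loss-aware pruning: keep the hardest samples, drop the easiest ones
            # before the next curriculum stage.
            scores = difficulty_scores(model, subset, loss_fn, device)
            keep = int(len(indices) * (1.0 - prune_fraction))
            order = torch.argsort(scores, descending=True)
            indices = indices[order[:keep]]
    return model
```

How "least significant" is measured and how aggressively to prune per stage are design choices the paper evaluates; the sketch simply uses a fixed prune_fraction at every stage for clarity.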
Related papers
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing the effect of a small "forget set" of training data on a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that machine unlearning techniques do not hold up in such a challenging setting.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages.
Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling this data heterogeneity issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient based method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods while requiring far fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - Exploring Data Redundancy in Real-world Image Classification through Data Selection [20.389636181891515]
Deep learning models often require large amounts of data for training, leading to increased costs.
We present two data valuation metrics based on Synaptic Intelligence and gradient norms, respectively, to study redundancy in real-world image data.
Online and offline data selection algorithms are then proposed via clustering and grouping based on the examined data values.
arXiv Detail & Related papers (2023-06-25T03:31:05Z) - CILIATE: Towards Fairer Class-based Incremental Learning by Dataset and Training Refinement [20.591583747291892]
We show that CIL suffers from both dataset and algorithm bias problems.
We propose a novel framework, CILIATE, that fixes both dataset and algorithm bias in CIL.
CILIATE improves the fairness of CIL by 17.03%, 22.46%, and 31.79% compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-04-09T12:10:39Z) - Training Efficiency and Robustness in Deep Learning [2.6451769337566406]
We study approaches to improve the training efficiency and robustness of deep learning models.
We find that prioritizing learning on more informative training data increases convergence speed and improves generalization performance on test data.
We show that a redundancy-aware modification to the sampling of training data improves training speed, and we develop an efficient method for detecting the diversity of the training signal.
arXiv Detail & Related papers (2021-12-02T17:11:33Z) - Machine Unlearning of Features and Labels [72.81914952849334]
We propose a first method for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z)