Efficient Contrastive Learning via Novel Data Augmentation and
Curriculum Learning
- URL: http://arxiv.org/abs/2109.05941v1
- Date: Fri, 10 Sep 2021 05:49:55 GMT
- Title: Efficient Contrastive Learning via Novel Data Augmentation and
Curriculum Learning
- Authors: Seonghyeon Ye, Jiseon Kim, Alice Oh
- Abstract summary: We introduce EfficientCL, a memory-efficient continual pretraining method.
For data augmentation, we stack two types of operation sequentially: cutoff and PCA jittering.
While pretraining steps proceed, we apply curriculum learning by incrementing the augmentation degree for each difficulty step.
- Score: 11.138005656807968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce EfficientCL, a memory-efficient continual pretraining method
that applies contrastive learning with novel data augmentation and curriculum
learning. For data augmentation, we stack two types of operation sequentially:
cutoff and PCA jittering. While pretraining steps proceed, we apply curriculum
learning by incrementing the augmentation degree for each difficulty step.
After data augmentation is finished, contrastive learning is applied on
projected embeddings of original and augmented examples. When finetuned on GLUE
benchmark, our model outperforms baseline models, especially for sentence-level
tasks. Additionally, this improvement is achieved with only 70% of the
computational memory of the baseline model.
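The recipe described above can be sketched in a few lines. This is a minimal NumPy illustration of the general idea, not the paper's implementation: the pooling in place of a projection head, the augmentation schedule, and all hyperparameters are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def cutoff(tokens, ratio):
    """Zero out a contiguous span of token embeddings ("cutoff")."""
    out = tokens.copy()
    n = out.shape[0]
    span = max(1, int(n * ratio))
    start = rng.integers(0, n - span + 1)
    out[start:start + span] = 0.0
    return out

def pca_jitter(tokens, sigma):
    """Add random noise along the principal components of the embeddings."""
    centered = tokens - tokens.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    noise = rng.normal(0.0, sigma, size=s.shape) * s  # scale by component magnitude
    return tokens + noise @ vt

def nt_xent(z1, z2, tau=0.1):
    """NT-Xent-style contrastive loss; positive pairs sit on the diagonal."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau
    logprob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(logprob))

# Curriculum: the two augmentations are stacked, and their strength grows
# with each difficulty step as pretraining proceeds.
batch = [rng.normal(size=(16, 32)) for _ in range(4)]        # 4 toy "sentences"
for cut_ratio, sigma in [(0.05, 0.01), (0.10, 0.05), (0.20, 0.10)]:
    originals = np.stack([s.mean(axis=0) for s in batch])    # mean-pooled views
    augmented = np.stack([pca_jitter(cutoff(s, cut_ratio), sigma).mean(axis=0)
                          for s in batch])
    loss = nt_xent(originals, augmented)
```

In a real setup the pooled views would pass through a trained projection head before the loss, and the loss gradient would update the encoder.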
Related papers
- Online Gradient Boosting Decision Tree: In-Place Updates for Efficient Adding/Deleting Data [18.21562008536426]
We propose an efficient online learning framework for GBDT supporting both incremental and decremental learning.
To reduce the learning cost, we present a collection of optimizations for our framework, so that it can add or delete a small fraction of data on the fly.
Backdoor attack results show that our framework can successfully inject and remove backdoors in a well-trained model.
arXiv Detail & Related papers (2025-02-03T18:59:04Z)
- Reducing catastrophic forgetting of incremental learning in the absence of rehearsal memory with task-specific token [0.6144680854063939]
Deep learning models display catastrophic forgetting when learning new data continuously.
We present a novel method that preserves previous knowledge without storing previous data.
This method is inspired by the architecture of a vision transformer and employs a unique token capable of encapsulating the compressed knowledge of each task.
arXiv Detail & Related papers (2024-11-06T16:13:50Z)
- Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review [50.78587571704713]
Learn-Focus-Review (LFR) is a dynamic training approach that adapts to the model's learning progress.
LFR tracks the model's learning performance across data blocks (sequences of tokens) and prioritizes revisiting challenging regions of the dataset.
Compared to baseline models trained on the full datasets, LFR consistently achieved lower perplexity and higher accuracy.
arXiv Detail & Related papers (2024-09-10T00:59:18Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method produces models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
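The abstract does not spell out the projection, but the general idea of gradient projection can be sketched as follows. The orthonormal basis of "retained-knowledge" directions here is a hypothetical stand-in for however PGU actually constructs it; the point is that the unlearning update is made orthogonal to directions that matter for the remaining data.

```python
import numpy as np

def project_out(grad, basis):
    """Remove from `grad` its component inside span(basis columns).

    `basis` is an orthonormal (d, k) matrix spanning directions important
    for the retained data; the returned update is orthogonal to them, so
    applying it should not disturb knowledge about the remaining dataset.
    """
    return grad - basis @ (basis.T @ grad)

rng = np.random.default_rng(1)
d, k = 8, 3
# Hypothetical: orthonormalize a few "retained-knowledge" gradient directions.
basis, _ = np.linalg.qr(rng.normal(size=(d, k)))
g_forget = rng.normal(size=d)          # raw update direction on the forget data
g_update = project_out(g_forget, basis)
```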
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
- Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)
- Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data.
This process might suffer from privacy issues and violations of data protection regulations.
We propose an efficient unlearning framework that could efficiently update LLMs without having to retrain the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z)
- A Memory Transformer Network for Incremental Learning [64.0410375349852]
We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.
Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the "catastrophic forgetting" of previously seen classes.
One of the most successful existing methods has been the use of a memory of exemplars, which overcomes the issue of catastrophic forgetting by saving a subset of past data into a memory bank and utilizing it to prevent forgetting when training future tasks.
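A minimal sketch of such an exemplar memory, with uniform random sampling standing in for the herding-style exemplar selection that real methods typically use; the class names and budget are toy assumptions.

```python
import random

class ExemplarMemory:
    """Fixed-budget exemplar store for class-incremental rehearsal (sketch)."""

    def __init__(self, budget):
        self.budget = budget
        self.store = {}                      # class label -> list of examples

    def add_class(self, label, examples):
        """Register a newly observed class and rebalance the budget."""
        self.store[label] = list(examples)
        self._rebalance()

    def _rebalance(self):
        # Keep an equal share of the budget per seen class. Random sampling
        # here stands in for the real selection strategy (e.g. herding).
        per_class = self.budget // max(1, len(self.store))
        for label, ex in self.store.items():
            if len(ex) > per_class:
                self.store[label] = random.sample(ex, per_class)

    def replay_batch(self, k):
        """Draw past examples to mix into training on the current task."""
        pool = [x for ex in self.store.values() for x in ex]
        return random.sample(pool, min(k, len(pool)))
```

With a budget of 20 and two classes of 50 examples each, the memory keeps 10 exemplars per class and replays them alongside new-task batches to counter forgetting.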
arXiv Detail & Related papers (2022-10-10T08:27:28Z)
- BERT WEAVER: Using WEight AVERaging to enable lifelong learning for transformer-based models in biomedical semantic search engines [49.75878234192369]
We present WEAVER, a simple, yet efficient post-processing method that infuses old knowledge into the new model.
We show that applying WEAVER in a sequential manner results in similar word embedding distributions as doing a combined training on all data at once.
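The name suggests element-wise weight averaging as the post-processing step; a hedged sketch of that idea follows. The averaging coefficient and the per-parameter scheme are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def weaver_merge(old_params, new_params, alpha=0.5):
    """Infuse old knowledge by element-wise weight averaging (sketch).

    `alpha` weights the previous model's parameters; both models must share
    the same architecture (identical parameter names and shapes).
    """
    assert old_params.keys() == new_params.keys()
    return {name: alpha * old_params[name] + (1.0 - alpha) * new_params[name]
            for name in old_params}

# Sequential use: after finetuning on each new corpus, merge with the
# previous checkpoint instead of retraining on all data combined.
model_t0 = {"w": np.ones(4), "b": np.zeros(2)}
model_t1 = {"w": np.full(4, 3.0), "b": np.ones(2)}
merged = weaver_merge(model_t0, model_t1, alpha=0.5)
```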
arXiv Detail & Related papers (2022-02-21T10:34:41Z)
- Pretraining Representations for Data-Efficient Reinforcement Learning [12.43475487724972]
We use unlabeled data to pretrain an encoder which is then finetuned on a small amount of task-specific data.
When limited to 100k steps of interaction on Atari games, our approach significantly surpasses prior work.
Our approach shows particular promise when combined with larger models as well as more diverse, task-aligned observational data.
arXiv Detail & Related papers (2021-06-09T04:14:27Z)
- ProtoDA: Efficient Transfer Learning for Few-Shot Intent Classification [21.933876113300897]
We adopt an alternative approach by transfer learning on an ensemble of related tasks using prototypical networks under the meta-learning paradigm.
Using intent classification as a case study, we demonstrate that increasing variability in training tasks can significantly improve classification performance.
arXiv Detail & Related papers (2021-01-28T00:19:13Z)
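The prototypical-network computation underlying the last entry is compact enough to sketch directly. This is a toy episode: the 2-D points and intent labels are invented stand-ins for an encoder's sentence embeddings, and distance-based classification follows the standard prototypical-network formulation rather than anything specific to ProtoDA.

```python
import numpy as np

def prototypes(support_emb, support_labels):
    """Class prototype = mean embedding of that class's support examples."""
    labels = sorted(set(support_labels))
    mask = np.array(support_labels)
    return labels, np.stack([support_emb[mask == c].mean(axis=0)
                             for c in labels])

def classify(query_emb, labels, protos):
    """Assign each query to the nearest prototype (squared Euclidean)."""
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return [labels[i] for i in d.argmin(axis=1)]

# Toy episode: two intents with well-separated embeddings.
support = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
sup_labels = ["greet", "greet", "bye", "bye"]
labels, protos = prototypes(support, sup_labels)
preds = classify(np.array([[0.1, 0.1], [5.1, 4.9]]), labels, protos)
```

Meta-training over many such episodes, drawn from an ensemble of related tasks, is what lets the encoder transfer to new intents from only a few examples.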
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.