FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed
- URL: http://arxiv.org/abs/2507.03779v1
- Date: Fri, 04 Jul 2025 18:56:04 GMT
- Title: FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed
- Authors: Jiaqi Zhang, Juntuo Wang, Zhixin Sun, John Zou, Randall Balestriero
- Abstract summary: Large-scale vision foundation models such as DINOv2 achieve impressive performance by leveraging massive architectures and training datasets. We propose a novel pre-training strategy for DINOv2 that simultaneously accelerates convergence and strengthens robustness to common corruptions as a by-product.
- Score: 14.677270805094311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale vision foundation models such as DINOv2 achieve impressive performance by leveraging massive architectures and training datasets. Yet many scenarios require practitioners to reproduce those pre-training solutions, such as on private data, new modalities, or simply for scientific questioning, which is currently extremely demanding in computation. We thus propose a novel pre-training strategy for DINOv2 that accelerates convergence and, as a by-product, strengthens robustness to common corruptions. Our approach combines a frequency-filtering curriculum, in which low frequencies are seen first, with a Gaussian noise patching augmentation. Applied to a ViT-B/16 backbone trained on ImageNet-1K, our method reduces pre-training time by 1.6x and FLOPs by 2.25x while matching the baseline's robustness on corruption benchmarks (ImageNet-C) and maintaining competitive linear probing performance. This dual benefit of efficiency and robustness makes large-scale self-supervised foundation modeling more attainable, while opening the door to new exploration of data curricula and augmentations as means to improve the robustness of self-supervised learning models. The code is available at https://github.com/KevinZ0217/fast_dinov2
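To make the two ingredients concrete, below is a minimal, hedged sketch in PyTorch of a low-pass frequency filter, a Gaussian noise patching augmentation, and a simple two-stage low-frequency-first schedule. The function names, the FFT-based filter, the patch size, the noise scale, and the switch point are illustrative assumptions, not the authors' implementation; refer to the linked repository for the actual code.

```python
# Hedged sketch of the ideas described in the abstract: a low-frequency-first
# filtering curriculum plus Gaussian noise patching. All names, cutoffs, and
# hyperparameters here are assumptions for illustration only.
import torch
import torch.fft


def low_pass_filter(images: torch.Tensor, cutoff: float) -> torch.Tensor:
    """Keep only low spatial frequencies of a batch of images (B, C, H, W).

    `cutoff` in (0, 1] is the fraction of the frequency radius retained;
    cutoff=1.0 returns the image essentially unchanged.
    """
    B, C, H, W = images.shape
    freq = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    # Circular low-pass mask centered at the spectrum's origin.
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij"
    )
    radius = torch.sqrt(xx ** 2 + yy ** 2)
    mask = (radius <= cutoff).to(images.dtype)
    filtered = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1)))
    return filtered.real


def gaussian_noise_patch(images: torch.Tensor, patch: int = 16,
                         p: float = 0.5, sigma: float = 0.2) -> torch.Tensor:
    """Replace randomly chosen patch-aligned regions with Gaussian noise."""
    B, C, H, W = images.shape
    out = images.clone()
    for b in range(B):
        # Each patch is replaced independently with probability p.
        replace = torch.rand(H // patch, W // patch) < p
        for i in range(H // patch):
            for j in range(W // patch):
                if replace[i, j]:
                    out[b, :, i * patch:(i + 1) * patch,
                        j * patch:(j + 1) * patch] = sigma * torch.randn(
                            C, patch, patch, device=images.device)
    return out


def curriculum_cutoff(step: int, total_steps: int,
                      start: float = 0.25, switch: float = 0.5) -> float:
    """Low frequencies first; switch to full-spectrum images mid-training."""
    return start if step < switch * total_steps else 1.0
```

In a training loop, one would filter each batch with `low_pass_filter(images, curriculum_cutoff(step, total_steps))` and then apply `gaussian_noise_patch` as an augmentation; the cutoff schedule above is a single hard switch, whereas a gradual schedule is equally plausible under the abstract's description.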
Related papers
- Improving Deep Knowledge Tracing via Gated Architectures and Adaptive Optimization [0.0]
Deep Knowledge Tracing (DKT) models student learning behavior by using Recurrent Neural Networks (RNNs) to predict future performance based on historical interaction data. In this work, we revisit the DKT model from two perspectives: architectural improvements and optimization. First, we enhance the model using gated architectures, specifically Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs). Second, we re-implement DKT using the PyTorch framework, enabling a modular and accessible infrastructure compatible with modern deep learning.
arXiv Detail & Related papers (2025-04-24T14:24:31Z) - Always-Sparse Training by Growing Connections with Guided Stochastic Exploration [43.26615926465987]
We propose an efficient always-sparse training algorithm with excellent scaling to larger and sparser models. We evaluate our method on CIFAR-10/100 and ImageNet using VGG and ViT models, and compare it against a range of sparsification methods.
arXiv Detail & Related papers (2024-01-12T21:32:04Z) - MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training [17.158498267947877]
We introduce MobileCLIP, a new family of efficient image-text models optimized for runtime performance.
MobileCLIP uses knowledge transfer from an image captioning model and an ensemble of strong CLIP encoders to improve the accuracy of efficient models.
Our approach avoids train-time compute overhead by storing the additional knowledge in a reinforced dataset.
arXiv Detail & Related papers (2023-11-28T18:55:42Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z) - Sparsity Winning Twice: Better Robust Generalization from More Efficient Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find both methods to yield win-win: substantially shrinking the robust generalization gap and alleviating the robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z) - An Empirical Analysis of Recurrent Learning Algorithms In Neural Lossy Image Compression Systems [73.48927855855219]
Recent advances in deep learning have resulted in image compression algorithms that outperform JPEG and JPEG 2000 on the standard Kodak benchmark.
In this paper, we perform the first large-scale comparison of recent state-of-the-art hybrid neural compression algorithms.
arXiv Detail & Related papers (2022-01-27T19:47:51Z) - Investigating Tradeoffs in Real-World Video Super-Resolution [90.81396836308085]
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability.
To alleviate the first tradeoff, we propose a degradation scheme that reduces training time by up to 40% without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv Detail & Related papers (2021-11-24T18:58:21Z) - Improved Adversarial Training via Learned Optimizer [101.38877975769198]
We propose a framework to improve the robustness of adversarially trained models. By co-training the optimizer's parameters and the model's weights, the proposed framework consistently improves robustness and adaptively adjusts step sizes for the update directions.
arXiv Detail & Related papers (2020-04-25T20:15:53Z)