Efficient Learning of Model Weights via Changing Features During
Training
- URL: http://arxiv.org/abs/2002.09249v1
- Date: Fri, 21 Feb 2020 12:38:14 GMT
- Title: Efficient Learning of Model Weights via Changing Features During
Training
- Authors: Marcell Beregi-Kov\'acs, \'Agnes Baran and Andr\'as Hajdu
- Abstract summary: We propose a machine learning model that dynamically changes its features during training.
Our main motivation is to update the model only to a small extent during training by replacing less descriptive features with new ones from a large pool.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a machine learning model which dynamically changes
the features during training. Our main motivation is to update the model only to a
small extent during the training process by replacing less descriptive features
with new ones from a large pool. The main benefit comes from the
fact that, contrary to common practice, we do not start training a new model
from scratch, but keep the already learned weights. This procedure
allows the scan of a large feature pool which, together with keeping the
complexity of the model fixed, leads to an increase in model accuracy within the
same training time. The efficiency of our approach is demonstrated in several
classic machine learning scenarios, including linear regression and neural
network-based training. As a specific analysis towards signal processing, we
have successfully tested our approach on the MNIST database for digit
classification, considering single-pixel and pixel-pair intensities as possible
features.
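The training loop described in the abstract can be pictured with a small linear-regression sketch: train on a fixed-size set of features, periodically drop the least descriptive ones, and pull replacements from a larger pool while keeping the weights of the retained features. The selection criterion (smallest absolute weight), the zero-initialization of incoming features, and the synthetic data below are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem: a large pool of candidate features,
# only a few of which are truly informative.
n_samples, pool_size, n_active = 500, 200, 20
X_pool = rng.normal(size=(n_samples, pool_size))
true_w = np.zeros(pool_size)
true_w[rng.choice(pool_size, 10, replace=False)] = rng.normal(size=10)
y = X_pool @ true_w + 0.1 * rng.normal(size=n_samples)

# Start with a random subset of features; the model size stays fixed.
active = list(rng.choice(pool_size, n_active, replace=False))
w = np.zeros(n_active)
lr, n_rounds, steps_per_round, n_swap = 0.01, 30, 50, 2

for r in range(n_rounds):
    # Ordinary gradient descent on the currently active features.
    for _ in range(steps_per_round):
        X = X_pool[:, active]
        grad = X.T @ (X @ w - y) / n_samples
        w -= lr * grad

    # Replace the least descriptive features (smallest |weight|) with fresh
    # candidates from the pool, keeping all other already-learned weights.
    worst = np.argsort(np.abs(w))[:n_swap]
    unused = [j for j in range(pool_size) if j not in active]
    for i, j in zip(worst, rng.choice(len(unused), n_swap, replace=False)):
        active[i] = unused[j]
        w[i] = 0.0  # only the incoming feature starts from scratch

print("final MSE:", np.mean((X_pool[:, active] @ w - y) ** 2))
```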
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, it can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
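A minimal sketch of the inference-time composition described above, assuming toy stand-in models: the value network's logits are simply added to a frozen pretrained model's logits, so the same value network can be reused with models of different sizes. The architectures, vocabulary handling, and training objective are placeholders, not the paper's implementation.

```python
import torch
import torch.nn as nn

vocab, hidden = 1000, 64

# Hypothetical stand-ins for pretrained models of different sizes.
small_base = nn.Sequential(nn.Embedding(vocab, hidden), nn.Linear(hidden, vocab))
large_base = nn.Sequential(nn.Embedding(vocab, hidden), nn.Linear(hidden, vocab))

# The "value network": a separate model meant to capture the logit-level
# change that post-training induces (its training is omitted here).
value_net = nn.Sequential(nn.Embedding(vocab, hidden), nn.Linear(hidden, vocab))

def post_trained_logits(base_model, tokens):
    # At inference, the value network's output is added to the frozen base
    # model's logits, so it can be plugged into different pretrained models.
    with torch.no_grad():
        return base_model(tokens) + value_net(tokens)

tokens = torch.randint(0, vocab, (2, 16))
print(post_trained_logits(small_base, tokens).shape)  # (2, 16, vocab)
print(post_trained_logits(large_base, tokens).shape)  # same value net, larger base
```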
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- An exactly solvable model for emergence and scaling laws in the multitask sparse parity problem [2.598133279943607]
We present a framework where each new ability (a skill) is represented as a basis function.
We find analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute.
Our simple model captures, using a single fit parameter, the sigmoidal emergence of multiple new skills as the training time, data size, or model size of the neural network increases.
arXiv Detail & Related papers (2024-04-26T17:45:32Z)
- Initializing Models with Larger Ones [76.41561758293055]
We introduce weight selection, a method for initializing smaller models by selecting a subset of weights from a pretrained larger model.
Our experiments demonstrate that weight selection can significantly enhance the performance of small models and reduce their training time.
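A minimal sketch of weight selection for two toy MLPs, assuming a simple "first-k" slice of each layer's weight matrix; the paper studies several selection schemes, and the layer widths here are made up for illustration.

```python
import torch
import torch.nn as nn

# A "large" pretrained model and a smaller model of the same depth
# (in practice the large model's weights would come from a checkpoint).
large = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 10))
small = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

def weight_selection(small_model, large_model):
    """Initialize each small layer with a slice of the corresponding large
    layer. This uses 'first-k' selection as an assumed default; other
    selection schemes from the paper are not reproduced here."""
    with torch.no_grad():
        for s, l in zip(small_model, large_model):
            if isinstance(s, nn.Linear):
                out_k, in_k = s.weight.shape
                s.weight.copy_(l.weight[:out_k, :in_k])
                s.bias.copy_(l.bias[:out_k])

weight_selection(small, large)
# The small model now starts from selected pretrained weights and is
# fine-tuned on the target task as usual.
```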
arXiv Detail & Related papers (2023-11-30T18:58:26Z)
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
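One way to picture a generalized curriculum is to expose the model to "easier" views early, for example lower-frequency image content, and reveal full detail later. The sketch below uses a simple Fourier-domain low-pass mask and an assumed linear schedule as stand-ins; it is not the paper's exact transform or schedule.

```python
import torch

def low_frequency_view(images, keep_ratio):
    """Keep only the central (low-frequency) band of the image spectrum.
    A stand-in for an 'easier' early-curriculum view; the exact Fourier
    cropping used by EfficientTrain is not reproduced here."""
    freq = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    _, _, h, w = images.shape
    bh, bw = int(h * keep_ratio / 2), int(w * keep_ratio / 2)
    mask = torch.zeros_like(freq)
    mask[..., h // 2 - bh:h // 2 + bh, w // 2 - bw:w // 2 + bw] = 1
    filtered = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1)))
    return filtered.real

# Curriculum schedule: reveal higher-frequency detail as training progresses.
total_epochs = 100
for epoch in (0, 50, 99):
    keep = 0.4 + 0.6 * epoch / (total_epochs - 1)  # assumed linear schedule
    batch = torch.randn(8, 3, 224, 224)            # placeholder image batch
    easy_batch = low_frequency_view(batch, keep)
    # ...one ordinary training step on easy_batch would go here...
```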
arXiv Detail & Related papers (2022-11-17T17:38:55Z)
- A Simple Baseline that Questions the Use of Pretrained-Models in Continual Learning [30.023047201419825]
Some methods design continual learning mechanisms on top of the pre-trained representations and allow only minimal or even no updates of the backbone model during continual learning.
We argue that the pretrained feature extractor itself can be strong enough to achieve competitive or even better continual learning performance on the Split-CIFAR-100 and CORe50 benchmarks.
This baseline achieved 88.53% on 10-Split-CIFAR-100, surpassing most state-of-the-art continual learning methods that all use the same pretrained transformer model.
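A frozen pretrained feature extractor plus a very simple classifier is enough to realize such a baseline. The sketch below uses class-mean prototypes and nearest-prototype prediction on frozen features; the backbone, widths, and classifier choice are illustrative assumptions rather than the paper's exact setup.

```python
import torch
import torch.nn as nn

# Stand-in for a frozen pretrained feature extractor (hypothetical widths;
# in practice this would be a pretrained transformer backbone).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))
for p in backbone.parameters():
    p.requires_grad = False

prototypes = {}  # class id -> mean feature of that class

def learn_task(images, labels):
    """Continual 'training': only class prototypes are added or updated;
    the backbone itself is never fine-tuned."""
    with torch.no_grad():
        feats = backbone(images)
    for c in labels.unique().tolist():
        prototypes[c] = feats[labels == c].mean(dim=0)

def predict(images):
    with torch.no_grad():
        feats = backbone(images)
    classes = sorted(prototypes)
    protos = torch.stack([prototypes[c] for c in classes])  # (C, D)
    dists = torch.cdist(feats, protos)                      # (N, C)
    return torch.tensor(classes)[dists.argmin(dim=1)]

# Tasks arrive one after another; old data is never revisited.
for task in range(3):
    x = torch.randn(64, 3, 32, 32)
    y = torch.randint(task * 2, task * 2 + 2, (64,))
    learn_task(x, y)
print(predict(torch.randn(4, 3, 32, 32)))
```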
arXiv Detail & Related papers (2022-10-10T04:19:53Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of the data.
Experiments show that our model achieves comparable performance while using far fewer trainable parameters and achieving high speed in both training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and on neural machine translation empirically demonstrate that our algorithm yields significant improvements over previous methods.
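A rough sketch of the interaction, assuming the teacher sees only two per-example signals from the student (its loss and prediction confidence) and returns data weights for a weighted loss. The paper's actual choice of internal states and its bi-level teacher update are not reproduced here.

```python
import torch
import torch.nn as nn

# Student classifier and a tiny teacher that maps per-example signals
# from the student to data weights (both are placeholder architectures).
student = nn.Linear(20, 5)
teacher = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt = torch.optim.SGD(student.parameters(), lr=0.1)

x = torch.randn(32, 20)
y = torch.randint(0, 5, (32,))

for step in range(5):
    logits = student(x)
    per_example_loss = nn.functional.cross_entropy(logits, y, reduction="none")
    confidence = logits.softmax(dim=1).max(dim=1).values

    # The teacher observes the student's internal signals and reweights the data.
    signals = torch.stack([per_example_loss, confidence], dim=1).detach()
    weights = teacher(signals).squeeze(1)

    loss = (weights.detach() * per_example_loss).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    # (In the paper, the teacher itself is also updated from the student's
    # feedback; that bi-level update is omitted in this sketch.)
```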
arXiv Detail & Related papers (2020-07-09T09:06:31Z)
- Neural Network Retraining for Model Serving [32.857847595096025]
We propose incremental (re)training of a neural network model to cope with a continuous flow of new data arriving at inference time.
We address two challenges of life-long retraining: catastrophic forgetting and efficient retraining.
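As an illustration of life-long retraining, the sketch below updates a served model incrementally on each new batch and mixes in a small replay buffer of past examples, which is one common way to limit catastrophic forgetting; the paper's specific retraining scheme may differ.

```python
import random
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder for the deployed model
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()
replay_buffer = []        # bounded random sample of past data
BUFFER_SIZE = 256

def retrain_on_new_batch(x_new, y_new, steps=10):
    """Incrementally update the served model on freshly arrived data,
    mixing in replayed past examples to limit catastrophic forgetting."""
    for _ in range(steps):
        if replay_buffer:
            xs, ys = zip(*random.sample(replay_buffer, min(32, len(replay_buffer))))
            x = torch.cat([x_new, torch.stack(xs)])
            y = torch.cat([y_new, torch.stack(ys)])
        else:
            x, y = x_new, y_new
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    # Keep a bounded random sample of everything seen so far for replay.
    for xi, yi in zip(x_new, y_new):
        if len(replay_buffer) < BUFFER_SIZE:
            replay_buffer.append((xi, yi))
        else:
            replay_buffer[random.randrange(BUFFER_SIZE)] = (xi, yi)

# New data keeps arriving while the model is being served.
for _ in range(3):
    retrain_on_new_batch(torch.randn(64, 10), torch.randint(0, 2, (64,)))
```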
arXiv Detail & Related papers (2020-04-29T13:52:28Z)