Ultra-light deep MIR by trimming lottery tickets
- URL: http://arxiv.org/abs/2007.16187v1
- Date: Fri, 31 Jul 2020 17:30:28 GMT
- Title: Ultra-light deep MIR by trimming lottery tickets
- Authors: Philippe Esling, Theis Bazin, Adrien Bitton, Tristan Carsault, Ninon Devis
- Abstract summary: We propose a model pruning method based on the lottery ticket hypothesis.
We show that our proposal can remove up to 90% of the model parameters without loss of accuracy.
We confirm the surprising result that, at smaller compression ratios, lighter models consistently outperform their heavier counterparts.
- Score: 1.2599533416395767
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Current state-of-the-art results in Music Information Retrieval are largely
dominated by deep learning approaches. These provide unprecedented accuracy
across all tasks. However, the consistently overlooked downside of these models
is their stunningly massive complexity, which seems concomitantly crucial to
their success. In this paper, we address this issue by proposing a model
pruning method based on the lottery ticket hypothesis. We modify the original
approach to allow for explicitly removing parameters, through structured
trimming of entire units, instead of simply masking individual weights. This
leads to models which are effectively lighter in terms of size, memory and
number of operations. We show that our proposal can remove up to 90% of the
model parameters without loss of accuracy, leading to ultra-light deep MIR
models. We confirm the surprising result that, at smaller compression ratios
(removing up to 85% of a network), lighter models consistently outperform their
heavier counterparts. We exhibit these results on a large array of MIR tasks
including audio classification, pitch recognition, chord extraction, drum
transcription and onset estimation. The resulting ultra-light deep learning
models for MIR can run on CPU, and can even fit on embedded devices with
minimal degradation of accuracy.
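As a rough illustration of the structured-trimming idea (removing whole units rather than masking individual weights), the sketch below trims a single linear layer and rewinds the surviving units to their initial values, in the lottery-ticket spirit. The PyTorch setting, the L2-norm selection criterion, the `trim_linear` helper and the keep ratio are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of structured trimming: low-norm output units are removed
# outright (not masked), and the survivors are rewound to their values at
# initialization, in the lottery-ticket spirit. Criterion, ratio and helper
# names are illustrative assumptions, not the authors' implementation.
import copy
import torch
import torch.nn as nn

def trim_linear(layer: nn.Linear, init_layer: nn.Linear, keep_ratio: float) -> nn.Linear:
    """Keep the highest-L2-norm output units and rewind them to their initial weights."""
    norms = layer.weight.detach().norm(dim=1)                # one score per output unit
    n_keep = max(1, int(keep_ratio * layer.out_features))
    keep = torch.topk(norms, n_keep).indices.sort().values   # surviving unit indices

    trimmed = nn.Linear(layer.in_features, n_keep, bias=layer.bias is not None)
    with torch.no_grad():
        trimmed.weight.copy_(init_layer.weight[keep])        # rewind to initialization
        if layer.bias is not None:
            trimmed.bias.copy_(init_layer.bias[keep])
    return trimmed

# Usage: snapshot at initialization, train, trim, then retrain the smaller layer.
layer = nn.Linear(128, 64)
init_snapshot = copy.deepcopy(layer)
# ... train `layer` on the MIR task ...
layer = trim_linear(layer, init_snapshot, keep_ratio=0.5)    # removes 50% of the units
```

In a full network, the next layer's input dimension would have to be trimmed with the same indices so that shapes stay consistent; this is what makes the resulting model effectively smaller in size, memory and number of operations.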
Related papers
- LoRA vs Full Fine-tuning: An Illusion of Equivalence [76.11938177294178]
We study how different fine-tuning methods change pre-trained models by analyzing the model's weight matrices through the lens of their spectral properties.
We find that full fine-tuning and LoRA yield weight matrices whose singular value decompositions exhibit very different structure.
We conclude by examining why intruder dimensions appear in LoRA fine-tuned models, why they are undesirable, and how their effects can be minimized.
arXiv Detail & Related papers (2024-10-28T17:14:01Z)
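The entry above compares pre-trained and fine-tuned weight matrices through their singular value decompositions. As a hedged illustration of one way to probe for such mismatched ("intruder"-like) directions, the sketch below flags fine-tuned singular vectors that align poorly with every pre-trained singular vector; the top-k cutoff, the 0.3 threshold and the toy low-rank update are arbitrary choices, not the paper's exact analysis.

```python
# Hedged sketch: flag fine-tuned singular directions that match no pre-trained
# direction well. The top-k cutoff, the 0.3 threshold and the toy low-rank
# update are illustrative assumptions, not the paper's exact metric.
import torch

def poorly_matched_directions(w_pre: torch.Tensor, w_ft: torch.Tensor,
                              k: int = 10, tau: float = 0.3) -> torch.Tensor:
    u_pre, _, _ = torch.linalg.svd(w_pre, full_matrices=False)
    u_ft, _, _ = torch.linalg.svd(w_ft, full_matrices=False)
    sims = (u_ft[:, :k].T @ u_pre).abs()      # cosine similarity to every pre-trained direction
    best = sims.max(dim=1).values             # best match for each top-k fine-tuned direction
    return (best < tau).nonzero().flatten()   # directions that align with nothing pre-trained

w_pre = torch.randn(256, 256)
w_ft = w_pre + 0.5 * torch.randn(256, 1) @ torch.randn(1, 256)   # toy low-rank update
print(poorly_matched_directions(w_pre, w_ft))
```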
- Large Language Model Pruning [0.0]
We suggest a model pruning technique specifically focused on LLMs.
The proposed methodology emphasizes the explainability of deep learning models.
We also explore the difference between pruning on large-scale models vs. pruning on small-scale models.
arXiv Detail & Related papers (2024-05-24T18:22:15Z)
- Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment [56.44025052765861]
Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks.
We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs.
We show a total speedup of up to 8.6x on CPUs for sparse-quantized LLaMA models.
arXiv Detail & Related papers (2024-05-06T16:03:32Z)
- Quantifying lottery tickets under label noise: accuracy, calibration, and complexity [6.232071870655069]
Pruning deep neural networks is a widely used strategy to alleviate the computational burden in machine learning.
We use the sparse double descent approach to unambiguously identify and characterize pruned models associated with classification tasks.
arXiv Detail & Related papers (2023-06-21T11:35:59Z)
- CrAM: A Compression-Aware Minimizer [103.29159003723815]
We propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way.
CrAM produces dense models that can be more accurate than standard SGD/Adam-based baselines while remaining stable under weight pruning.
CrAM can produce sparse models which perform well for transfer learning, and it also works for semi-structured 2:4 pruning patterns supported by GPU hardware.
arXiv Detail & Related papers (2022-07-28T16:13:28Z)
- PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance [114.1541203743303]
We propose PLATON, which captures the uncertainty of importance scores via an upper confidence bound (UCB) on the importance estimates.
We conduct extensive experiments with several Transformer-based models on natural language understanding, question answering and image classification.
arXiv Detail & Related papers (2022-06-25T05:38:39Z)
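PLATON, summarized above, ranks weights by an upper confidence bound on their estimated importance. The sketch below shows a generic UCB-style score, smoothed sensitivity plus an uncertainty bonus, so that weights whose importance is either high or not yet reliably estimated are kept longer; the sensitivity measure, smoothing constants and additive bound are illustrative and do not reproduce PLATON's exact estimator.

```python
# Hedged sketch of a UCB-style importance score for pruning; the sensitivity
# measure, smoothing constants and additive bound are generic illustrations,
# not PLATON's estimator.
import torch

def ucb_importance(weight: torch.Tensor, grad: torch.Tensor,
                   mean: torch.Tensor, unc: torch.Tensor,
                   beta1: float = 0.85, beta2: float = 0.95, c: float = 1.0):
    imp = (weight.detach() * grad).abs()                    # instantaneous sensitivity
    mean = beta1 * mean + (1 - beta1) * imp                 # smoothed importance
    unc = beta2 * unc + (1 - beta2) * (imp - mean).abs()    # smoothed deviation
    return mean + c * unc, mean, unc                        # score = upper confidence bound

# Usage per training step: update the running statistics from the current
# gradient, then mask or drop the lowest-scoring weights.
w = torch.randn(4, 4, requires_grad=True)
w.sum().backward()
score, mean, unc = ucb_importance(w, w.grad, torch.zeros_like(w), torch.zeros_like(w))
```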
- Fast Model Editing at Scale [77.69220974621425]
We propose Model Editor Networks with Gradient Decomposition (MEND).
MEND is a collection of small auxiliary editing networks that use a single desired input-output pair to make fast, local edits to a pre-trained model.
MEND can be trained on a single GPU in less than a day even for 10 billion+ parameter models.
arXiv Detail & Related papers (2021-10-21T17:41:56Z)
- Self-Damaging Contrastive Learning [92.34124578823977]
In practice, unlabeled data is commonly imbalanced and follows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) to automatically balance representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
- Diet deep generative audio models with structured lottery [2.348805691644086]
We study the lottery ticket hypothesis on deep generative audio models.
We show that we can remove up to 95% of the model weights without significant degradation in accuracy.
We discuss the possibility of implementing deep generative audio models on embedded platforms.
arXiv Detail & Related papers (2020-07-31T16:43:10Z)
- Compression of descriptor models for mobile applications [26.498907514590165]
We evaluate the computational cost, model size, and matching accuracy tradeoffs for deep neural networks.
We observe a significant redundancy in the learned weights, which we exploit through the use of depthwise separable layers.
We propose the Convolution-Depthwise-Pointwise (CDP) layer, which provides a means of interpolating between standard and depthwise separable convolutions.
arXiv Detail & Related papers (2020-01-09T17:00:21Z)
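The last entry exploits redundancy in learned descriptor weights with depthwise separable layers. As a hedged illustration of that building block (a generic factorization, not the paper's Convolution-Depthwise-Pointwise layer, whose interpolation scheme is not detailed in the summary above), the sketch below splits a standard convolution into depthwise and pointwise stages and compares parameter counts.

```python
# Hedged sketch: factor a standard convolution into depthwise + pointwise
# stages and compare parameter counts. This is the generic depthwise separable
# block, not the paper's Convolution-Depthwise-Pointwise (CDP) layer.
import torch.nn as nn

def depthwise_separable(c_in: int, c_out: int, k: int = 3) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in),   # depthwise: one filter per channel
        nn.Conv2d(c_in, c_out, 1),                               # pointwise: 1x1 channel mixing
    )

standard = nn.Conv2d(64, 128, 3, padding=1)
separable = depthwise_separable(64, 128)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))   # roughly 74k vs 9k parameters
```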
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.