Related papers: Dynamic Switch Layers For Unsupervised Learning

Dynamic Switch Layers For Unsupervised Learning

URL: http://arxiv.org/abs/2404.04405v1
Date: Fri, 5 Apr 2024 21:03:11 GMT
Title: Dynamic Switch Layers For Unsupervised Learning
Authors: Haiguang Li, Usama Pervaiz, Michał Matuszak, Robert Kamara, Gilles Roux, Trausti Thormundsson, Joseph Antognini,
Abstract summary: On-device machine learning (ODML) enables intelligent applications on resource-constrained devices. Power consumption poses a major challenge, forcing a trade-off between model accuracy and power efficiency. We introduce the Dynamic Switch Layer ( DSL) to extend the benefits of GC layers to unsupervised learning scenarios.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: On-device machine learning (ODML) enables intelligent applications on resource-constrained devices. However, power consumption poses a major challenge, forcing a trade-off between model accuracy and power efficiency that often limits model complexity. The previously established Gated Compression (GC) layers offer a solution, enabling power efficiency without sacrificing model performance by selectively gating samples that lack signals of interest. However, their reliance on ground truth labels limits GC layers to supervised tasks. This work introduces the Dynamic Switch Layer (DSL), extending the benefits of GC layers to unsupervised learning scenarios, and maintaining power efficiency without the need for labeled data. The DSL builds upon the GC architecture, leveraging a dynamic pathway selection, and adapting model complexity in response to the innate structure of the data. We integrate the DSL into the SoundStream architecture and demonstrate that by routing up to 80% of samples through a lightweight pass we achieve a 12.3x reduction in the amount of computation performed and a 20.9x reduction in model size. This reduces the on-device inference latency by up to 26.5% and improves power efficiency by up to 21.4% without impacting model performance.

Related papers

Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity [39.483346492111515]
Linear recurrent neural networks enable powerful long-range sequence modeling with constant memory usage and time-per-token during inference. Unstructured sparsity offers a compelling solution, enabling substantial reductions in compute and memory requirements when accelerated by compatible hardware platforms. We find that highly sparse linear RNNs consistently achieve better efficiency-performance trade-offs than dense baselines.
arXiv Detail & Related papers (2025-02-03T13:09:21Z)
Dynamic layer selection in decoder-only transformers [21.18795712840146]
We empirically examine two common dynamic inference methods for natural language generation. We find that a pre-trained decoder-only model is significantly more robust to layer removal via layer skipping. We also show that dynamic computation allocation on a per-sequence basis holds promise for significant efficiency gains.
arXiv Detail & Related papers (2024-10-26T00:44:11Z)
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency. We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs) We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
Compressing Large Language Models with Automated Sub-Network Search [41.452512557226335]
We consider model compression for Large Language Models to reduce model size while improving downstream task performance. We phrase this as a neural architecture search problem that automatically prunes structural components. Our method achieves upto 9.85% improvement on average on 11 diverse downstream tasks, while achieving up to 22% improvement of on-device latency.
arXiv Detail & Related papers (2024-10-09T02:14:39Z)
A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies [51.7643024367548]
Stable Diffusion Model is a prevalent and effective model for text-to-image (T2I) and image-to-image (I2I) generation. This study focuses on reducing redundant computation in SDM and optimizing the model through both tuning and tuning-free methods.
arXiv Detail & Related papers (2024-05-31T21:47:05Z)
Enhancing User Experience in On-Device Machine Learning with Gated Compression Layers [0.0]
On-device machine learning (ODML) enables powerful edge applications, but power consumption remains a key challenge for resource-constrained devices. This work focuses on the use of Gated Compression (GC) layer to enhance ODML model performance while conserving power. GC layers dynamically regulate data flow by selectively gating activations of neurons within the neural network and effectively filtering out non-essential inputs.
arXiv Detail & Related papers (2024-05-02T21:18:06Z)
LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights [2.8461446020965435]
We introduce LD-Pruner, a novel performance-preserving structured pruning method for compressing Latent Diffusion Models. We demonstrate the effectiveness of our approach on three different tasks: text-to-image (T2I) generation, Unconditional Image Generation (UIG) and Unconditional Audio Generation (UAG)
arXiv Detail & Related papers (2024-04-18T06:35:37Z)
Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search [55.41583104734349]
We propose to automatically remove structural redundancy in diffusion models with our proposed Diffusion Distillation-based Block-wise Neural Architecture Search (NAS) Given a larger pretrained teacher, we leverage DiffNAS to search for the smallest architecture which can achieve on-par or even better performance than the teacher. Different from previous block-wise NAS methods, DiffNAS contains a block-wise local search strategy and a retraining strategy with a joint dynamic loss.
arXiv Detail & Related papers (2023-11-08T12:56:59Z)
Federated Learning for Energy-limited Wireless Networks: A Partial Model Aggregation Approach [79.59560136273917]
limited communication resources, bandwidth and energy, and data heterogeneity across devices are main bottlenecks for federated learning (FL) We first devise a novel FL framework with partial model aggregation (PMA) The proposed PMA-FL improves 2.72% and 11.6% accuracy on two typical heterogeneous datasets.
arXiv Detail & Related papers (2022-04-20T19:09:52Z)
Learning to Generate Content-Aware Dynamic Detectors [62.74209921174237]
We introduce a newpective of designing efficient detectors, which is automatically generating sample-adaptive model architecture. We introduce a course-to-fine strat-egy tailored for object detection to guide the learning of dynamic routing. Experiments on MS-COCO dataset demonstrate that CADDet achieves 1.8 higher mAP with 10% fewer FLOPs compared with vanilla routing.
arXiv Detail & Related papers (2020-12-08T08:05:20Z)
Fully Dynamic Inference with Deep Neural Networks [19.833242253397206]
Two compact networks, called Layer-Net (L-Net) and Channel-Net (C-Net), predict on a per-instance basis which layers or filters/channels are redundant and therefore should be skipped. On the CIFAR-10 dataset, LC-Net results in up to 11.9$times$ fewer floating-point operations (FLOPs) and up to 3.3% higher accuracy compared to other dynamic inference methods. On the ImageNet dataset, LC-Net achieves up to 1.4$times$ fewer FLOPs and up to 4.6% higher Top-1 accuracy than the other methods.
arXiv Detail & Related papers (2020-07-29T23:17:48Z)
Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stages multi-scale features. We build an extremely light-weighted model, namely CSNet, which achieves comparable performance with about 0.2% (100k) of large models on popular object detection benchmarks.
arXiv Detail & Related papers (2020-03-12T07:00:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.