Equity through Access: A Case for Small-scale Deep Learning
- URL: http://arxiv.org/abs/2403.12562v1
- Date: Tue, 19 Mar 2024 09:17:18 GMT
- Title: Equity through Access: A Case for Small-scale Deep Learning
- Authors: Raghavendra Selvan, Bob Pepin, Christian Igel, Gabrielle Samuel, Erik B Dam,
- Abstract summary: Large-scale resources have been used to train progressively larger models which are resource intensive in terms of compute, data, energy, and carbon emissions.
These costs are becoming a new type of entry barrier to researchers and practitioners with limited access to resources at such scale.
In this work, we take a comprehensive look at the landscape of existing DL models for vision tasks and demonstrate their usefulness in settings where resources are limited.
- Score: 6.474195280156721
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent advances in deep learning (DL) have been accelerated by access to large-scale data and compute. These large-scale resources have been used to train progressively larger models which are resource intensive in terms of compute, data, energy, and carbon emissions. These costs are becoming a new type of entry barrier to researchers and practitioners with limited access to resources at such scale, particularly in the Global South. In this work, we take a comprehensive look at the landscape of existing DL models for vision tasks and demonstrate their usefulness in settings where resources are limited. To account for the resource consumption of DL models, we introduce a novel measure to estimate the performance per resource unit, which we call the PePR score. Using a diverse family of 131 unique DL architectures (spanning 1M to 130M trainable parameters) and three medical image datasets, we capture trends about the performance-resource trade-offs. In applications like medical image analysis, we argue that small-scale, specialized models are better than striving for large-scale models. Furthermore, we show that using pretrained models can significantly reduce the computational resources and data required. We hope this work will encourage the community to focus on improving AI equity by developing methods and models with smaller resource footprints.
Related papers
- What is the Role of Small Models in the LLM Era: A Survey [13.195074492564332]
Large Language Models (LLMs) have made significant progress in advancing artificial general intelligence (AGI), leading to the development of increasingly large models such as GPT-4 and LLaMA-405B.
scaling up model sizes results in exponentially higher computational costs and energy consumption, making these models impractical for academic researchers and businesses with limited resources.
At the same time, Small Models (SMs) are frequently used in practical settings, although their significance is currently underestimated.
arXiv Detail & Related papers (2024-09-10T20:45:43Z) - ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation [4.77407121905745]
Back-propagation (BP) is a major source of computational expense during training deep learning models.
We propose a general, energy-efficient convolution module that can be seamlessly integrated into any deep learning architecture.
arXiv Detail & Related papers (2024-08-22T17:22:59Z) - Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches [64.42735183056062]
Large language models (LLMs) have transitioned from specialized models to versatile foundation models.
LLMs exhibit impressive zero-shot ability, however, require fine-tuning on local datasets and significant resources for deployment.
arXiv Detail & Related papers (2024-08-20T09:42:17Z) - Dynamic Self-adaptive Multiscale Distillation from Pre-trained Multimodal Large Model for Efficient Cross-modal Representation Learning [12.00246872965739]
We propose a novel dynamic self-adaptive multiscale distillation from pre-trained multimodal large model.
Our strategy employs a multiscale perspective, enabling the extraction structural knowledge across from the pre-trained multimodal large model.
Our methodology streamlines pre-trained multimodal large models using only their output features and original image-level information.
arXiv Detail & Related papers (2024-04-16T18:22:49Z) - MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies [85.57899012821211]
Small Language Models (SLMs) are a resource-efficient alternative to Large Language Models (LLMs)
We introduce MiniCPM, specifically the 1.2B and 2.4B non-embedding parameter variants.
We also introduce MiniCPM family, including MiniCPM-DPO, MiniCPM-MoE and MiniCPM-128K.
arXiv Detail & Related papers (2024-04-09T15:36:50Z) - A Simple and Efficient Baseline for Data Attribution on Images [107.12337511216228]
Current state-of-the-art approaches require a large ensemble of as many as 300,000 models to accurately attribute model predictions.
In this work, we focus on a minimalist baseline, utilizing the feature space of a backbone pretrained via self-supervised learning to perform data attribution.
Our method is model-agnostic and scales easily to large datasets.
arXiv Detail & Related papers (2023-11-03T17:29:46Z) - Reusing Pretrained Models by Multi-linear Operators for Efficient
Training [65.64075958382034]
Training large models from scratch usually costs a substantial amount of resources.
Recent studies such as bert2BERT and LiGO have reused small pretrained models to initialize a large model.
We propose a method that linearly correlates each weight of the target model to all the weights of the pretrained model.
arXiv Detail & Related papers (2023-10-16T06:16:47Z) - Efficient Sentiment Analysis: A Resource-Aware Evaluation of Feature Extraction Techniques, Ensembling, and Deep Learning Models [0.0]
We evaluate document-level sentiment analysis models with a focus on resource costs.
We find that for smaller datasets, the differences in accuracy shrink while the difference in resource consumption grows further.
arXiv Detail & Related papers (2023-08-03T20:29:27Z) - Operating critical machine learning models in resource constrained
regimes [0.18416014644193066]
We investigate the trade-off between resource consumption and performance of deep learning models.
Deep learning models are used in critical settings such as in clinics.
arXiv Detail & Related papers (2023-03-17T12:02:08Z) - Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost.
Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute performance when scaling vision-language models.
arXiv Detail & Related papers (2023-03-13T16:00:31Z) - Top-KAST: Top-K Always Sparse Training [50.05611544535801]
We propose Top-KAST, a method that preserves constant sparsity throughout training.
We show that it performs comparably to or better than previous works when training models on the established ImageNet benchmark.
In addition to our ImageNet results, we also demonstrate our approach in the domain of language modeling.
arXiv Detail & Related papers (2021-06-07T11:13:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.