FlexModel: A Framework for Interpretability of Distributed Large
Language Models
- URL: http://arxiv.org/abs/2312.03140v1
- Date: Tue, 5 Dec 2023 21:19:33 GMT
- Title: FlexModel: A Framework for Interpretability of Distributed Large
Language Models
- Authors: Matthew Choi, Muhammad Adil Asif, John Willes and David Emerson
- Abstract summary: We present FlexModel, a software package providing a streamlined interface for engaging with models distributed across multi-GPU and multi-node configurations.
The library is compatible with existing model distribution libraries and encapsulates PyTorch models.
It exposes user-registerable HookFunctions to facilitate straightforward interaction with distributed model internals.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growth of large language models, now incorporating billions of
parameters, the hardware prerequisites for their training and deployment have
seen a corresponding increase. Although existing tools facilitate model
parallelization and distributed training, deeper model interactions, crucial
for interpretability and responsible AI techniques, still demand thorough
knowledge of distributed computing. This often hinders contributions from
researchers with machine learning expertise but limited distributed computing
background. Addressing this challenge, we present FlexModel, a software package
providing a streamlined interface for engaging with models distributed across
multi-GPU and multi-node configurations. The library is compatible with
existing model distribution libraries and encapsulates PyTorch models. It
exposes user-registerable HookFunctions to facilitate straightforward
interaction with distributed model internals, bridging the gap between
distributed and single-device model paradigms. Primarily, FlexModel enhances
accessibility by democratizing model interactions and promotes more inclusive
research in the domain of large-scale neural networks. The package is found at
https://github.com/VectorInstitute/flex_model.
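For readers coming from single-device workflows, the sketch below shows the plain PyTorch forward-hook pattern that, per the abstract, FlexModel's user-registerable HookFunctions extend to models sharded across GPUs and nodes. It uses only the standard torch.nn.Module.register_forward_hook API; it is not FlexModel's own interface, which is documented in the repository linked above.

```python
import torch
import torch.nn as nn

# Minimal single-device sketch of the hook-based workflow that FlexModel
# generalizes to multi-GPU / multi-node models. The FlexModel/HookFunction
# API itself differs; see the repository for the actual interface.
captured = {}

def capture_hook(module, inputs, output):
    # Store a detached copy of the layer's activations for later inspection.
    captured["mlp_out"] = output.detach().cpu()
    # Returning a tensor here would replace the layer's output downstream,
    # which is how activation-editing experiments are typically done.
    return output

model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 8),
)

# Register the hook on the layer of interest, exactly as one would on a
# single device; FlexModel exposes an analogous registration step for
# distributed models.
handle = model[2].register_forward_hook(capture_hook)

with torch.no_grad():
    _ = model(torch.randn(4, 16))

print(captured["mlp_out"].shape)  # torch.Size([4, 8])
handle.remove()
```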
Related papers
- Knowledge Fusion By Evolving Weights of Language Models [5.354527640064584]
This paper examines the approach of integrating multiple models into a unified model.
We propose a knowledge fusion method named Evolver, inspired by evolutionary algorithms.
arXiv Detail & Related papers (2024-06-18T02:12:34Z)
- Model Callers for Transforming Predictive and Generative AI Applications [2.7195102129095003]
We introduce a novel software abstraction termed a "model caller."
Model callers act as an intermediary for AI and ML model calling.
We have released a prototype Python library for model callers, accessible for installation via pip or for download from GitHub.
arXiv Detail & Related papers (2024-04-17T12:21:06Z)
- FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage several different, relatively small open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z)
- Arcee's MergeKit: A Toolkit for Merging Large Language Models [0.6374098147778188]
MergeKit is a framework to efficiently merge models on any hardware.
To date, thousands of models have been merged by the open-source community.
arXiv Detail & Related papers (2024-03-20T02:38:01Z)
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving [53.01646445659089]
We show that model parallelism can be used for the statistical multiplexing of multiple devices when serving multiple models.
We present a novel serving system, AlpaServe, that determines an efficient strategy for placing and parallelizing collections of large deep learning models.
arXiv Detail & Related papers (2023-02-22T21:41:34Z)
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient [69.61083127540776]
Deep learning applications benefit from using large models with billions of parameters.
Training these models is notoriously expensive due to the need for specialized HPC clusters.
We consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions.
arXiv Detail & Related papers (2023-01-27T18:55:19Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
Because fine-tuned models are often available while their training data is not, there is a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
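As a rough illustration of what merging models "in their parameter space" means, here is a minimal sketch that uniformly averages the matching parameters of several models sharing one architecture. Plain averaging is only the baseline idea, not the paper's proposed fusion method, and the helper name below is hypothetical.

```python
import torch
import torch.nn as nn

def merge_state_dicts(state_dicts, weights=None):
    """Weighted (default: uniform) average of matching parameters."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
    return merged

# Hypothetical usage: three separately fine-tuned copies of one architecture.
models = [nn.Linear(8, 2) for _ in range(3)]
merged_model = nn.Linear(8, 2)
merged_model.load_state_dict(merge_state_dicts([m.state_dict() for m in models]))
```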
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Amazon SageMaker Model Parallelism: A General and Flexible Framework for Large Model Training [10.223511922625065]
We present Amazon SageMaker model parallelism, a software library that integrates with PyTorch.
It enables easy training of large models using model parallelism and other memory-saving features.
We evaluate performance over GPT-3, RoBERTa, BERT, and neural collaborative filtering.
arXiv Detail & Related papers (2021-11-10T22:30:21Z)
- Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration [130.89746032163106]
We propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data.
We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration.
We present an energy-model-guided fuzzer for software testing that achieves performance comparable to well-engineered fuzzing engines like libFuzzer.
arXiv Detail & Related papers (2020-11-10T19:31:29Z)
- Deep Generative Models that Solve PDEs: Distributed Computing for Training Large Data-Free Models [25.33147292369218]
Recent progress in scientific machine learning (SciML) has opened up the possibility of training novel neural network architectures that solve complex partial differential equations (PDEs).
Here we report on a software framework for data parallel distributed deep learning that resolves the twin challenges of training these large SciML models.
Our framework provides several out-of-the-box features, including (a) loss integrity independent of the number of processes, (b) synchronized batch normalization, and (c) distributed higher-order optimization methods.
arXiv Detail & Related papers (2020-07-24T22:42:35Z)
- Ensemble Distillation for Robust Model Fusion in Federated Learning [72.61259487233214]
Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model.
In most current training schemes, the central model is refined by averaging the server model's parameters with the updated parameters from the client side.
We propose ensemble distillation for model fusion, i.e. training the central classifier on unlabeled data using the outputs of the client models (a minimal sketch follows below).
arXiv Detail & Related papers (2020-06-12T14:49:47Z)
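To make the fusion-by-distillation idea in the last entry concrete, here is a minimal sketch, assuming a PyTorch setting with small stand-in models: the server refines the central model on unlabeled data so its predictions match the averaged output distribution of the client models. All names, shapes, and hyperparameters are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_step(central, clients, unlabeled_batch, optimizer, temperature=2.0):
    # Ensemble "teacher" signal: average of the clients' output distributions.
    with torch.no_grad():
        client_probs = torch.stack(
            [F.softmax(c(unlabeled_batch) / temperature, dim=-1) for c in clients]
        ).mean(dim=0)
    # Student (central model) matches the ensemble via KL divergence.
    student_log_probs = F.log_softmax(central(unlabeled_batch) / temperature, dim=-1)
    loss = F.kl_div(student_log_probs, client_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Illustrative usage with tiny stand-in models and random "unlabeled" data.
central = nn.Linear(16, 4)
clients = [nn.Linear(16, 4) for _ in range(5)]
opt = torch.optim.SGD(central.parameters(), lr=0.1)
for _ in range(10):
    distillation_step(central, clients, torch.randn(32, 16), opt)
```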
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.