Git-Theta: A Git Extension for Collaborative Development of Machine
Learning Models
- URL: http://arxiv.org/abs/2306.04529v1
- Date: Wed, 7 Jun 2023 15:37:50 GMT
- Authors: Nikhil Kandpal, Brian Lester, Mohammed Muqeeth, Anisha Mascarenhas,
Monty Evans, Vishal Baskaran, Tenghao Huang, Haokun Liu, Colin Raffel
- Abstract summary: We introduce Git-Theta, a version control system for machine learning models.
Git-Theta is an extension to Git, the most widely used version control software.
- Score: 26.107117592578632
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Currently, most machine learning models are trained by centralized teams and
are rarely updated. In contrast, open-source software development involves the
iterative development of a shared artifact through distributed collaboration
using a version control system. In the interest of enabling collaborative and
continual improvement of machine learning models, we introduce Git-Theta, a
version control system for machine learning models. Git-Theta is an extension
to Git, the most widely used version control software, that allows fine-grained
tracking of changes to model parameters alongside code and other artifacts.
Unlike existing version control systems that treat a model checkpoint as a blob
of data, Git-Theta leverages the structure of checkpoints to support
communication-efficient updates, automatic model merges, and meaningful
reporting about the difference between two versions of a model. In addition,
Git-Theta includes a plug-in system that enables users to easily add support
for new functionality. In this paper, we introduce Git-Theta's design and
features and include an example use-case of Git-Theta where a pre-trained model
is continually adapted and modified. We publicly release Git-Theta in hopes of
kickstarting a new era of collaborative model development.
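The contrast the abstract draws between treating a checkpoint as a blob and using its structure can be illustrated with a small sketch. This is not Git-Theta's implementation or API. It assumes a checkpoint is a plain dictionary mapping hypothetical parameter-group names to flat lists of floats, and `param_digest` and `changed_groups` are illustrative helper names. Each group is hashed separately, so only groups whose digest changed would need to be stored or transmitted.

```python
import hashlib
import struct


def param_digest(values):
    """Hash one parameter group (a flat list of floats) to a hex digest."""
    packed = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(packed).hexdigest()


def changed_groups(old_ckpt, new_ckpt):
    """Classify parameter-group names as added, removed, or modified."""
    old = {name: param_digest(v) for name, v in old_ckpt.items()}
    new = {name: param_digest(v) for name, v in new_ckpt.items()}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }


# Hypothetical checkpoints: only the head changes and an adapter is added.
base = {
    "encoder.weight": [0.1, 0.2],
    "encoder.bias": [0.0],
    "head.weight": [1.0, -1.0],
}
tuned = {
    "encoder.weight": [0.1, 0.2],
    "encoder.bias": [0.0],
    "head.weight": [0.9, -1.1],
    "adapter.weight": [0.5],
}

report = changed_groups(base, tuned)
print(report)
```

A blob-level comparison would only report that the two checkpoints differ. The per-group report names which groups changed, which is the kind of information a structure-aware diff or an incremental update could build on.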
Related papers
- Visual Analysis of GitHub Issues to Gain Insights [2.9051263101214566]
This paper presents a prototype web application that generates visualizations to offer insights into issue timelines.
It focuses on the lifecycle of issues and depicts vital information to enhance users' understanding of development patterns.
arXiv Detail & Related papers (2024-07-30T15:17:57Z)
- Automatic Categorization of GitHub Actions with Transformers and Few-shot Learning [12.254055731378045]
GitHub Actions (GHA) have been conceived to provide developers with a practical tool to create and maintain a pipeline.
To expose actions to search engines, GitHub allows developers to assign them to one or more categories manually.
We propose Gavel, a practical solution to increasing the visibility of actions in GitHub.
arXiv Detail & Related papers (2024-07-24T02:27:36Z)
- VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development.
We propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM).
We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge.
arXiv Detail & Related papers (2024-06-11T16:15:06Z)
- GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension [81.44231422624055]
A growing area of research focuses on Large Language Models (LLMs) equipped with external tools capable of performing diverse tasks.
In this paper, we introduce GitAgent, an agent capable of autonomously extending its toolset from GitHub.
arXiv Detail & Related papers (2023-12-28T15:47:30Z)
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? [80.52201658231895]
SWE-bench is an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories.
We show that both state-of-the-art proprietary models and our fine-tuned model SWE-Llama can resolve only the simplest issues.
arXiv Detail & Related papers (2023-10-10T16:47:29Z)
- The GitHub Development Workflow Automation Ecosystems [47.818229204130596]
Large-scale software development has become a highly collaborative endeavour.
This chapter explores the ecosystems of development bots and GitHub Actions.
It provides an extensive survey of the state-of-the-art in this domain.
arXiv Detail & Related papers (2023-05-08T15:24:23Z)
- SequeL: A Continual Learning Library in PyTorch and JAX [50.33956216274694]
SequeL is a library for Continual Learning that supports both PyTorch and JAX frameworks.
It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches.
We release SequeL as an open-source library, enabling researchers and developers to easily experiment and extend the library for their own purposes.
arXiv Detail & Related papers (2023-04-21T10:00:22Z)
- Learning Backward Compatible Embeddings [74.74171220055766]
We study the problem of embedding version updates and their backward compatibility.
We develop a solution based on learning backward compatible embeddings.
We show that the best method, which we call BC-Aligner, maintains backward compatibility with existing unintended tasks even after multiple model version updates.
arXiv Detail & Related papers (2022-06-07T06:30:34Z)
- FLHub: a Federated Learning model sharing service [0.7614628596146599]
We propose Federated Learning Hub (FLHub) as a sharing service for machine learning models.
FLHub allows users to upload and download models and to contribute to models developed by other developers, similarly to GitHub.
We demonstrate that a forked model can finish training faster than the existing model and that learning progressed more quickly for each federated round.
arXiv Detail & Related papers (2022-02-14T06:02:55Z)
- GitEvolve: Predicting the Evolution of GitHub Repositories [31.814226661858694]
We propose GitEvolve, a system to predict the evolution of GitHub repositories.
We map users to groups by modelling common interests to better predict popularity.
The proposed multi-task architecture is generic and can be extended to model information diffusion in other social networks.
arXiv Detail & Related papers (2020-10-09T04:32:15Z)
- Student Teamwork on Programming Projects: What can GitHub logs show us? [3.764846583322767]
We collected GitHub logs from two programming projects in two offerings of a CS2 Java programming course for computer science majors.
Students worked in pairs for both projects (one optional, the other mandatory) in each year.
We can identify the students' teamwork style automatically from their submission logs.
arXiv Detail & Related papers (2020-08-25T20:41:52Z)