An Empirical Study of Library Usage and Dependency in Deep Learning
Frameworks
- URL: http://arxiv.org/abs/2211.15733v1
- Date: Mon, 28 Nov 2022 19:31:56 GMT
- Title: An Empirical Study of Library Usage and Dependency in Deep Learning
Frameworks
- Authors: Mohamed Raed El aoun, Lionel Nganyewou Tidjon, Ben Rombaut, Foutse
Khomh, Ahmed E. Hassan
- Abstract summary: PyTorch with Scikit-learn and Keras with TensorFlow are the most frequent combinations, appearing in 18% and 14% of the projects respectively.
Developers use two or three DL libraries in the same project and tend to combine multiple DL libraries within the same file and even the same function.
- Score: 12.624032509149869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in deep learning (DL) have led to the release of several DL software libraries such as PyTorch, Caffe, and TensorFlow, which assist machine learning (ML) practitioners in developing and deploying state-of-the-art deep neural networks (DNNs); however, practitioners are not always able to properly cope with limitations of these DL libraries, such as testing or data processing. In this paper, we present a qualitative and quantitative analysis of the most frequent DL library combinations and of the distribution of DL library dependencies across the ML workflow, and we formulate a set of recommendations for (i) hardware builders, towards more optimized accelerators, and (ii) library builders, towards more refined future releases. Our study is based on 1,484 open-source DL projects with 46,110 contributors, selected based on their reputation. First, we find an increasing trend in the usage of deep learning libraries. Second, we highlight several usage patterns of deep learning libraries. In addition, we identify dependencies between DL libraries and their most frequent combinations, finding that PyTorch with Scikit-learn and Keras with TensorFlow are the most frequent, appearing in 18% and 14% of the projects respectively. Developers use two or three DL libraries in the same project and tend to combine multiple DL libraries within the same file and even the same function. Developers show consistent patterns in how they use the various deep learning libraries and prefer simple functions with fewer arguments and straightforward goals. Finally, we present the implications of our findings for researchers, library maintainers, and hardware vendors.
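To make the most frequent combination concrete, below is a minimal illustrative sketch (not code from any of the studied projects) of PyTorch and Scikit-learn used in the same file: Scikit-learn handles data processing and evaluation, while PyTorch handles model definition and training. The data, architecture, and hyperparameters are made up for the example.

# Illustrative sketch: mixing Scikit-learn and PyTorch in one file, the usage
# pattern reported by the study. Data and hyperparameters are hypothetical.
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Synthetic binary-classification data, a stand-in for a real dataset.
X = np.random.rand(200, 8).astype(np.float32)
y = (X.sum(axis=1) > 4).astype(np.int64)

# Scikit-learn covers the data-processing steps...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train).astype(np.float32)
X_test = scaler.transform(X_test).astype(np.float32)

# ...while PyTorch covers model definition and training.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for _ in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(torch.from_numpy(X_train)), torch.from_numpy(y_train))
    loss.backward()
    optimizer.step()

# Scikit-learn is used again for the evaluation metric.
with torch.no_grad():
    preds = model(torch.from_numpy(X_test)).argmax(dim=1).numpy()
print("accuracy:", accuracy_score(y_test, preds))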
Related papers
- DLLens: Testing Deep Learning Libraries via LLM-aided Synthesis [8.779035160734523]
Testing is a major approach to ensuring the quality of deep learning (DL) libraries.
Existing testing techniques commonly adopt differential testing to relieve the need for test oracle construction.
This paper introduces DLLens, a novel differential testing technique for DL library testing.
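As a rough illustration of the differential-testing idea in this setting (a generic sketch, not DLLens's LLM-aided synthesis), one implementation's output serves as the oracle for another's; the NumPy reference below is an assumption introduced only for the example.

# Generic differential-testing sketch: compare a NumPy reference softmax
# against torch.softmax on the same inputs; disagreement flags a potential bug.
import numpy as np
import torch

def check_softmax(x: np.ndarray, atol: float = 1e-6) -> bool:
    """Return True when the two implementations agree within tolerance."""
    shifted = x - x.max(axis=-1, keepdims=True)  # NumPy reference acts as the oracle
    ref = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    out = torch.softmax(torch.from_numpy(x), dim=-1).numpy()  # implementation under test
    return np.allclose(ref, out, atol=atol)

# Random inputs play the role of generated test cases.
for _ in range(100):
    assert check_softmax(np.random.randn(4, 10)), "implementations disagree"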
arXiv Detail & Related papers (2024-06-12T07:06:38Z) - EduNLP: Towards a Unified and Modularized Library for Educational Resources [78.8523961816045]
We present a unified, modularized, and extensive library, EduNLP, focusing on educational resource understanding.
In the library, we decouple the whole workflow into four key modules with consistent interfaces, including data configuration, processing, model implementation, and model evaluation.
For the current version, we primarily provide 10 typical models from four categories, and 5 common downstream evaluation tasks in the education domain on 8 subjects for users.
arXiv Detail & Related papers (2024-06-03T12:45:40Z) - torchgfn: A PyTorch GFlowNet library [56.071033896777784]
torchgfn is a PyTorch library for generative flow networks (GFlowNets).
It provides users with a simple API for environments and useful abstractions for samplers and losses.
arXiv Detail & Related papers (2023-05-24T00:20:59Z) - SequeL: A Continual Learning Library in PyTorch and JAX [50.33956216274694]
SequeL is a library for Continual Learning that supports both PyTorch and JAX frameworks.
It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches.
We release SequeL as an open-source library, enabling researchers and developers to easily experiment and extend the library for their own purposes.
arXiv Detail & Related papers (2023-04-21T10:00:22Z) - Code Librarian: A Software Package Recommendation System [65.05559087332347]
We present a recommendation engine called Librarian for open source libraries.
A candidate library package is recommended for a given context if: 1) it has been frequently used with the imported libraries in the program; 2) it has similar functionality to the imported libraries in the program; 3) it has similar functionality to the developer's implementation; and 4) it can be used efficiently in the context of the provided code.
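A minimal sketch of criterion 1 only, assuming a simple co-occurrence count over per-project import sets; this is purely illustrative and not the Librarian engine described in the paper.

# Hypothetical co-occurrence-based recommender: score candidate packages by how
# often they appear together with the program's existing imports.
from collections import Counter
from itertools import combinations

# Hypothetical corpus: the set of imported libraries in each mined project.
corpus = [
    {"torch", "numpy", "sklearn"},
    {"tensorflow", "keras", "numpy"},
    {"torch", "sklearn", "pandas"},
    {"torch", "numpy", "pandas"},
]

pair_counts = Counter()
for imports in corpus:
    for pair in combinations(sorted(imports), 2):
        pair_counts[pair] += 1

def recommend(current_imports, top_k=3):
    """Rank unseen libraries by co-occurrence with the current imports."""
    current = set(current_imports)
    candidates = set().union(*corpus) - current
    scores = {
        lib: sum(pair_counts[tuple(sorted((lib, known)))] for known in current)
        for lib in candidates
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

print(recommend({"torch", "numpy"}))  # e.g. [('sklearn', 3), ('pandas', 3), ...]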
arXiv Detail & Related papers (2022-10-11T12:30:05Z) - Do Not Take It for Granted: Comparing Open-Source Libraries for Software
Development Effort Estimation [9.224578642189023]
This paper aims at raising awareness of the differences incurred when using different Machine Learning (ML) libraries for software development effort estimation (SEE).
We investigate 4 deterministic machine learners as provided by 3 of the most popular ML open-source libraries written in different languages (namely, Scikit-Learn, Caret and Weka).
The results of our study reveal that the predictions provided by the 3 libraries differ in 95% of the cases on average across a total of 105 cases studied.
arXiv Detail & Related papers (2022-07-04T20:06:40Z) - PyRelationAL: A Library for Active Learning Research and Development [0.11545092788508224]
PyRelationAL is an open source library for active learning (AL) research.
It provides access to benchmark datasets and AL task configurations based on existing literature.
We perform experiments on the PyRelationAL collection of benchmark datasets and showcase the considerable economies that AL can provide.
arXiv Detail & Related papers (2022-05-23T08:21:21Z) - Benchmark Assessment for DeepSpeed Optimization Library [1.7839986996686321]
Deep Learning (DL) models are widely used in machine learning due to their performance and ability to deal with large datasets.
The size of such datasets and the complexity of DL models make these models costly to train, consuming large amounts of resources and time.
Many recent libraries and applications are introduced to deal with DL complexity and efficiency issues.
arXiv Detail & Related papers (2022-02-12T04:52:28Z) - LibFewShot: A Comprehensive Library for Few-shot Learning [78.58842209282724]
Few-shot learning, especially few-shot image classification, has received increasing attention and witnessed significant advances in recent years.
Some recent studies implicitly show that many generic techniques or tricks, such as data augmentation, pre-training, knowledge distillation, and self-supervision, may greatly boost the performance of a few-shot learning method.
We propose a comprehensive library for few-shot learning (LibFewShot) by re-implementing seventeen state-of-the-art few-shot learning methods in a unified framework with the same single codebase in PyTorch.
arXiv Detail & Related papers (2021-09-10T14:12:37Z) - Solo-learn: A Library of Self-supervised Methods for Visual
Representation Learning [83.02597612195966]
solo-learn is a library of self-supervised methods for visual representation learning.
Implemented in Python, using PyTorch and PyTorch Lightning, the library fits both research and industry needs.
arXiv Detail & Related papers (2021-08-03T22:19:55Z) - Req2Lib: A Semantic Neural Model for Software Library Recommendation [8.713783358744166]
We propose a novel neural approach called Req2Lib which recommends libraries given descriptions of the project requirement.
We use a Sequence-to-Sequence model to learn the library linked-usage information and semantic information of requirement descriptions in natural language.
Our preliminary evaluation demonstrates that Req2Lib can recommend libraries accurately.
arXiv Detail & Related papers (2020-05-24T14:37:07Z)