ManyTypes4Py: A Benchmark Python Dataset for Machine Learning-based Type
Inference
- URL: http://arxiv.org/abs/2104.04706v1
- Date: Sat, 10 Apr 2021 08:10:06 GMT
- Title: ManyTypes4Py: A Benchmark Python Dataset for Machine Learning-based Type
Inference
- Authors: Amir M. Mir, Evaldas Latoskinas, Georgios Gousios
- Abstract summary: ManyTypes4Py is a large Python dataset for machine learning (ML)-based type inference.
The dataset contains a total of 5,382 Python projects with more than 869K type annotations.
- Score: 9.384801062680786
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we present ManyTypes4Py, a large Python dataset for machine
learning (ML)-based type inference. The dataset contains a total of 5,382
Python projects with more than 869K type annotations. Duplicate source code
files were removed to eliminate the negative effect of the duplication bias. To
facilitate training and evaluation of ML models, the dataset was split into
training, validation and test sets by files. To extract type information from
abstract syntax trees (ASTs), a lightweight static analyzer pipeline is
developed and accompanied with the dataset. Using this pipeline, the collected
Python projects were analyzed and the results of the AST analysis were stored
in JSON-formatted files. The ManyTypes4Py dataset is shared on zenodo and its
tools are publicly available on GitHub.
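The extraction step the abstract describes (analyzing ASTs of Python files and storing the results as JSON) can be illustrated with Python's built-in `ast` module. This is a hypothetical minimal sketch, not the dataset's actual analyzer pipeline, which records far richer information:

```python
import ast
import json

def extract_annotations(source: str) -> dict:
    """Collect parameter and return type annotations from Python source.

    A minimal AST-based extractor sketch; the real ManyTypes4Py pipeline
    also covers variables, imports, and more.
    """
    tree = ast.parse(source)
    functions = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Keep only explicitly annotated parameters.
            params = {
                arg.arg: ast.unparse(arg.annotation)
                for arg in node.args.args
                if arg.annotation is not None
            }
            ret = ast.unparse(node.returns) if node.returns else None
            functions[node.name] = {"params": params, "return": ret}
    return {"functions": functions}

source = "def add(x: int, y: int) -> int:\n    return x + y\n"
print(json.dumps(extract_annotations(source)))
```

`ast.unparse` requires Python 3.9+; the JSON output mirrors the idea of storing per-file analysis results in JSON-formatted files.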
Related papers
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
- CodeInsight: A Curated Dataset of Practical Coding Solutions from Stack Overflow [10.19019476978683]
The dataset provides examples that include a clarified intent, associated code snippets, and an average of three related unit tests.
Comprising 3,409 examples crafted by Python experts, our dataset is designed for both model finetuning and standalone evaluation.
arXiv Detail & Related papers (2024-09-25T11:18:52Z)
- RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content [13.187520657952263]
Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet.
Evaluating models on test splits that might have leaked into the training set is prone to misleading conclusions.
We introduce a new test dataset named RepLiQA, suited for question-answering and topic retrieval tasks.
arXiv Detail & Related papers (2024-06-17T17:52:54Z)
- LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation [67.24113079928668]
We present LexMatcher, a method for data curation driven by the coverage of senses found in bilingual dictionaries.
Our approach outperforms the established baselines on the WMT2022 test sets.
arXiv Detail & Related papers (2024-06-03T15:30:36Z)
- SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval [92.27387459751309]
We provide SPRINT, a unified Python toolkit for evaluating neural sparse retrieval.
We establish strong and reproducible zero-shot sparse retrieval baselines across the well-acknowledged benchmark, BEIR.
We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document.
arXiv Detail & Related papers (2023-07-19T22:48:02Z)
- AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
- InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval [4.888022358881737]
We introduce InPars-v2, a dataset generator that uses open-source LLMs and powerful rerankers to select synthetic query-document pairs for training.
A simple BM25 retrieval pipeline followed by a monoT5 reranker finetuned on InPars-v2 data achieves new state-of-the-art results on the BEIR benchmark.
arXiv Detail & Related papers (2023-01-04T20:58:43Z)
- Evaluating the Impact of Source Code Parsers on ML4SE Models [3.699097874146491]
We evaluate two models, namely, Supernorm2Seq and TreeLSTM, in the name prediction task.
We show that trees built by different parsers vary in their structure and content.
We then analyze how this diversity affects the models' quality.
arXiv Detail & Related papers (2022-06-17T12:10:04Z)
- Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$ [118.04625413322827]
$\texttt{t5x}$ and $\texttt{seqio}$ are open source software libraries for building and training language models.
These libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data.
arXiv Detail & Related papers (2022-03-31T17:12:13Z)
- Deepchecks: A Library for Testing and Validating Machine Learning Models and Data [8.876608553825227]
Deepchecks is a Python library for comprehensively validating machine learning models and data.
Our goal is to provide an easy-to-use library comprising many checks related to various types of issues.
arXiv Detail & Related papers (2022-03-16T09:37:22Z)
- Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach.
IFAQ treats the feature extraction query and the learning task as one program given in the IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.