PyRelationAL: A Library for Active Learning Research and Development
- URL: http://arxiv.org/abs/2205.11117v1
- Date: Mon, 23 May 2022 08:21:21 GMT
- Title: PyRelationAL: A Library for Active Learning Research and Development
- Authors: Paul Scherer and Thomas Gaudelet and Alison Pouplin and Suraj M S and
Jyothish Soman and Lindsay Edwards and Jake P. Taylor-King
- Abstract summary: PyRelationAL is an open source library for active learning (AL) research.
It provides access to benchmark datasets and AL task configurations based on existing literature.
We perform experiments on the PyRelationAL collection of benchmark datasets and showcase the considerable economies that AL can provide.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In constrained real-world scenarios where it is challenging or costly to
generate data, disciplined methods for acquiring informative new data points
are of fundamental importance for the efficient training of machine learning
(ML) models. Active learning (AL) is a subfield of ML focused on the
development of methods to iteratively and economically acquire data through
strategically querying new data points that are the most useful for a
particular task. Here, we introduce PyRelationAL, an open source library for AL
research. We describe a modular toolkit that is compatible with diverse ML
frameworks (e.g. PyTorch, Scikit-Learn, TensorFlow, JAX). Furthermore, to help
accelerate research and development in the field, the library implements a
number of published methods and provides API access to wide-ranging benchmark
datasets and AL task configurations based on existing literature. The library
is supplemented by an expansive set of tutorials, demos, and documentation to
help users get started. We perform experiments on the PyRelationAL collection
of benchmark datasets and showcase the considerable economies that AL can
provide. PyRelationAL is maintained using modern software engineering practices
- with an inclusive contributor code of conduct - to promote long term library
quality and utilisation.
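The iterative acquisition loop described in the abstract can be illustrated with a generic pool-based active learning sketch. This example uses plain scikit-learn with a least-confidence query strategy on synthetic data; it is not PyRelationAL's API, only an illustration of the AL cycle the paper addresses (train, score the unlabeled pool, query the most informative point, repeat).

```python
# Generic pool-based active learning loop (illustrative; not PyRelationAL's API).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Start with a small labeled seed set; everything else is the unlabeled pool.
labeled = [int(i) for i in rng.choice(len(X), size=10, replace=False)]
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for _ in range(20):                      # 20 acquisition rounds
    model.fit(X[labeled], y[labeled])    # retrain on currently labeled data
    probs = model.predict_proba(X[pool])
    # Least-confidence strategy: query the point the model is least sure about.
    query = pool[int(np.argmin(probs.max(axis=1)))]
    labeled.append(query)                # the oracle labels the queried point
    pool.remove(query)

print(len(labeled), len(pool))
```

Strategies differ in how they score the pool (uncertainty, diversity, expected model change); the surrounding loop stays the same, which is what makes a modular toolkit like the one described here practical.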
Related papers
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks, and agent systems.
We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
- $\texttt{dattri}$: A Library for Efficient Data Attribution [7.803566162554017]
Data attribution methods aim to quantify the influence of individual training samples on the prediction of artificial intelligence (AI) models.
Despite a surge of new data attribution methods, the field has lacked a comprehensive library that facilitates their development, benchmarking, and deployment.
In this work, we introduce $\texttt{dattri}$, an open-source data attribution library that addresses the above needs.
arXiv Detail & Related papers (2024-10-06T17:18:09Z)
- Deep Fast Machine Learning Utils: A Python Library for Streamlined Machine Learning Prototyping [0.0]
The Deep Fast Machine Learning Utils (DFMLU) library provides tools designed to automate and enhance aspects of machine learning processes.
DFMLU offers functionalities that support model development and data handling.
This manuscript presents an overview of DFMLU's functionalities, providing Python examples for each tool.
arXiv Detail & Related papers (2024-09-14T21:39:17Z)
- DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
- AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning [93.96463520716759]
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and reduce hallucinations.
Here, we introduce AvaTaR, a novel and automated framework that optimizes an LLM agent to effectively leverage provided tools, improving performance on a given task.
arXiv Detail & Related papers (2024-06-17T04:20:02Z)
- API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs [28.840207102132286]
We focus on the task of identifying, curating, and transforming existing datasets.
We introduce API-BLEND, a large corpus for training and systematic testing of tool-augmented LLMs.
We demonstrate the utility of the API-BLEND dataset for both training and benchmarking purposes.
arXiv Detail & Related papers (2024-02-23T18:30:49Z)
- DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows [72.40917624485822]
We introduce DataDreamer, an open source Python library that allows researchers to implement powerful large language model (LLM) workflows.
DataDreamer also helps researchers adhere to best practices that we propose to encourage open science.
arXiv Detail & Related papers (2024-02-16T00:10:26Z)
- SequeL: A Continual Learning Library in PyTorch and JAX [50.33956216274694]
SequeL is a library for Continual Learning that supports both PyTorch and JAX frameworks.
It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches.
We release SequeL as an open-source library, enabling researchers and developers to easily experiment and extend the library for their own purposes.
arXiv Detail & Related papers (2023-04-21T10:00:22Z)
- Datasets: A Community Library for Natural Language Processing [55.48866401721244]
datasets is a community library for contemporary NLP.
The library includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects.
arXiv Detail & Related papers (2021-09-07T03:59:22Z)
- pyBKT: An Accessible Python Library of Bayesian Knowledge Tracing Models [0.0]
We introduce pyBKT, a library of model extensions for knowledge tracing.
The library provides data generation, fitting, prediction, and cross-validation routines.
pyBKT is open source and open license for the purpose of making knowledge tracing more accessible to communities of research and practice.
arXiv Detail & Related papers (2021-05-02T03:08:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.