PyRelationAL: A Library for Active Learning Research and Development
- URL: http://arxiv.org/abs/2205.11117v1
- Date: Mon, 23 May 2022 08:21:21 GMT
- Title: PyRelationAL: A Library for Active Learning Research and Development
- Authors: Paul Scherer and Thomas Gaudelet and Alison Pouplin and Suraj M S and
Jyothish Soman and Lindsay Edwards and Jake P. Taylor-King
- Abstract summary: PyRelationAL is an open source library for active learning (AL) research.
It provides access to benchmark datasets and AL task configurations based on existing literature.
We perform experiments on the PyRelationAL collection of benchmark datasets and showcase the considerable economies that AL can provide.
- Score: 0.11545092788508224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In constrained real-world scenarios where it is challenging or costly to
generate data, disciplined methods for acquiring informative new data points
are of fundamental importance for the efficient training of machine learning
(ML) models. Active learning (AL) is a subfield of ML focused on the
development of methods to iteratively and economically acquire data through
strategically querying new data points that are the most useful for a
particular task. Here, we introduce PyRelationAL, an open source library for AL
research. We describe a modular toolkit that is compatible with diverse ML
frameworks (e.g. PyTorch, Scikit-Learn, TensorFlow, JAX). Furthermore, to help
accelerate research and development in the field, the library implements a
number of published methods and provides API access to wide-ranging benchmark
datasets and AL task configurations based on existing literature. The library
is supplemented by an expansive set of tutorials, demos, and documentation to
help users get started. We perform experiments on the PyRelationAL collection
of benchmark datasets and showcase the considerable economies that AL can
provide. PyRelationAL is maintained using modern software engineering practices,
with an inclusive contributor code of conduct, to promote long-term library
quality and utilisation.
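The pool-based querying loop that the abstract describes can be illustrated with a minimal sketch. This is not PyRelationAL's API: the function names, the hidden threshold of 0.62, and the toy 1-D task are all assumptions made for illustration, and only the Python standard library is used. The model is a simple threshold classifier, and the query strategy is uncertainty sampling (label the unlabelled point closest to the current decision boundary).

```python
import random


def oracle(x, true_threshold=0.62):
    """Hypothetical labelling oracle: 1 if x is at or above a hidden threshold."""
    return int(x >= true_threshold)


def fit_threshold(labelled):
    """Fit a 1-D threshold classifier: the midpoint between the largest
    labelled negative and the smallest labelled positive."""
    negatives = [x for x, y in labelled if y == 0]
    positives = [x for x, y in labelled if y == 1]
    lo = max(negatives) if negatives else 0.0
    hi = min(positives) if positives else 1.0
    return (lo + hi) / 2.0


def most_uncertain(pool, threshold):
    """Uncertainty sampling: pick the pool point closest to the boundary."""
    return min(pool, key=lambda x: abs(x - threshold))


def active_learning_loop(pool, n_queries, seed_points):
    """Iteratively fit the model, query the oracle, and grow the labelled set."""
    pool = list(pool)
    labelled = []
    for x in seed_points:
        pool.remove(x)
        labelled.append((x, oracle(x)))
    for _ in range(n_queries):
        if not pool:
            break
        threshold = fit_threshold(labelled)
        query = most_uncertain(pool, threshold)
        pool.remove(query)
        labelled.append((query, oracle(query)))
    return fit_threshold(labelled), labelled


def accuracy(threshold, points):
    """Fraction of points whose predicted label matches the oracle."""
    return sum(int(x >= threshold) == oracle(x) for x in points) / len(points)


rng = random.Random(0)
pool = [rng.random() for _ in range(1000)]
seeds = [min(pool), max(pool)]  # one point from each end as a starting set
learned, labelled = active_learning_loop(pool, n_queries=15, seed_points=seeds)
print(f"labels used: {len(labelled)}, pool accuracy: {accuracy(learned, pool):.3f}")
```

In this toy setting, each query roughly halves the region where the boundary can lie, so a handful of labels is enough to classify the pool well. Real AL tasks use richer models and uncertainty estimates, but the fit, query, label, repeat structure is the same.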
Related papers
- DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
- AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval [93.96463520716759]
Large language model (LLM) agents have demonstrated impressive capability in utilizing external tools and knowledge to boost accuracy and reduce hallucinations.
Here, we introduce AvaTaR, a novel framework that optimizes an LLM agent to effectively use the provided tools and improve its performance on a given task/domain.
We find AvaTaR consistently outperforms state-of-the-art approaches across all four challenging tasks and exhibits strong generalization ability when applied to novel cases.
arXiv Detail & Related papers (2024-06-17T04:20:02Z)
- COLT: Towards Completeness-Oriented Tool Retrieval for Large Language Models [60.733557487886635]
We propose a novel model-agnostic COllaborative Learning-based Tool Retrieval approach, COLT.
COLT captures semantic similarities between user queries and tool descriptions.
It also takes into account the collaborative information of tools.
arXiv Detail & Related papers (2024-05-25T06:41:23Z)
- API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs [28.840207102132286]
We focus on the task of identifying, curating, and transforming existing datasets.
We introduce API-BLEND, a large corpus for training and systematic testing of tool-augmented LLMs.
We demonstrate the utility of the API-BLEND dataset for both training and benchmarking purposes.
arXiv Detail & Related papers (2024-02-23T18:30:49Z)
- DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows [72.40917624485822]
We introduce DataDreamer, an open source Python library that allows researchers to implement powerful large language models.
DataDreamer also helps researchers adhere to best practices that we propose to encourage open science.
arXiv Detail & Related papers (2024-02-16T00:10:26Z)
- Julearn: an easy-to-use library for leakage-free evaluation and inspection of ML models [0.23301643766310373]
We present the rationale behind julearn's design and its core features, and showcase three examples of previously published research projects.
Julearn aims to simplify the entry into the machine learning world by providing an easy-to-use environment with built-in guards against some of the most common ML pitfalls.
arXiv Detail & Related papers (2023-10-19T08:21:12Z)
- Utilising a Large Language Model to Annotate Subject Metadata: A Case Study in an Australian National Research Data Catalogue [18.325675189960833]
In support of open and reproducible research, there has been a rapidly increasing number of datasets made available for research.
As the availability of datasets increases, it becomes more important to have quality metadata for discovering and reusing them.
This paper proposes to leverage large language models (LLMs) for cost-effective annotation of subject metadata through LLM-based in-context learning.
arXiv Detail & Related papers (2023-10-17T14:52:33Z)
- SequeL: A Continual Learning Library in PyTorch and JAX [50.33956216274694]
SequeL is a library for Continual Learning that supports both PyTorch and JAX frameworks.
It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches.
We release SequeL as an open-source library, enabling researchers and developers to easily experiment and extend the library for their own purposes.
arXiv Detail & Related papers (2023-04-21T10:00:22Z)
- Datasets: A Community Library for Natural Language Processing [55.48866401721244]
datasets is a community library for contemporary NLP.
The library includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects.
arXiv Detail & Related papers (2021-09-07T03:59:22Z)
- pyBKT: An Accessible Python Library of Bayesian Knowledge Tracing Models [0.0]
We introduce pyBKT, a library of model extensions for knowledge tracing.
The library provides data generation, fitting, prediction, and cross-validation routines.
pyBKT is open source and open license for the purpose of making knowledge tracing more accessible to communities of research and practice.
arXiv Detail & Related papers (2021-05-02T03:08:53Z)