arfpy: A python package for density estimation and generative modeling
with adversarial random forests
- URL: http://arxiv.org/abs/2311.07366v1
- Date: Mon, 13 Nov 2023 14:28:21 GMT
- Title: arfpy: A python package for density estimation and generative modeling
with adversarial random forests
- Authors: Kristin Blesch, Marvin N. Wright
- Abstract summary: This paper introduces $\textit{arfpy}$, a python implementation of Adversarial Random Forests (ARF) (Watson et al., 2023).
It is a lightweight procedure for synthesizing new data that resembles some given data.
- Score: 1.3597551064547502
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces $\textit{arfpy}$, a python implementation of
Adversarial Random Forests (ARF) (Watson et al., 2023), which is a lightweight
procedure for synthesizing new data that resembles some given data. The
software $\textit{arfpy}$ equips practitioners with straightforward
functionalities for both density estimation and generative modeling. The method
is particularly useful for tabular data and its competitive performance is
demonstrated in previous literature. As a major advantage over the mostly deep
learning based alternatives, $\textit{arfpy}$ combines the method's reduced
requirements in tuning efforts and computational resources with a user-friendly
python interface. This supplies audiences across scientific fields with
software to generate data effortlessly.
Related papers
- FABind: Fast and Accurate Protein-Ligand Binding [127.7790493202716]
$\mathbf{FABind}$ is an end-to-end model that combines pocket prediction and docking to achieve accurate and fast protein-ligand binding.
Our proposed model demonstrates strong advantages in terms of effectiveness and efficiency compared to existing methods.
arXiv Detail & Related papers (2023-10-10T16:39:47Z)
- SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval [92.27387459751309]
We provide SPRINT, a unified Python toolkit for evaluating neural sparse retrieval.
We establish strong and reproducible zero-shot sparse retrieval baselines across the well-acknowledged benchmark, BEIR.
We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document.
arXiv Detail & Related papers (2023-07-19T22:48:02Z)
- SurvLIMEpy: A Python package implementing SurvLIME [1.0689187493307983]
We present SurvLIMEpy, an open-source Python package that implements the SurvLIME algorithm.
The package supports a wide variety of survival models, from the Cox Proportional Hazards Model to deep learning models such as DeepHit or DeepSurv.
arXiv Detail & Related papers (2023-02-21T09:54:32Z)
- Minimalist Data Wrangling with Python [4.429175633425273]
Minimalist Data Wrangling with Python is envisaged as a student's first introduction to data science.
It provides a high-level overview as well as discussing key concepts in detail.
arXiv Detail & Related papers (2022-11-09T01:24:39Z)
- Smooth densities and generative modeling with unsupervised random forests [1.433758865948252]
An important application for density estimators is synthetic data generation.
We propose a new method based on unsupervised random forests for estimating smooth densities in arbitrary dimensions without parametric constraints.
We prove the consistency of our approach and demonstrate its advantages over existing tree-based density estimators.
arXiv Detail & Related papers (2022-05-19T09:50:25Z)
- DADApy: Distance-based Analysis of DAta-manifolds in Python [51.37841707191944]
DADApy is a python software package for analysing and characterising high-dimensional data.
It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering and for comparing different distance metrics.
arXiv Detail & Related papers (2022-05-04T08:41:59Z)
- Scikit-dimension: a Python package for intrinsic dimension estimation [58.8599521537]
This technical note introduces \texttt{scikit-dimension}, an open-source Python package for intrinsic dimension estimation.
The \texttt{scikit-dimension} package provides a uniform implementation of most of the known ID estimators, based on the scikit-learn application programming interface.
We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation in real-life and synthetic data.
arXiv Detail & Related papers (2021-09-06T16:46:38Z)
- Program Synthesis with Large Language Models [40.41120807053989]
We evaluate large language models for program synthesis in Python.
We find that synthesis performance scales log-linearly with model size.
We find that even our best models are generally unable to predict the output of a program given a specific input.
arXiv Detail & Related papers (2021-08-16T03:57:30Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- OPFython: A Python-Inspired Optimum-Path Forest Classifier [68.8204255655161]
This paper proposes a Python-based Optimum-Path Forest framework, denoted as OPFython.
As OPFython is a Python-based library, it provides a more friendly environment and a faster prototyping workspace than the C language.
arXiv Detail & Related papers (2020-01-28T15:46:19Z)