Related papers: arfpy: A python package for density estimation and generative modeling with adversarial random forests

URL: http://arxiv.org/abs/2311.07366v1
Date: Mon, 13 Nov 2023 14:28:21 GMT
Title: arfpy: A python package for density estimation and generative modeling with adversarial random forests
Authors: Kristin Blesch, Marvin N. Wright
Abstract summary: This paper introduces $\textit{arfpy}$, a python implementation of Adversarial Random Forests (ARF) (Watson et al., 2023). ARF is a lightweight procedure for synthesizing new data that resembles some given data.
Score: 1.3597551064547502
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper introduces $\textit{arfpy}$, a python implementation of Adversarial Random Forests (ARF) (Watson et al., 2023), which is a lightweight procedure for synthesizing new data that resembles some given data. The software $\textit{arfpy}$ equips practitioners with straightforward functionalities for both density estimation and generative modeling. The method is particularly useful for tabular data and its competitive performance is demonstrated in previous literature. As a major advantage over the mostly deep learning based alternatives, $\textit{arfpy}$ combines the method's reduced requirements in tuning efforts and computational resources with a user-friendly python interface. This supplies audiences across scientific fields with software to generate data effortlessly.

Related papers

DUPRE: Data Utility Prediction for Efficient Data Valuation [49.60564885180563]
Cooperative game theory-based data valuation, such as Data Shapley, requires evaluating the data utility and retraining the ML model for multiple data subsets. Our framework, DUPRE, takes an alternative yet complementary approach that reduces the cost per subset evaluation by predicting data utilities instead of evaluating them by model retraining. Specifically, given the evaluated data utilities of some data subsets, DUPRE fits a Gaussian process (GP) regression model to predict the utility of every other data subset.
arXiv Detail & Related papers (2025-02-22T08:53:39Z)
eipy: An Open-Source Python Package for Multi-modal Data Integration using Heterogeneous Ensembles [2.957103424179249]
eipy is an open-source Python package for developing effective, multi-modal heterogeneous ensembles for classification. eipy provides a rigorous and user-friendly framework for comparing and selecting the best-performing data integration and predictive modeling methods.
arXiv Detail & Related papers (2024-01-17T20:07:47Z)
FABind: Fast and Accurate Protein-Ligand Binding [127.7790493202716]
$\mathbf{FABind}$ is an end-to-end model that combines pocket prediction and docking to achieve accurate and fast protein-ligand binding. Our proposed model demonstrates strong advantages in terms of effectiveness and efficiency compared to existing methods.
arXiv Detail & Related papers (2023-10-10T16:39:47Z)
SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval [92.27387459751309]
We provide SPRINT, a unified Python toolkit for evaluating neural sparse retrieval. We establish strong and reproducible zero-shot sparse retrieval baselines across the well-acknowledged benchmark, BEIR. We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document.
arXiv Detail & Related papers (2023-07-19T22:48:02Z)
PyVBMC: Efficient Bayesian inference in Python [8.924669503280333]
PyVBMC is a Python implementation of the Variational Bayesian Monte Carlo (VBMC) algorithm for posterior and model inference. VBMC is designed for efficient parameter estimation and model assessment when model evaluations are mildly-to-very expensive.
arXiv Detail & Related papers (2023-03-16T17:37:22Z)
SurvLIMEpy: A Python package implementing SurvLIME [1.0689187493307983]
We present SurvLIMEpy, an open-source Python package that implements the SurvLIME algorithm. The package supports a wide variety of survival models, from the Cox Proportional Hazards Model to deep learning models such as DeepHit or DeepSurv.
arXiv Detail & Related papers (2023-02-21T09:54:32Z)
Minimalist Data Wrangling with Python [4.429175633425273]
Minimalist Data Wrangling with Python is envisaged as a student's first introduction to data science. It provides a high-level overview and discusses key concepts in detail.
arXiv Detail & Related papers (2022-11-09T01:24:39Z)
Smooth densities and generative modeling with unsupervised random forests [1.433758865948252]
An important application for density estimators is synthetic data generation. We propose a new method based on unsupervised random forests for estimating smooth densities in arbitrary dimensions without parametric constraints. We prove the consistency of our approach and demonstrate its advantages over existing tree-based density estimators.
arXiv Detail & Related papers (2022-05-19T09:50:25Z)
DADApy: Distance-based Analysis of DAta-manifolds in Python [51.37841707191944]
DADApy is a python software package for analysing and characterising high-dimensional data. It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering and for comparing different distance metrics.
arXiv Detail & Related papers (2022-05-04T08:41:59Z)
Scikit-dimension: a Python package for intrinsic dimension estimation [58.8599521537]
This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation. The scikit-dimension package provides a uniform implementation of most of the known ID estimators based on the scikit-learn application programming interface. We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation in real-life and synthetic data.
arXiv Detail & Related papers (2021-09-06T16:46:38Z)
Program Synthesis with Large Language Models [40.41120807053989]
We evaluate large language models for program synthesis in Python. We find that synthesis performance scales log-linearly with model size. We find that even our best models are generally unable to predict the output of a program given a specific input.
arXiv Detail & Related papers (2021-08-16T03:57:30Z)
Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts. We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data. We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
OPFython: A Python-Inspired Optimum-Path Forest Classifier [68.8204255655161]
This paper proposes a Python-based Optimum-Path Forest framework, denoted as OPFython. As OPFython is a Python-based library, it provides a more friendly environment and a faster prototyping workspace than the C language.
arXiv Detail & Related papers (2020-01-28T15:46:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.