Easy Data Unlearning Bench
- URL: http://arxiv.org/abs/2602.16400v1
- Date: Wed, 18 Feb 2026 12:20:32 GMT
- Title: Easy Data Unlearning Bench
- Authors: Roy Rinberg, Pol Puigdemont, Martin Pawelczyk, Volkan Cevher
- Abstract summary: We introduce a unified and extensible benchmarking suite that simplifies the evaluation of unlearning algorithms. By standardizing setup and metrics, it enables reproducible, scalable, and fair comparison across unlearning methods.
- Score: 53.1304932656586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Evaluating machine unlearning methods remains technically challenging, with recent benchmarks requiring complex setups and significant engineering overhead. We introduce a unified and extensible benchmarking suite that simplifies the evaluation of unlearning algorithms using the KLoM (KL divergence of Margins) metric. Our framework provides precomputed model ensembles, oracle outputs, and streamlined infrastructure for running evaluations out of the box. By standardizing setup and metrics, it enables reproducible, scalable, and fair comparison across unlearning methods. We aim for this benchmark to serve as a practical foundation for accelerating research and promoting best practices in machine unlearning. Our code and data are publicly available.
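The KLoM (KL divergence of Margins) metric is only named in the abstract. As a rough illustration of the general idea, the sketch below compares the distribution of classification margins from an unlearned model against margins from oracle models retrained without the forget set, using a histogram-based KL estimate. Both the function names and the discretized estimator are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def margins(logits: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Classification margin: correct-class logit minus best competing logit."""
    correct = logits[np.arange(len(labels)), labels]
    masked = logits.copy()
    masked[np.arange(len(labels)), labels] = -np.inf
    return correct - masked.max(axis=1)

def kl_of_margins(unlearned: np.ndarray, oracle: np.ndarray, bins: int = 50) -> float:
    """Histogram-based KL(unlearned || oracle) over margin distributions.
    The binned estimator is an assumption for illustration, not the
    paper's exact procedure."""
    lo = min(unlearned.min(), oracle.min())
    hi = max(unlearned.max(), oracle.max())
    p, edges = np.histogram(unlearned, bins=bins, range=(lo, hi))
    q, _ = np.histogram(oracle, bins=edges)
    eps = 1e-8
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))
```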
Related papers
- PACIFIC: a framework for generating benchmarks to check Precise Automatically Checked Instruction Following In Code [1.1164117387254457]
Large Language Model (LLM)-based code assistants have emerged as a powerful application of generative AI. A key requirement for these systems is their ability to accurately follow user instructions. We present PACIFIC, a novel framework designed to automatically generate benchmarks that rigorously assess sequential instruction-following and code dry-running capabilities.
arXiv Detail & Related papers (2025-12-11T14:49:56Z) - OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics [82.0813150432867]
We introduce OpenUnlearning, a standardized framework for benchmarking large language model (LLM) unlearning methods and metrics. OpenUnlearning integrates 13 unlearning algorithms and 16 diverse evaluations across 3 leading benchmarks. We also benchmark diverse unlearning methods and provide a comparative analysis against an extensive evaluation suite.
arXiv Detail & Related papers (2025-06-14T20:16:37Z) - Rethinking Predictive Modeling for LLM Routing: When Simple kNN Beats Complex Learned Routers [3.090041654375235]
We show that a well-tuned k-Nearest Neighbors (kNN) approach outperforms state-of-the-art learned routers across diverse tasks. Our findings reveal that the locality properties of model performance in embedding space enable simple non-parametric methods to achieve strong routing decisions.
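As a rough illustration of this kNN-routing idea: embed the incoming query, find its nearest neighbors among past queries with known per-model outcomes, and route to the model with the best local track record. The data layout and similarity choice here are hypothetical placeholders, not the paper's setup.

```python
import numpy as np

def knn_route(query_emb, train_embs, train_scores, k=5):
    """Route a query to the model with the best average score among
    the k nearest past queries (cosine similarity).

    query_emb:    (d,) embedding of the incoming query
    train_embs:   (n, d) embeddings of past queries
    train_scores: (n, m) observed quality of each of m candidate models
    """
    # Cosine similarity between the query and every past query.
    norms = np.linalg.norm(train_embs, axis=1) * np.linalg.norm(query_emb)
    sims = train_embs @ query_emb / (norms + 1e-12)
    nearest = np.argsort(-sims)[:k]
    # Pick the model with the highest mean score over the neighborhood.
    return int(train_scores[nearest].mean(axis=0).argmax())
```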
arXiv Detail & Related papers (2025-05-19T01:33:41Z) - Computational Reasoning of Large Language Models [51.629694188014064]
We introduce Turing Machine Bench (TMBench), a benchmark to assess the ability of Large Language Models (LLMs) to execute reasoning processes. TMBench incorporates four key features: self-contained and knowledge-agnostic reasoning, a minimalistic multi-step structure, controllable difficulty, and a theoretical foundation based on Turing machines.
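To make the "execute a Turing machine" task concrete, here is a minimal, self-contained simulator of the kind of multi-step, knowledge-agnostic process such a benchmark can ask a model to trace. The transition-table encoding is an illustrative assumption, not TMBench's format.

```python
def run_tm(tape, rules, state="q0", head=0, max_steps=100):
    """Simulate a Turing machine.

    rules maps (state, symbol) -> (new_symbol, move, new_state),
    where move is -1 (left) or +1 (right). Halts on state 'halt'
    or when no rule applies."""
    tape = dict(enumerate(tape))  # sparse tape, '_' is blank
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(head, "_")
        if (state, symbol) not in rules:
            break
        new_symbol, move, state = rules[(state, symbol)]
        tape[head] = new_symbol
        head += move
    lo, hi = min(tape), max(tape)
    return "".join(tape.get(i, "_") for i in range(lo, hi + 1)), state

# Example: flip every bit until the first blank.
rules = {
    ("q0", "0"): ("1", +1, "q0"),
    ("q0", "1"): ("0", +1, "q0"),
    ("q0", "_"): ("_", +1, "halt"),
}
print(run_tm("0110", rules))  # -> ('1001_', 'halt')
```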
arXiv Detail & Related papers (2025-04-29T13:52:47Z) - Learning an Effective Premise Retrieval Model for Efficient Mathematical Formalization [29.06255449960557]
We introduce a novel method that leverages data extracted from Mathlib to train a lightweight and effective premise retrieval model. The model is learned in a contrastive learning framework, in which a fine-grained similarity calculation method and a re-ranking module are applied. Experimental results demonstrate that our model outperforms existing baselines, achieving higher accuracy while maintaining a lower computational load.
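As a generic illustration of the contrastive-retrieval setup mentioned above (not the paper's exact loss), an in-batch InfoNCE objective pulls each proof state toward its matching premise and away from the other premises in the batch:

```python
import torch
import torch.nn.functional as F

def info_nce(query_embs, premise_embs, temperature=0.05):
    """In-batch contrastive loss for retrieval.

    query_embs:   (B, d) embeddings of proof states / goals
    premise_embs: (B, d) embeddings of their matching premises;
                  row i of each tensor is a positive pair, all
                  other rows serve as in-batch negatives."""
    q = F.normalize(query_embs, dim=-1)
    p = F.normalize(premise_embs, dim=-1)
    logits = q @ p.T / temperature                         # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # diagonal = positives
    return F.cross_entropy(logits, targets)
```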
arXiv Detail & Related papers (2025-01-21T06:32:25Z) - Classification Performance Metric Elicitation and its Applications [5.5637552942511155]
Despite its practical interest, there is limited formal guidance on how to select metrics for machine learning applications.
This thesis outlines metric elicitation as a principled framework for selecting the performance metric that best reflects implicit user preferences.
arXiv Detail & Related papers (2022-08-19T03:57:17Z) - Benchopt: Reproducible, efficient and collaborative optimization benchmarks [67.29240500171532]
Benchopt is a framework to automate, reproduce and publish optimization benchmarks in machine learning.
Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments.
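Benchopt itself defines benchmarks through solver and objective plugins; as a generic stand-in (not Benchopt's actual API), the sketch below shows the core pattern such a tool automates: run every registered solver against every objective at several budgets and record comparable results.

```python
import time

def run_benchmark(objectives, solvers, n_iters=(10, 100, 1000)):
    """Run each solver on each objective at several iteration budgets,
    recording wall-clock time and the final objective value."""
    results = []
    for obj_name, objective in objectives.items():
        for sol_name, solver in solvers.items():
            for n in n_iters:
                t0 = time.perf_counter()
                x = solver(objective, n)  # solver returns a candidate solution
                results.append({
                    "objective": obj_name,
                    "solver": sol_name,
                    "n_iter": n,
                    "time": time.perf_counter() - t0,
                    "value": objective(x),
                })
    return results

# Example: two gradient-descent step sizes on a 1-D quadratic.
obj = {"quadratic": lambda x: (x - 3.0) ** 2}
def gd(step):
    def solve(f, n, x=0.0):
        for _ in range(n):
            x -= step * 2 * (x - 3.0)  # hand-coded gradient of this quadratic
        return x
    return solve
print(run_benchmark(obj, {"gd_small": gd(0.01), "gd_big": gd(0.1)}))
```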
arXiv Detail & Related papers (2022-06-27T16:19:24Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
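As a rough sketch of the column-wise iterative imputation pattern described above; the per-column model is fixed to a linear regressor here for brevity, whereas HyperImpute's point is to select and configure that model automatically.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def iterative_impute(X, n_rounds=5):
    """Iterative imputation: repeatedly regress each column with
    missing values on the remaining columns and refill its missing
    entries with the model's predictions."""
    X = X.copy()
    miss = np.isnan(X)
    # Initialize missing entries with column means.
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])
    for _ in range(n_rounds):
        for j in range(X.shape[1]):
            rows = miss[:, j]
            if not rows.any() or rows.all():
                continue
            others = np.delete(X, j, axis=1)
            model = LinearRegression().fit(others[~rows], X[~rows, j])
            X[rows, j] = model.predict(others[rows])
    return X
```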
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - An Extensible Benchmark Suite for Learning to Simulate Physical Systems [60.249111272844374]
We introduce a set of benchmark problems to take a step towards unified benchmarks and evaluation protocols.
We propose four representative physical systems, as well as a collection of both widely used classical time-based methods and representative data-driven methods.
arXiv Detail & Related papers (2021-08-09T17:39:09Z) - Synthetic Benchmarks for Scientific Research in Explainable Machine Learning [14.172740234933215]
We release XAI-Bench: a suite of synthetic datasets and a library for benchmarking feature attribution algorithms.
Unlike real-world datasets, synthetic datasets allow the efficient computation of conditional expected values.
We demonstrate the power of our library by benchmarking popular explainability techniques across several evaluation metrics and identifying failure modes for popular explainers.
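The point about synthetic data making conditional expected values tractable can be made concrete: with a fully specified generative distribution, quantities like E[f(X) | X_S = x_S], which feature-attribution methods such as Shapley-value estimators need, can be computed by sampling the exact conditional rather than approximating it from data. A minimal sketch, assuming a hypothetical two-feature Gaussian generative model (not one of XAI-Bench's datasets):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(X):
    """A known model on two features."""
    return X[:, 0] + 2.0 * X[:, 1]

def cond_expectation(x0, n=100_000):
    """E[f(X) | X_0 = x0] under a known generative model in which
    X_1 = 0.5 * X_0 + noise. Because the distribution is synthetic
    and fully specified, we sample the exact conditional."""
    x1 = 0.5 * x0 + rng.normal(0.0, 1.0, size=n)
    X = np.column_stack([np.full(n, x0), x1])
    return f(X).mean()

print(cond_expectation(1.0))  # ~ 1 + 2 * 0.5 * 1 = 2.0
```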
arXiv Detail & Related papers (2021-06-23T17:10:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.