Related papers: Active Fourier Auditor for Estimating Distributional Properties of ML Models

Active Fourier Auditor for Estimating Distributional Properties of ML Models

URL: http://arxiv.org/abs/2410.08111v1
Date: Thu, 10 Oct 2024 16:57:01 GMT
Title: Active Fourier Auditor for Estimating Distributional Properties of ML Models
Authors: Ayoub Ajarra, Bishwamittra Ghosh, Debabrota Basu,
Abstract summary: We focus on three properties: robustness, individual fairness, and group fairness. We develop a new framework that quantifies different properties in terms of the Fourier coefficients of the ML model under audit. We derive high probability error bounds on AFA's estimates, along with the worst-case lower bounds on the sample complexity to audit them.
Score: 10.581140430698103
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: With the pervasive deployment of Machine Learning (ML) models in real-world applications, verifying and auditing properties of ML models have become a central concern. In this work, we focus on three properties: robustness, individual fairness, and group fairness. We discuss two approaches for auditing ML model properties: estimation with and without reconstruction of the target model under audit. Though the first approach is studied in the literature, the second approach remains unexplored. For this purpose, we develop a new framework that quantifies different properties in terms of the Fourier coefficients of the ML model under audit but does not parametrically reconstruct it. We propose the Active Fourier Auditor (AFA), which queries sample points according to the Fourier coefficients of the ML model, and further estimates the properties. We derive high probability error bounds on AFA's estimates, along with the worst-case lower bounds on the sample complexity to audit them. Numerically we demonstrate on multiple datasets and models that AFA is more accurate and sample-efficient to estimate the properties of interest than the baselines.

Related papers

Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers [59.168391398830515]
We evaluate 12 pre-trained LLMs and one specialized fact-verifier, using a collection of examples from 14 fact-checking benchmarks.<n>We highlight the importance of addressing annotation errors and ambiguity in datasets.<n> frontier LLMs with few-shot in-context examples, often overlooked in previous works, achieve top-tier performance.
arXiv Detail & Related papers (2025-06-16T10:32:10Z)
Instance-Level Data-Use Auditing of Visual ML Models [47.369572284751285]
Growing trend of legal disputes over the unauthorized use of data in machine learning (ML) systems highlights the need for reliable data-use auditing mechanisms. We present the first proactive instance-level data-use auditing method designed to enable data owners to audit the use of their individual data instances in ML models.
arXiv Detail & Related papers (2025-03-28T13:28:57Z)
FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models [79.41859481668618]
Large Language Models (LLMs) have significantly advanced the fact-checking studies. Existing automated fact-checking evaluation methods rely on static datasets and classification metrics. We introduce FACT-AUDIT, an agent-driven framework that adaptively and dynamically assesses LLMs' fact-checking capabilities.
arXiv Detail & Related papers (2025-02-25T07:44:22Z)
Preference Leakage: A Contamination Problem in LLM-as-a-judge [69.96778498636071]
Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods. In this work, we expose preference leakage, a contamination problem in LLM-as-a-judge caused by the relatedness between the synthetic data generators and LLM-based evaluators.
arXiv Detail & Related papers (2025-02-03T17:13:03Z)
MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs [38.93090238335506]
Spurious bias, a tendency to use spurious correlations between non-essential input attributes and target variables for predictions, has revealed a severe pitfall in deep learning models trained on single modality data. We introduce MM-SpuBench, a comprehensive visual question-answering (VQA) benchmark designed to evaluate MLLMs' reliance on nine distinct categories of spurious correlations. Our findings illuminate the persistence of the reliance on spurious correlations from these models and underscore the urge for new methodologies to mitigate spurious biases.
arXiv Detail & Related papers (2024-06-24T20:29:16Z)
Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
We propose a new evaluation method, SQC-Score. Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score. Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases. We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets. Our generated data is human-readable and useful to trigger hallucination in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z)
Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective [7.577040836988683]
Missing data can pose a challenge for machine learning (ML) modeling. Current approaches are categorized into feature imputation and label prediction. This study proposes a Contrastive Learning framework to model observed data with missing values.
arXiv Detail & Related papers (2023-09-18T13:16:24Z)
A Bayesian Framework on Asymmetric Mixture of Factor Analyser [0.0]
This paper introduces an MFA model with a rich and flexible class of skew normal (unrestricted) generalized hyperbolic (called SUNGH) distributions. The SUNGH family provides considerable flexibility to model skewness in different directions as well as allowing for heavy tailed data. Considering factor analysis models, the SUNGH family also allows for skewness and heavy tails for both the error component and factor scores.
arXiv Detail & Related papers (2022-11-01T20:19:52Z)
Sparse MoEs meet Efficient Ensembles [49.313497379189315]
We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixture of experts (sparse MoEs) We present Efficient Ensemble of Experts (E$3$), a scalable and simple ensemble of sparse MoEs that takes the best of both classes of models, while using up to 45% fewer FLOPs than a deep ensemble.
arXiv Detail & Related papers (2021-10-07T11:58:35Z)
Reducing Unintended Bias of ML Models on Tabular and Textual Data [5.503546193689538]
We revisit the framework FixOut that is inspired in the approach "fairness through unawareness" to build fairer models. We introduce several improvements such as automating the choice of FixOut's parameters. We present several experimental results that illustrate the fact that FixOut improves process fairness on different classification settings.
arXiv Detail & Related papers (2021-08-05T14:55:56Z)
How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion. Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z)
Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the ( aggregate) posterior to encourage statistical independence of the latent factors. We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method. Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously. We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework. The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
Data and Model Dependencies of Membership Inference Attack [13.951470844348899]
We provide an empirical analysis of the impact of both the data and ML model properties on the vulnerability of ML techniques to MIA. Our results reveal the relationship between MIA accuracy and properties of the dataset and training model in use. We propose using those data and model properties as regularizers to protect ML models against MIA.
arXiv Detail & Related papers (2020-02-17T09:35:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.