Decorrelated feature importance from local sample weighting
- URL: http://arxiv.org/abs/2508.06337v1
- Date: Fri, 08 Aug 2025 14:11:18 GMT
- Title: Decorrelated feature importance from local sample weighting
- Authors: Benedikt Fröhlich, Alison Durst, Merle Behr
- Abstract summary: Local sample weighting (losaw) can be integrated into many machine learning algorithms to improve feature importance (FI) scores. We show how losaw can be integrated within decision tree-based ML methods and within mini-batch training of neural networks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature importance (FI) statistics provide a prominent and valuable method of insight into the decision process of machine learning (ML) models, but their effectiveness has well-known limitations when correlation is present among the features in the training data. In this case, the FI often tends to be distributed among all features which are correlated with the response-generating signal features. Even worse, if multiple signal features are strongly correlated with a noise feature, while being only modestly correlated with one another, this can result in the noise feature having a distinctly larger FI score than any signal feature. Here we propose local sample weighting (losaw), which can flexibly be integrated into many ML algorithms to improve FI scores in the presence of feature correlation in the training data. Our approach is motivated by inverse probability weighting in causal inference: locally, within the ML model, it uses a sample weighting scheme to decorrelate a target feature from the remaining features. This reduces model bias locally, whenever the effect of a potential signal feature is evaluated and compared to others. Moreover, losaw comes with a natural tuning parameter, the minimum effective sample size of the weighted population, which corresponds to an interpretation-prediction tradeoff, analogous to the bias-variance tradeoff of classical ML tuning parameters. We demonstrate how losaw can be integrated within decision tree-based ML methods and within mini-batch training of neural networks. We investigate losaw for random forests and convolutional neural networks in a simulation study across settings with diverse correlation patterns. We found that losaw consistently improves FI. Moreover, it often improves prediction accuracy on out-of-distribution test data, while maintaining similar accuracy on in-distribution data.
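The abstract ships no code, but the core mechanism lends itself to a short illustration. The sketch below is a hypothetical rendering of losaw-style weights, assuming a stabilized inverse-probability scheme with a logistic propensity model on a median-binarized target feature and a simple tempering rule to enforce the minimum effective sample size; the authors' exact construction may differ, and the function name `losaw_weights` is ours.

```python
# Hypothetical sketch of losaw-style decorrelation weights (not the authors' code).
# Assumption: P(feature_j | remaining features) is approximated by a logistic
# "propensity" model fit on a median-binarized version of feature j.
import numpy as np
from sklearn.linear_model import LogisticRegression

def losaw_weights(X, j, min_ess_frac=0.5):
    """Stabilized inverse-probability weights that decorrelate feature j from
    the remaining features, tempered so that the effective sample size (ESS)
    of the weighted population stays above min_ess_frac * n."""
    n = X.shape[0]
    t = (X[:, j] > np.median(X[:, j])).astype(int)  # binarized target feature
    rest = np.delete(X, j, axis=1)

    # Conditional model P(t | X_-j) vs. marginal P(t), as in stabilized IPW.
    cond = LogisticRegression(max_iter=1000).fit(rest, t)
    p_cond = cond.predict_proba(rest)[np.arange(n), t]
    p_marg = np.where(t == 1, t.mean(), 1.0 - t.mean())
    w = p_marg / np.clip(p_cond, 1e-6, None)

    # Temper the weights (w ** alpha) until the ESS is large enough;
    # alpha = 0 recovers uniform weights, i.e., the pure-prediction regime.
    ess = lambda v: v.sum() ** 2 / (v ** 2).sum()
    alpha = 1.0
    while alpha > 0.0 and ess(w ** alpha) < min_ess_frac * n:
        alpha -= 0.05
    w = w ** max(alpha, 0.0)
    return w / w.mean()  # normalize to mean 1
```

In a tree-based learner, such weights could be passed as per-sample weights when the split gain for feature j is evaluated at a node; in mini-batch training of a neural network, they would scale the per-sample losses whenever feature j's contribution is assessed.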
Related papers
- Self-Supervised Learning via Flow-Guided Neural Operator on Time-Series Data [57.85958428020496]
Flow-Guided Neural Operator (FGNO) is a novel framework combining operator learning with flow matching for SSL training. FGNO learns mappings in functional spaces by using the Short-Time Fourier Transform to unify different time resolutions. Unlike prior generative SSL methods that use noisy inputs during inference, we propose using clean inputs for representation extraction while learning representations with noise.
arXiv Detail & Related papers (2026-02-12T18:54:57Z) - Why Machine Learning Models Systematically Underestimate Extreme Values II: How to Fix It with LatentNN [0.2700171473617699]
Attenuation bias affects astronomical data-driven models. We show that neural networks suffer from the same attenuation bias. We introduce LatentNN, a method that jointly optimizes network parameters and latent input values.
arXiv Detail & Related papers (2025-12-29T01:59:10Z) - A Simple Approximate Bayesian Inference Neural Surrogate for Stochastic Petri Net Models [0.0]
We introduce a neural-network-based framework for approximating the posterior distribution. Our model employs a lightweight 1D Convolutional Residual Network trained end-to-end on Gillespie-simulated SPN realizations. On synthetic SPNs with 20% missing events, our surrogate recovers rate-function coefficients with an RMSE of 0.108 and runs substantially faster than traditional Bayesian approaches.
arXiv Detail & Related papers (2025-07-14T18:31:19Z) - Joint Graph Estimation and Signal Restoration for Robust Federated Learning [11.817062392718807]
We propose a robust aggregation method for model parameters in federated learning (FL) under noisy communications. We show that the proposed method outperforms existing approaches by up to $2$-$5$ in classification accuracy under biased data and noisy conditions.
arXiv Detail & Related papers (2025-05-16T19:17:59Z) - Spatial Reasoning with Denoising Models [49.83744014336816]
We introduce a framework to perform reasoning over sets of continuous variables via denoising generative models. We show, for the first time, that the order of generation can be successfully predicted by the denoising network itself. Using these findings, we can increase the accuracy of specific reasoning tasks from 1% to >50%.
arXiv Detail & Related papers (2025-02-28T14:08:30Z) - Dimension-free Score Matching and Time Bootstrapping for Diffusion Models [11.743167854433306]
Diffusion models generate samples by estimating the score function of the target distribution at various noise levels. In this work, we establish the first (nearly) dimension-free sample complexity bounds for learning these score functions. A key aspect of our analysis is the use of a single function approximator to jointly estimate scores across noise levels.
arXiv Detail & Related papers (2025-02-14T18:32:22Z) - Noise-Resilient Unsupervised Graph Representation Learning via Multi-Hop Feature Quality Estimation [53.91958614666386]
We propose a novel method for unsupervised graph representation learning (UGRL) with graph neural networks (GNNs), based on Multi-hop feature Quality Estimation (MQE).
arXiv Detail & Related papers (2024-07-29T12:24:28Z) - Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z) - Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data alone by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how combining recent results on equivariant representation learning over structured spaces with simple, classical results from causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - Fast covariance parameter estimation of spatial Gaussian process models using neural networks [0.0]
We train NNs to take moderate-size spatial fields or variograms as input and return the range and noise-to-signal covariance parameters.
Once trained, the NNs provide estimates with accuracy similar to maximum-likelihood estimation, at a speedup by a factor of 100 or more.
This work can be easily extended to other, more complex, spatial problems and provides a proof-of-concept for this use of machine learning in computational statistics.
arXiv Detail & Related papers (2020-12-30T22:06:26Z) - Error Autocorrelation Objective Function for Improved System Modeling [1.2760453906939444]
We introduce a "whitening" cost function, the Ljung-Box statistic, which not only minimizes the error but also minimizes the correlations between errors.
The results show significant improvement in generalization for recurrent neural networks (RNNs) and image autoencoders (2d)
arXiv Detail & Related papers (2020-08-08T19:20:32Z) - Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimizes a feature decomposition network and the target image classification model.
The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
arXiv Detail & Related papers (2020-07-30T05:48:48Z)
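As an aside on the "whitening" objective of the Error Autocorrelation entry above, here is a minimal sketch of an MSE loss augmented with a Ljung-Box penalty on residual autocorrelations; the penalty weight `lam` and the lag cutoff `max_lag` are illustrative assumptions, not the paper's settings.

```python
# Illustrative sketch of a Ljung-Box-style "whitening" objective: minimize the
# squared error plus a penalty on the autocorrelations of the errors.
import numpy as np

def ljung_box(residuals, max_lag=10):
    """Ljung-Box statistic Q = n(n+2) * sum_k rho_k^2 / (n - k)."""
    r = residuals - residuals.mean()
    n = len(r)
    denom = (r ** 2).sum()
    q = 0.0
    for k in range(1, max_lag + 1):
        rho_k = (r[k:] * r[:-k]).sum() / denom  # lag-k sample autocorrelation
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

def whitening_loss(y_true, y_pred, lam=0.1, max_lag=10):
    """Mean squared error plus lam times the Ljung-Box penalty on the errors."""
    e = y_true - y_pred
    return np.mean(e ** 2) + lam * ljung_box(e, max_lag)
```

For gradient-based training, the same expression would be written with the differentiable tensor ops of the chosen framework.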