DiffKnock: Diffusion-based Knockoff Statistics for Neural Networks Inference
- URL: http://arxiv.org/abs/2510.01418v1
- Date: Wed, 01 Oct 2025 19:54:23 GMT
- Title: DiffKnock: Diffusion-based Knockoff Statistics for Neural Networks Inference
- Authors: Heng Ge, Qing Lu
- Abstract summary: We introduce DiffKnock, a diffusion-based knockoff framework for high-dimensional feature selection with finite-sample false discovery rate (FDR) control. Our approach trains diffusion models to generate valid knockoffs and uses neural network-based gradient and filter statistics to construct antisymmetric feature importance measures.
- Score: 0.8570687293939802
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce DiffKnock, a diffusion-based knockoff framework for high-dimensional feature selection with finite-sample false discovery rate (FDR) control. DiffKnock addresses two key limitations of existing knockoff methods: preserving complex feature dependencies and detecting non-linear associations. Our approach trains diffusion models to generate valid knockoffs and uses neural network-based gradient and filter statistics to construct antisymmetric feature importance measures. In simulations, DiffKnock achieves higher power than autoencoder-based knockoffs while maintaining the target FDR, indicating superior performance in scenarios involving complex non-linear architectures. Applied to murine single-cell RNA-seq data of LPS-stimulated macrophages, DiffKnock identifies canonical NF-$\kappa$B target genes (Ccl3, Hmox1) and regulators (Fosb, Pdgfb). These results highlight that, by combining the flexibility of deep generative models with rigorous statistical guarantees, DiffKnock is a powerful and reliable tool for analyzing single-cell RNA-seq data, as well as high-dimensional and structured data in other domains.
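The abstract mentions antisymmetric feature importance measures and finite-sample FDR control but does not spell out the selection step. As a rough illustration only, the sketch below applies the standard knockoff+ data-dependent threshold to a vector of antisymmetric statistics W, where a large positive W_j suggests that the original feature outperforms its knockoff. It assumes DiffKnock's final step follows this generic knockoff filter, which the abstract does not confirm. The function names `knockoff_threshold` and `knockoff_select` and the example values of W are hypothetical.

```python
import numpy as np

def knockoff_threshold(W, q=0.1, offset=1):
    """Smallest t whose estimated FDP is at most q (offset=1 gives knockoff+).

    Assumption: W holds antisymmetric statistics, so null features are
    equally likely to be positive or negative.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds: distinct nonzero magnitudes of W, in ascending order.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        # Count of W_j <= -t estimates the false positives among W_j >= t.
        fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no threshold meets the target level, so nothing is selected

def knockoff_select(W, q=0.1):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_threshold(W, q)
    return np.flatnonzero(np.asarray(W, dtype=float) >= t)

# Hypothetical statistics for ten features (illustrative values, not from the paper).
W = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, -0.3, 0.2]
selected = knockoff_select(W, q=0.2)  # selects indices 0 through 5
```

With a stricter target such as q=0.1, no threshold satisfies the bound for these ten statistics and the selection is empty, which shows how conservative the offset makes the procedure when few features are available.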
Related papers
- Learning False Discovery Rate Control via Model-Based Neural Networks [14.45679797184966]
We introduce a learning-augmented enhancement of the T-Rex Selector framework that narrows the gap between the realized false discovery proportion (FDP) and the target false discovery rate (FDR). Our approach replaces the analytical FDP estimator with a neural network trained solely on diverse synthetic datasets, enabling a substantially tighter and more accurate approximation of the FDP.
arXiv Detail & Related papers (2026-02-05T15:53:11Z) - Source-Free Object Detection with Detection Transformer [59.33653163035064]
Source-Free Object Detection (SFOD) enables knowledge transfer from a source domain to an unsupervised target domain for object detection without access to source data. Most existing SFOD approaches are either confined to conventional object detection (OD) models like Faster R-CNN or designed as general solutions without tailored adaptations for novel OD architectures, especially Detection Transformer (DETR). In this paper, we introduce Feature Reweighting ANd Contrastive Learning NetworK (FRANCK), a novel SFOD framework specifically designed to perform query-centric feature enhancement for DETRs.
arXiv Detail & Related papers (2025-10-13T07:35:04Z) - A Geometric Graph-Based Deep Learning Model for Drug-Target Affinity Prediction [0.0]
We introduce DeepGGL, a deep convolutional neural network that integrates residual connections and an attention mechanism within a geometric graph learning framework. By leveraging multiscale weighted colored bipartite subgraphs, DeepGGL effectively captures fine-grained atom-level interactions in protein-ligand complexes across multiple scales. DeepGGL consistently maintained high predictive accuracy, highlighting its adaptability and reliability for binding affinity prediction in structure-based drug discovery.
arXiv Detail & Related papers (2025-09-15T14:06:39Z) - ReDiSC: A Reparameterized Masked Diffusion Model for Scalable Node Classification with Structured Predictions [64.17845687013434]
We propose ReDiSC, a structured diffusion model for structured node classification. We show that ReDiSC achieves superior or highly competitive performance compared to state-of-the-art GNN, label propagation, and diffusion-based baselines. Notably, ReDiSC scales effectively to large-scale datasets on which previous structured diffusion methods fail due to computational constraints.
arXiv Detail & Related papers (2025-07-19T04:46:53Z) - LapDDPM: A Conditional Graph Diffusion Model for scRNA-seq Generation with Spectral Adversarial Perturbations [1.0377683220196872]
We introduce LapDDPM, a novel conditional Graph Diffusion Probabilistic Model for robust and high-fidelity scRNA-seq generation. Our contributions are threefold: we develop a conditional score-based model for effective learning and generation from complex scRNA-seq distributions. Experiments on diverse scRNA-seq datasets demonstrate LapDDPM's superior performance, achieving high fidelity and generating biologically plausible, cell-type-specific samples.
arXiv Detail & Related papers (2025-06-16T10:35:32Z) - Large-Scale Targeted Cause Discovery via Learning from Simulated Data [66.51307552703685]
We propose a novel machine learning approach for inferring causal variables of a target variable from observations. We train a neural network using supervised learning on simulated data to infer causality. Empirical results demonstrate superior performance in identifying causal relationships within large-scale gene regulatory networks.
arXiv Detail & Related papers (2024-08-29T02:21:11Z) - DeepDRK: Deep Dependency Regularized Knockoff for Feature Selection [14.840211139848275]
"Deep Dependency Regularized Knockoff (DeepDRK)" is a distribution-free deep learning method that effectively balances FDR and power.
We introduce a novel formulation of the knockoff model as a learning problem under multi-source adversarial attacks.
Our model outperforms existing benchmarks across synthetic, semi-synthetic, and real-world datasets.
arXiv Detail & Related papers (2024-02-27T03:24:54Z) - DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets [81.75973217676986]
Gene regulatory networks (GRN) describe interactions between genes and their products that control gene expression and cellular function.
Existing methods focus either on challenge (1), identifying cyclic structure from dynamics, or on challenge (2), learning complex Bayesian posteriors over DAGs, but not both.
In this paper we leverage the fact that it is possible to estimate the "velocity" of gene expression with RNA velocity techniques to develop an approach that addresses both challenges.
arXiv Detail & Related papers (2023-02-08T16:36:40Z) - Error-based Knockoffs Inference for Controlled Feature Selection [49.99321384855201]
We propose an error-based knockoff inference method that integrates knockoff features, error-based feature importance statistics, and the stepdown procedure.
The proposed inference procedure does not require specifying a regression model and can handle feature selection with theoretical guarantees.
arXiv Detail & Related papers (2022-03-09T01:55:59Z) - Variable Selection with the Knockoffs: Composite Null Hypotheses [2.725698729450241]
We extend the theory of the knockoff procedure to tests with composite null hypotheses.
The main technical challenge lies in handling composite nulls in tandem with dependent features from arbitrary designs.
arXiv Detail & Related papers (2022-03-06T01:40:35Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.