Out-of-Distribution Detection in Molecular Complexes via Diffusion Models for Irregular Graphs
- URL: http://arxiv.org/abs/2512.18454v1
- Date: Sat, 20 Dec 2025 17:56:15 GMT
- Title: Out-of-Distribution Detection in Molecular Complexes via Diffusion Models for Irregular Graphs
- Authors: David Graber, Victor Armegioiu, Rebecca Buller, Siddhartha Mishra,
- Abstract summary: We present a probabilistic OOD detection framework for complex 3D graph data built on a diffusion model. A single probability-flow ODE produces per-sample log-likelihoods, providing a principled typicality score for distribution shift. We validate the approach on protein-ligand complexes and construct strict OOD datasets.
- Score: 11.928558263824213
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predictive machine learning models generally excel on in-distribution data, but their performance degrades on out-of-distribution (OOD) inputs. Reliable deployment therefore requires robust OOD detection, yet this is particularly challenging for irregular 3D graphs that combine continuous geometry with categorical identities and are unordered by construction. Here, we present a probabilistic OOD detection framework for complex 3D graph data built on a diffusion model that learns a density of the training distribution in a fully unsupervised manner. A key ingredient we introduce is a unified continuous diffusion over both 3D coordinates and discrete features: categorical identities are embedded in a continuous space and trained with cross-entropy, while the corresponding diffusion score is obtained analytically via posterior-mean interpolation from predicted class probabilities. This yields a single self-consistent probability-flow ODE (PF-ODE) that produces per-sample log-likelihoods, providing a principled typicality score for distribution shift. We validate the approach on protein-ligand complexes and construct strict OOD datasets by withholding entire protein families from training. PF-ODE likelihoods identify held-out families as OOD and correlate strongly with prediction errors of an independent binding-affinity model (GEMS), enabling a priori reliability estimates on new complexes. Beyond scalar likelihoods, we show that multi-scale PF-ODE trajectory statistics - including path tortuosity, flow stiffness, and vector-field instability - provide complementary OOD information. Modeling the joint distribution of these trajectory features yields a practical, high-sensitivity detector that improves separation over likelihood-only baselines, offering a label-free OOD quantification workflow for geometric deep learning.
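The abstract's central mechanism, per-sample log-likelihoods from a probability-flow ODE, rests on the instantaneous change-of-variables formula: integrating the divergence of the PF-ODE drift along the trajectory converts a prior log-density into a data log-density. The sketch below illustrates this computation in a toy setting where the score is known in closed form; the function name `pf_ode_logpdf`, the Gaussian toy data, and the analytic divergence are illustrative assumptions (the paper uses a learned score network, where the divergence would typically be estimated, e.g. with Hutchinson probes).

```python
import numpy as np

# Minimal PF-ODE log-likelihood sketch (VP diffusion, analytic Gaussian
# score). For data ~ N(0, s2 * I), the diffused marginals, score, and
# drift divergence are all available in closed form, so the change-of-
# variables integral can be checked against the exact log-density.

BETA0, BETA1 = 0.1, 20.0            # standard VP noise schedule
def beta(t):    return BETA0 + t * (BETA1 - BETA0)
def alpha2(t):  return np.exp(-(BETA0 * t + 0.5 * (BETA1 - BETA0) * t**2))

def pf_ode_logpdf(x0, s2, n_steps=4000):
    """log p_0(x0) via the probability-flow ODE for data ~ N(0, s2 * I)."""
    d = x0.size
    x = x0.copy()
    logdet = 0.0
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = (i + 0.5) * dt                      # midpoint of each step
        a2 = alpha2(t)
        v = a2 * s2 + (1.0 - a2)                # marginal variance at t
        drift_coef = -0.5 * beta(t) * (1.0 - 1.0 / v)
        x = x + drift_coef * x * dt             # Euler step of the PF-ODE
        logdet += drift_coef * d * dt           # accumulate div(f) dt
    # log-density of the terminal state under the terminal marginal
    vT = alpha2(1.0) * s2 + (1.0 - alpha2(1.0))
    log_prior = -0.5 * d * np.log(2 * np.pi * vT) - x @ x / (2 * vT)
    return log_prior + logdet
```

In this Gaussian toy the returned value converges to the exact log-density of the data distribution as the step count grows, which is the sanity check one would also run before trusting PF-ODE likelihoods from a trained model.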
Related papers
- Flow Matching is Adaptive to Manifold Structures [32.55405572762157]
Flow matching is a simulation-based alternative to diffusion-based generative modeling. We show how flow matching adapts to data geometry and circumvents the curse of dimensionality.
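For readers unfamiliar with the technique this entry builds on, the core of (conditional) flow matching is a simple regression: along linear interpolation paths between source noise and data, the target vector field is just their difference. The sketch below shows that loss under these standard linear paths; the function name `cfm_loss` and the toy constant model are illustrative assumptions, not this paper's construction.

```python
import numpy as np

# Conditional flow matching in its simplest form: interpolate
# x_t = (1 - t) * x0 + t * x1 between noise x0 and data x1, and regress
# a vector-field model onto the conditional target u_t = x1 - x0.

def cfm_loss(model, x1_batch, rng):
    """Monte Carlo estimate of the conditional flow matching loss."""
    x0 = rng.standard_normal(x1_batch.shape)       # source (noise) samples
    t = rng.uniform(size=(x1_batch.shape[0], 1))   # random times in [0, 1]
    xt = (1.0 - t) * x0 + t * x1_batch             # points on linear paths
    target = x1_batch - x0                         # conditional vector field
    pred = model(xt, t)
    return np.mean((pred - target) ** 2)
```

A model minimizing this loss recovers the marginal vector field, whose ODE transports the noise distribution to the data distribution.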
arXiv Detail & Related papers (2026-02-25T23:52:32Z) - SCOPED: Score-Curvature Out-of-distribution Proximity Evaluator for Diffusion [5.008779702997125]
Out-of-distribution (OOD) detection is essential for reliable deployment of machine learning systems in vision, robotics, reinforcement learning, and beyond. We introduce the Score-Curvature Out-of-distribution Proximity Evaluator for Diffusion (SCOPED). SCOPED is computed from a single diffusion model trained once on a diverse dataset, and combines the Jacobian trace and squared norm of the model's score function into a single test statistic. On four vision benchmarks, SCOPED achieves competitive or state-of-the-art precision-recall scores despite its low computational cost.
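The two ingredients named in this summary, the Jacobian trace and the squared norm of the score, can be sketched concretely. Below, the exact score of a standard normal stands in for a trained diffusion model, and a finite-difference loop stands in for the Hutchinson-style trace estimation a real implementation would use; the names `scoped_statistic` and `jacobian_trace`, and the additive combination, are illustrative assumptions rather than the paper's exact statistic.

```python
import numpy as np

# SCOPED-style statistic sketch: combine tr(Jacobian of the score) with
# the squared score norm. Here score(x) = -x, the exact score of N(0, I),
# plays the role of the trained model.

def score(x):
    return -x  # exact score of a standard normal

def jacobian_trace(f, x, eps=1e-4):
    """Central finite-difference estimate of tr(df/dx)."""
    d = x.size
    tr = 0.0
    for i in range(d):
        e = np.zeros(d); e[i] = eps
        tr += (f(x + e)[i] - f(x - e)[i]) / (2 * eps)
    return tr

def scoped_statistic(x):
    s = score(x)
    return jacobian_trace(score, x) + s @ s
```

For this Gaussian toy, typical in-distribution points have squared norm near the dimension d while the trace is -d, so the statistic hovers near zero; far-OOD points drive it sharply positive.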
arXiv Detail & Related papers (2025-10-01T20:54:49Z) - Learning Distributions of Complex Fluid Simulations with Diffusion Graph Networks [23.196852966408482]
We propose a graph-based latent diffusion model that enables direct sampling of states from their equilibrium distribution. This allows for the efficient computation of flow statistics without running long and expensive numerical simulations. We apply this method to a range of fluid dynamics tasks, such as predicting pressure on 3D wing models in turbulent flow.
arXiv Detail & Related papers (2025-03-19T13:04:39Z) - Sub-graph Based Diffusion Model for Link Prediction [43.15741675617231]
Denoising Diffusion Probabilistic Models (DDPMs) represent a contemporary class of generative models with exceptional qualities.
We build a novel generative model for link prediction using a dedicated design to decompose the likelihood estimation process via the Bayesian formula.
Our proposed method presents numerous advantages: (1) transferability across datasets without retraining, (2) promising generalization on limited training data, and (3) robustness against graph adversarial attacks.
arXiv Detail & Related papers (2024-09-13T02:23:55Z) - Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models [71.39421638547164]
We propose to detect OOD molecules by adopting an auxiliary diffusion model-based framework, which compares similarities between input molecules and reconstructed graphs.
Due to the generative bias towards reconstructing ID training samples, the similarity scores of OOD molecules are much lower, which facilitates detection.
Our research pioneers an approach of Prototypical Graph Reconstruction for Molecular OOD Detection, dubbed PGR-MOOD, which hinges on three innovations.
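The decision rule described in this entry, namely score each input by how well a model biased toward the training distribution reconstructs it, can be sketched without any diffusion machinery. Below, a PCA-style projection onto the in-distribution subspace stands in for the generative reconstructor; the names `make_reconstructor` and `ood_score`, and the squared-error criterion (reconstruction error is the inverse of the similarity the paper describes), are illustrative assumptions.

```python
import numpy as np

# Reconstruction-based OOD sketch: inputs the "model" is biased to
# reconstruct (ID) get low error; inputs it cannot reconstruct get high
# error and are flagged as OOD.

def make_reconstructor(id_data):
    mean = id_data.mean(axis=0)
    # top principal direction of the ID data as a one-component "model"
    _, _, vt = np.linalg.svd(id_data - mean, full_matrices=False)
    v = vt[0]
    def reconstruct(x):
        return mean + ((x - mean) @ v) * v
    return reconstruct

def ood_score(x, reconstruct):
    """Squared reconstruction error: low similarity -> high score -> OOD."""
    r = reconstruct(x)
    return float((x - r) @ (x - r))

rng = np.random.default_rng(1)
# ID training data lies along the first axis, plus small isotropic noise
id_data = np.column_stack([rng.standard_normal(200),
                           0.05 * rng.standard_normal(200)])
rec = make_reconstructor(id_data)
```

A point on the ID axis reconstructs almost perfectly, while a point orthogonal to it leaves a large residual, mirroring the low similarity scores the summary attributes to OOD molecules.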
arXiv Detail & Related papers (2024-04-24T03:25:53Z) - Topological Detection of Phenomenological Bifurcations with Unreliable Kernel Densities [0.5874142059884521]
Phenomenological (P-type) bifurcations are qualitative changes in dynamical systems.
Current state of the art for detecting these bifurcations requires reliable kernel density estimates computed from an ensemble of system realizations.
This study presents an approach for detecting P-type bifurcations using unreliable density estimates.
arXiv Detail & Related papers (2024-01-29T20:59:25Z) - Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data [68.62134204367668]
This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace.
We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated.
The generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.
arXiv Detail & Related papers (2023-02-14T17:02:35Z) - Energy-based Out-of-Distribution Detection for Graph Neural Networks [76.0242218180483]
We propose a simple, powerful and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe.
GNNSafe achieves up to $17.0\%$ AUROC improvement over state-of-the-art methods, and it can serve as a simple yet strong baseline in this under-developed area.
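Energy-based detectors of this kind start from a score computed directly on classifier logits. The sketch below shows that per-node building block, the negative temperature-scaled logsumexp of the logits; GNNSafe additionally propagates these energies along graph edges, which is omitted here, and the function name `energy_score` is an illustrative assumption.

```python
import numpy as np

# Energy score from classifier logits: E(x) = -T * logsumexp(logits / T).
# Confident (in-distribution) predictions yield low energy; flat logits,
# as often produced on OOD inputs, yield higher energy.

def energy_score(logits, T=1.0):
    z = logits / T
    m = z.max(axis=-1, keepdims=True)              # stabilized logsumexp
    return -T * (m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1)))
```

Thresholding this score (or a graph-propagated version of it) gives a detector that needs no OOD data at training time.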
arXiv Detail & Related papers (2023-02-06T16:38:43Z) - Nonlinear Isometric Manifold Learning for Injective Normalizing Flows [58.720142291102135]
We use isometries to separate manifold learning and density estimation.
We also employ autoencoders to design embeddings with explicit inverses that do not distort the probability distribution.
arXiv Detail & Related papers (2022-03-08T08:57:43Z) - General stochastic separation theorems with optimal bounds [68.8204255655161]
The phenomenon of separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and analyze AI instabilities.
Errors or clusters of errors can be separated from the rest of the data.
The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same separability.
arXiv Detail & Related papers (2020-10-11T13:12:41Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.