Symbolic Neural Generation with Applications to Lead Discovery in Drug Design
- URL: http://arxiv.org/abs/2510.23379v1
- Date: Mon, 27 Oct 2025 14:29:22 GMT
- Title: Symbolic Neural Generation with Applications to Lead Discovery in Drug Design
- Authors: Ashwin Srinivasan, A Baskar, Tirtharaj Dash, Michael Bain, Sanjay Kumar Dey, Mainak Banerjee,
- Abstract summary: We investigate a class of hybrid neurosymbolic models integrating symbolic learning with neural reasoning. In \textit{Symbolic Neural Generators} (SNGs), symbolic learners examine logical specifications of feasible data from a small set of instances. We implement an SNG combining a restricted form of Inductive Logic Programming (ILP) with a large language model (LLM) and evaluate it on early-stage drug design.
- Score: 1.3534513856953387
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We investigate a relatively underexplored class of hybrid neurosymbolic models integrating symbolic learning with neural reasoning to construct data generators meeting formal correctness criteria. In \textit{Symbolic Neural Generators} (SNGs), symbolic learners examine logical specifications of feasible data from a small set of instances -- sometimes just one. Each specification in turn constrains the conditional information supplied to a neural-based generator, which rejects any instance violating the symbolic specification. Like other neurosymbolic approaches, SNG exploits the complementary strengths of symbolic and neural methods. The outcome of an SNG is a triple $(H, X, W)$, where $H$ is a symbolic description of feasible instances constructed from data, $X$ a set of generated new instances that satisfy the description, and $W$ an associated weight. We introduce a semantics for such systems, based on the construction of appropriate \textit{base} and \textit{fibre} partially-ordered sets combined into an overall partial order, and outline a probabilistic extension relevant to practical applications. In this extension, SNGs result from searching over a weighted partial ordering. We implement an SNG combining a restricted form of Inductive Logic Programming (ILP) with a large language model (LLM) and evaluate it on early-stage drug design. Our main interest is the description and the set of potential inhibitor molecules generated by the SNG. On benchmark problems -- where drug targets are well understood -- SNG performance is statistically comparable to state-of-the-art methods. On exploratory problems with poorly understood targets, generated molecules exhibit binding affinities on par with leading clinical candidates. Experts further find the symbolic specifications useful as preliminary filters, with several generated molecules identified as viable for synthesis and wet-lab testing.
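The generate-and-test loop described in the abstract (induce a symbolic specification $H$ from a few instances, condition a generator on it, reject anything violating $H$, and return $(H, X, W)$) can be sketched as follows. This is a minimal illustration only: the interval constraint stands in for an ILP-learned clause, the Gaussian sampler stands in for the LLM-based generator, and taking the weight $W$ as the acceptance rate is an illustrative choice, not the paper's actual weighting.

```python
import random

def induce_spec(examples):
    # Stand-in for the symbolic (ILP) learner: from a small set of
    # instances, induce a specification of feasible data. Here the
    # "specification" is just an interval over one numeric feature.
    return min(examples), max(examples)

def propose(spec, k):
    # Stand-in for the neural generator: sample candidates conditioned
    # on the specification (in the actual system, the specification
    # would shape an LLM prompt).
    lo, hi = spec
    mid, spread = (lo + hi) / 2, (hi - lo) or 1.0
    return [random.gauss(mid, spread) for _ in range(k)]

def sng(examples, k=100):
    # Generate-and-test: candidates violating the specification H are
    # rejected; the weight W is taken as the acceptance rate.
    spec = induce_spec(examples)
    candidates = propose(spec, k)
    accepted = [x for x in candidates if spec[0] <= x <= spec[1]]
    H = f"{spec[0]} <= x <= {spec[1]}"
    W = len(accepted) / k
    return H, accepted, W

H, X, W = sng([2.0, 3.5, 2.8])
```

Every accepted instance satisfies the symbolic description by construction, which is the correctness guarantee the rejection step provides regardless of how unreliable the generator is.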
Related papers
- Neural Proposals, Symbolic Guarantees: Neuro-Symbolic Graph Generation with Hard Constraints [11.61618152472216]
We introduce Neuro-Symbolic Graph Generative Modeling (NSGGM), a neurosymbolic framework that reapproaches molecule generation as a scaffold and interaction learning task with symbolic assembly. An autoregressive neural model proposes scaffolds and refines interaction signals, and a CPU-efficient SMT solver constructs full graphs while enforcing chemical validity, structural rules, and user-specified constraints. NSGGM delivers strong performance on both unconstrained and constrained generation tasks, demonstrating that neuro-symbolic modeling can match state-of-the-art generative performance while offering explicit controllability and guarantees.
arXiv Detail & Related papers (2026-02-18T23:37:15Z)
- Protect$^*$: Steerable Retrosynthesis through Neuro-Symbolic State Encoding [0.0]
We introduce Protect$^*$, a neuro-symbolic framework that grounds the generative capabilities of Large Language Models (LLMs) in rigorous chemical logic. Our approach combines automated rule-based reasoning with the generative power of neural models. We demonstrate this neuro-symbolic approach through case studies on complex natural products, including the discovery of a novel synthetic pathway for Erythromycin B.
arXiv Detail & Related papers (2026-02-13T19:41:55Z)
- Logic of Hypotheses: from Zero to Full Knowledge in Neurosymbolic Integration [46.43084711486819]
Neurosymbolic integration (NeSy) blends neural-network learning with symbolic reasoning. We introduce Logic of Hypotheses (LoH), a novel language that unifies data-driven rule learning with symbolic priors and expert knowledge.
arXiv Detail & Related papers (2025-09-25T22:31:43Z)
- GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
- Consistency of Neural Causal Partial Identification [17.503562318576414]
Recent progress in Neural Causal Models (NCMs) has shown how identification and partial identification of causal effects can be automatically carried out via neural generative models. We prove consistency of partial identification via NCMs in a general setting with both continuous and categorical variables. Our results highlight the impact of the design of the underlying neural network architecture in terms of depth and connectivity.
arXiv Detail & Related papers (2024-05-24T16:12:39Z)
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
- The Role of Foundation Models in Neuro-Symbolic Learning and Reasoning [54.56905063752427]
Neuro-Symbolic AI (NeSy) holds promise to ensure the safe deployment of AI systems.
Existing pipelines that train the neural and symbolic components sequentially require extensive labelling.
New architecture, NeSyGPT, fine-tunes a vision-language foundation model to extract symbolic features from raw data.
arXiv Detail & Related papers (2024-02-02T20:33:14Z)
- Integrating Symbolic Reasoning into Neural Generative Models for Design Generation [10.97301490742749]
Design generation requires tight integration of neural and symbolic reasoning.
We introduce the Spatial Reasoning Integrated Generator (SPRING) for design generation.
SPRING embeds a neural and symbolic integrated spatial reasoning module inside the deep generative network.
arXiv Detail & Related papers (2023-10-13T20:03:22Z)
- Controllable Neural Symbolic Regression [10.128755371375572]
In symbolic regression, the goal is to find an analytical expression that fits experimental data with the minimal use of mathematical symbols.
We propose a novel neural symbolic regression method, named Neural Symbolic Regression with Hypothesis (NSRwH)
Our experiments demonstrate that the proposed conditioned deep learning model outperforms its unconditioned counterparts in terms of accuracy.
arXiv Detail & Related papers (2023-04-20T14:20:48Z)
- Neural-Symbolic Recursive Machine for Systematic Generalization [113.22455566135757]
We introduce the Neural-Symbolic Recursive Machine (NSR), whose core is a Grounded Symbol System (GSS)
NSR integrates neural perception, syntactic parsing, and semantic reasoning.
We evaluate NSR's efficacy across four challenging benchmarks designed to probe systematic generalization capabilities.
arXiv Detail & Related papers (2022-10-04T13:27:38Z)
- SLASH: Embracing Probabilistic Circuits into Neural Answer Set Programming [15.814914345000574]
We introduce SLASH, a novel deep probabilistic programming language (DPPL).
At its core, SLASH consists of Neural-Probabilistic Predicates (NPPs) and logical programs which are united via answer set programming.
We evaluate SLASH on the benchmark data of MNIST addition as well as novel tasks for DPPLs such as missing data prediction and set prediction with state-of-the-art performance.
arXiv Detail & Related papers (2021-10-07T12:35:55Z)
- Sinkhorn Natural Gradient for Generative Models [125.89871274202439]
We propose a novel Sinkhorn Natural Gradient (SiNG) algorithm which acts as a steepest descent method on the probability space endowed with the Sinkhorn divergence.
We show that the Sinkhorn information matrix (SIM), a key component of SiNG, has an explicit expression and can be evaluated accurately in complexity that scales logarithmically.
In our experiments, we quantitatively compare SiNG with state-of-the-art SGD-type solvers on generative tasks to demonstrate the efficiency and efficacy of our method.
arXiv Detail & Related papers (2020-11-09T02:51:17Z)
- Partially Conditioned Generative Adversarial Networks [75.08725392017698]
Generative Adversarial Networks (GANs) let one synthesise artificial datasets by implicitly modelling the underlying probability distribution of a real-world training dataset.
With the introduction of Conditional GANs and their variants, these methods were extended to generating samples conditioned on ancillary information available for each sample within the dataset.
In this work, we argue that standard Conditional GANs are not suitable for generation when only part of the conditioning information is available for a sample, and we propose a new Adversarial Network architecture and training strategy for this partially conditioned setting.
arXiv Detail & Related papers (2020-07-06T15:59:28Z)
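The conditional-generation idea running through the last entry above, producing a sample from noise plus ancillary information, even when only part of that information is observed, can be sketched in a few lines. This is a hypothetical illustration, not the paper's architecture: a linear map stands in for the generator network, and the mask-and-append scheme for handling missing conditions is one plausible choice among several.

```python
import numpy as np

rng = np.random.default_rng(0)

def conditional_generator(z, c, W):
    # Illustrative conditional generator: the output is produced from the
    # concatenation of a noise vector z and a condition vector c, so it
    # depends on both (a linear map + tanh stands in for the network).
    return np.tanh(W @ np.concatenate([z, c]))

def partial_condition(c, mask):
    # Sketch of partial conditioning: unobserved entries of c are zeroed,
    # and the mask itself is appended so the generator can distinguish
    # known fields from missing ones.
    return np.concatenate([np.where(mask, c, 0.0), mask.astype(float)])

z = rng.normal(size=8)                # latent noise
c = np.array([1.0, -0.5, 0.3])        # ancillary information
mask = np.array([True, False, True])  # only part of c is observed
cond = partial_condition(c, mask)
W = rng.normal(size=(4, z.size + cond.size))
x = conditional_generator(z, cond, W)
```

In a standard Conditional GAN the full vector c would be required at sampling time; the masking step is what makes generation possible when some conditions are absent.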
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.