GlassMol: Interpretable Molecular Property Prediction with Concept Bottleneck Models
- URL: http://arxiv.org/abs/2603.01274v1
- Date: Sun, 01 Mar 2026 21:07:49 GMT
- Title: GlassMol: Interpretable Molecular Property Prediction with Concept Bottleneck Models
- Authors: Oscar Rivera, Ziqing Wang, Matthieu Dagommer, Abhishek Pandey, Kaize Ding
- Abstract summary: In drug discovery, where safety is critical, machine learning models operate as black boxes. Existing interpretability methods suffer from the effectiveness-trustworthiness trade-off. We introduce GlassMol, a model-agnostic CBM that addresses these gaps through automated concept curation and LLM-guided concept selection.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning accelerates molecular property prediction, yet state-of-the-art Large Language Models and Graph Neural Networks operate as black boxes. In drug discovery, where safety is critical, this opacity risks masking false correlations and excluding human expertise. Existing interpretability methods suffer from the effectiveness-trustworthiness trade-off: explanations may fail to reflect a model's true reasoning, degrade performance, or lack domain grounding. Concept Bottleneck Models (CBMs) offer a solution by projecting inputs to human-interpretable concepts before readout, ensuring that explanations are inherently faithful to the decision process. However, adapting CBMs to chemistry faces three challenges: the Relevance Gap (selecting task-relevant concepts from a large descriptor space), the Annotation Gap (obtaining concept supervision for molecular data), and the Capacity Gap (degrading performance due to bottleneck constraints). We introduce GlassMol, a model-agnostic CBM that addresses these gaps through automated concept curation and LLM-guided concept selection. Experiments across thirteen benchmarks demonstrate that GlassMol generally matches or exceeds black-box baselines, suggesting that interpretability does not sacrifice performance and challenging the commonly assumed trade-off. Code is available at https://github.com/walleio/GlassMol.
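For intuition, here is a minimal, generic concept-bottleneck sketch in PyTorch. It is not GlassMol's implementation (see the linked repository for that); the encoder, dimensions, and layer choices are illustrative placeholders. The point is the structure the abstract describes: an input representation is projected to human-interpretable concept activations, and the label is read out from the concepts alone, so explanations in terms of concepts are faithful by construction.

```python
# Minimal generic CBM sketch (NOT GlassMol's actual code): the label head
# sees only the concept activations, never the raw input features.
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    def __init__(self, in_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        # Any molecular encoder (fingerprints, a GNN embedding, ...) could
        # feed this; a plain MLP stands in here as a placeholder.
        self.concept_head = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, n_concepts)
        )
        # A linear readout keeps the label a transparent function of the
        # concepts: the source of the CBM's built-in faithfulness.
        self.label_head = nn.Linear(n_concepts, n_classes)

    def forward(self, x):
        concept_logits = self.concept_head(x)      # supervised with concept labels
        concepts = torch.sigmoid(concept_logits)   # concept activations in [0, 1]
        return concept_logits, self.label_head(concepts)

model = ConceptBottleneckModel(in_dim=2048, n_concepts=32, n_classes=2)
x = torch.randn(4, 2048)                           # e.g. 2048-bit fingerprints
concept_logits, label_logits = model(x)
print(concept_logits.shape, label_logits.shape)    # torch.Size([4, 32]) torch.Size([4, 2])
```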
Related papers
- Clarity: The Flexibility-Interpretability Trade-Off in Sparsity-aware Concept Bottleneck Models [12.322360020814516]
Vision-Language Models (VLMs) are often treated as black boxes, with limited or no investigation of their decision-making process. We introduce the notion of clarity, a measure capturing the interplay between downstream performance and the sparsity and precision of the concept representation. Our experiments reveal a critical trade-off between flexibility and interpretability, under which a given method can exhibit markedly different behaviors even at comparable performance levels.
arXiv Detail & Related papers (2026-01-29T16:28:55Z) - Concept Component Analysis: A Principled Approach for Concept Extraction in LLMs [51.378834857406325]
Mechanistic interpretability seeks to mitigate these issues by extracting human-interpretable concepts from large language models. Sparse autoencoders (SAEs) have emerged as a popular approach for extracting interpretable and monosemantic concepts. We show that SAEs suffer from a fundamental theoretical ambiguity: whether a well-defined correspondence between LLM representations and human-interpretable concepts exists remains unclear.
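For readers unfamiliar with SAEs, here is a minimal sketch of the general idea, not this paper's method: an overcomplete autoencoder is trained to reconstruct LLM activations under an L1 sparsity penalty, so each dictionary direction can be read as a candidate concept. The dimensions and penalty weight below are illustrative assumptions.

```python
# Minimal sparse-autoencoder sketch over stand-in LLM activations.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)   # overcomplete: d_dict >> d_model
        self.dec = nn.Linear(d_dict, d_model)

    def forward(self, h):
        z = torch.relu(self.enc(h))             # sparse, non-negative "concept" codes
        return self.dec(z), z

sae = SparseAutoencoder(d_model=768, d_dict=4096)
h = torch.randn(32, 768)                        # stand-in for residual-stream activations
recon, z = sae(h)
# Reconstruction error plus L1 penalty pushes most codes to exactly zero.
loss = ((recon - h) ** 2).mean() + 1e-3 * z.abs().mean()
loss.backward()
```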
arXiv Detail & Related papers (2026-01-28T09:27:05Z) - Controllable Concept Bottleneck Models [55.03639763625018]
We propose Controllable Concept Bottleneck Models (CCBMs), which support three granularities of model editing: concept-label-level, concept-level, and data-level. CCBMs enjoy mathematically rigorous closed-form approximations derived from influence functions that obviate the need for retraining.
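The influence-function idea behind such retraining-free edits can be shown on a toy model. The sketch below is a generic leave-one-out approximation for ridge regression, not the CCBM method itself: removing a training example is approximated by a single closed-form Newton-style update, theta_new = theta + H^-1 * grad_L_i(theta).

```python
# Generic influence-function "unlearning" sketch (illustrative, not CCBM).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
lam = 1e-2

# Ridge fit: theta = (X^T X + lam*I)^{-1} X^T y
H = X.T @ X + lam * np.eye(5)            # Hessian of the ridge objective
theta = np.linalg.solve(H, X.T @ y)

# Approximate removal of example i without retraining.
i = 7
g_i = X[i] * (X[i] @ theta - y[i])       # gradient of example i's (half) squared loss
theta_minus_i = theta + np.linalg.solve(H, g_i)

# Compare against exact retraining on the remaining 99 points.
Xr, yr = np.delete(X, i, axis=0), np.delete(y, i)
theta_exact = np.linalg.solve(Xr.T @ Xr + lam * np.eye(5), Xr.T @ yr)
print(np.linalg.norm(theta_minus_i - theta_exact))   # small approximation error
```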
arXiv Detail & Related papers (2026-01-01T19:30:06Z) - AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models [63.05306474002547]
Regulatory frameworks mandating the 'right to be forgotten' drive the need for machine unlearning. We introduce AUVIC, a novel visual concept unlearning framework for MLLMs. We show that AUVIC achieves state-of-the-art target forgetting rates while incurring minimal performance degradation on non-target concepts.
arXiv Detail & Related papers (2025-11-14T13:35:32Z) - LTD-Bench: Evaluating Large Language Models by Letting Them Draw [57.237152905238084]
LTD-Bench is a benchmark for large language models (LLMs) that transforms LLM evaluation from abstract scores into directly observable visual outputs by requiring models to generate drawings through dot matrices or executable code. Its visual outputs enable powerful diagnostic analysis, offering a potential approach to investigating model similarity.
arXiv Detail & Related papers (2025-11-04T08:11:23Z) - Towards more holistic interpretability: A lightweight disentangled Concept Bottleneck Model [5.700536552863068]
Concept Bottleneck Models (CBMs) enhance interpretability by predicting human-understandable concepts as intermediate representations. We propose a lightweight Disentangled Concept Bottleneck Model (LDCBM) that automatically groups visual features into semantically meaningful components. Experiments on three diverse datasets demonstrate that LDCBM achieves higher concept and class accuracy, outperforming previous CBMs in both interpretability and classification performance.
arXiv Detail & Related papers (2025-10-17T15:59:30Z) - I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [76.15163242945813]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. We introduce a novel generative model that generates tokens on the basis of human-interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z) - Linearly-Interpretable Concept Embedding Models for Text Analysis [9.340843984411137]
We propose a novel Linearly Interpretable Concept Embedding Model (LICEM). LICEM's classification accuracy is better than that of existing interpretable models and matches black-box ones. We show that the explanations provided by our model are more intervenable and more causally consistent than existing solutions.
arXiv Detail & Related papers (2024-06-20T14:04:53Z) - Interpretable Prognostics with Concept Bottleneck Models [5.939858158928473]
Concept Bottleneck Models (CBMs) are inherently interpretable neural network architectures based on concept explanations.
CBMs enable domain experts to intervene on the concept activations at test-time.
Our case studies demonstrate that the performance of CBMs can be on par with or superior to that of black-box models.
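To make test-time intervention concrete, here is a hedged sketch that reuses the ConceptBottleneckModel class from the GlassMol sketch above (the index and value choices are hypothetical): a domain expert overwrites selected concept activations with known ground-truth values, and only the label readout is recomputed.

```python
# Test-time concept intervention, assuming the ConceptBottleneckModel
# sketch above has been defined and `model`, `x` exist.
import torch

@torch.no_grad()
def intervene(model, x, known_idx, known_vals):
    """Replace concepts at known_idx with expert values, then re-read labels."""
    concepts = torch.sigmoid(model.concept_head(x))   # model's own predictions
    concepts[:, known_idx] = known_vals               # expert correction
    return model.label_head(concepts)                 # labels from corrected concepts

# e.g. the expert asserts concept 3 is present (1.0) and concept 7 absent (0.0)
label_logits = intervene(model, x, known_idx=[3, 7],
                         known_vals=torch.tensor([1.0, 0.0]))
```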
arXiv Detail & Related papers (2024-05-27T18:15:40Z) - Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable? [8.391254800873599]
We introduce a method to perform concept-based interventions on pretrained neural networks, which are not interpretable by design.
We formalise the notion of intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black boxes.
arXiv Detail & Related papers (2024-01-24T16:02:14Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)