TinyChemVL: Advancing Chemical Vision-Language Models via Efficient Visual Token Reduction and Complex Reaction Tasks
- URL: http://arxiv.org/abs/2511.06283v1
- Date: Sun, 09 Nov 2025 08:37:18 GMT
- Title: TinyChemVL: Advancing Chemical Vision-Language Models via Efficient Visual Token Reduction and Complex Reaction Tasks
- Authors: Xuanle Zhao, Shuxin Zeng, Yinyuan Cai, Xiang Cheng, Duzhen Zhang, Xiuyi Chen, Bo Xu,
- Abstract summary: This work builds efficient yet powerful Vision Language Models (VLMs) for chemical domains by co-designing model architecture and task complexity.<n>With only 4B parameters, TinyChemVL achieves superior performance on both molecular and reaction tasks while demonstrating faster inference and training speeds compared to existing models.
- Score: 25.14617060799698
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While Vision Language Models (VLMs) have demonstrated remarkable capabilities in general visual understanding, their application in the chemical domain has been limited, with previous works predominantly focusing on text and thus overlooking critical visual information, such as molecular structures. Current approaches that directly adopt standard VLMs for chemical tasks suffer from two primary issues: (i) computational inefficiency of processing entire chemical images with non-informative backgrounds. (ii) a narrow scope on molecular-level tasks that restricts progress in chemical reasoning. In this work, we propose \textbf{TinyChemVL}, an efficient and powerful chemical VLM that leverages visual token reduction and reaction-level tasks to improve model efficiency and reasoning capacity. Also, we propose \textbf{ChemRxn-V}, a reaction-level benchmark for assessing vision-based reaction recognition and prediction tasks. Directly predicting reaction products from molecular images poses a non-trivial challenge, as it requires models to integrate both recognition and reasoning capacities. Our results demonstrate that with only 4B parameters, TinyChemVL achieves superior performance on both molecular and reaction tasks while demonstrating faster inference and training speeds compared to existing models. Notably, TinyChemVL outperforms ChemVLM while utilizing only 1/16th of the visual tokens. This work builds efficient yet powerful VLMs for chemical domains by co-designing model architecture and task complexity.
Related papers
- Agentic reinforcement learning empowers next-generation chemical language models for molecular design and synthesis [51.83339196548892]
ChemCraft is a novel framework that decouples chemical reasoning from knowledge storage.<n>ChemCraft achieves superior performance with minimal inference costs.<n>This work establishes a cost-effective and privacy-preserving paradigm for AI-aided chemistry.
arXiv Detail & Related papers (2026-01-25T04:23:34Z) - ChemBART: A Pre-trained BART Model Assisting Organic Chemistry Analysis [9.010003142738338]
ChemBART is a SMILES-based large language model pre-trained on chemical reactions.<n>ChemBART effectively solves a variety of chemical problems, including precursor/reagent generation, temperature-yield regression, molecular property classification, and optimizing the policy and value functions.<n>Our work validates the power of reaction-focused pre-training and showcases the broad utility of ChemBART in advancing the complete synthesis planning cycle.
arXiv Detail & Related papers (2026-01-06T10:55:38Z) - ChemVTS-Bench: Evaluating Visual-Textual-Symbolic Reasoning of Multimodal Large Language Models in Chemistry [14.083820970280668]
ChemVTS-Bench is a domain-authentic benchmark designed to evaluate the Visual-Textual-Symbolic (VTS) reasoning abilities of Multimodal Large Language Models (MLLMs)<n>ChemVTS-Bench contains diverse and challenging chemical problems spanning organic molecules, inorganic materials, and 3D crystal structures.<n>We develop an automated agent-based workflow that standardizes inference, verifies answers, and diagnoses failure modes.
arXiv Detail & Related papers (2025-11-22T04:24:24Z) - RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning [51.393018266721576]
We propose the RxnCaption framework for the task of chemical Reaction Diagram Parsing (RxnDP)<n>Our framework reformulates the traditional coordinate prediction driven parsing process into an image captioning problem.<n>We introduce a strategy termed "BBox and Index as Visual Prompt" (BIVP), which uses our state-of-the-art molecular detector, MolYOLO, to pre-draw molecular bounding boxes and indices directly onto the input image.
arXiv Detail & Related papers (2025-11-04T09:08:44Z) - Atom-anchored LLMs speak Chemistry: A Retrosynthesis Demonstration [2.9496795797433073]
We introduce a framework for molecular reasoning using general-purpose Large Language Models.<n>Our method anchors chain-of-thought reasoning to the molecular structure by using unique atomic identifiers.<n>Our work also provides a method to generate theoretically grounded synthetic datasets.
arXiv Detail & Related papers (2025-10-18T17:27:44Z) - ChemOrch: Empowering LLMs with Chemical Intelligence via Synthetic Instructions [52.79349601462865]
ChemOrch is a framework that synthesizes chemically grounded instruction-response pairs.<n>ChemOrch enables controllable diversity and levels of difficulty for the generated tasks.
arXiv Detail & Related papers (2025-09-20T05:43:58Z) - $\ ext{M}^{2}$LLM: Multi-view Molecular Representation Learning with Large Language Models [59.125833618091846]
We propose a multi-view framework that integrates three perspectives: the molecular structure view, the molecular task view, and the molecular rules view.<n>Experiments demonstrate that $textM2$LLM achieves state-of-the-art performance on multiple benchmarks across classification and regression tasks.
arXiv Detail & Related papers (2025-08-12T05:46:47Z) - ChemDFM-R: An Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge [14.6026550444088]
This work focuses on the specific field of chemistry and develop a Chemical Reasoner LLM, ChemDFM-R.<n>We first construct a comprehensive dataset of atomized knowledge points to enhance the model's understanding of the fundamental principles and logical structure of chemistry.<n> Experiments on diverse chemical benchmarks demonstrate that ChemDFM-R achieves cutting-edge performance while providing interpretable, rationale-driven outputs.
arXiv Detail & Related papers (2025-07-29T16:40:49Z) - ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data [53.78763789036172]
We present ChemActor, a fully fine-tuned large language model (LLM) as a chemical executor to convert between unstructured experimental procedures and structured action sequences.<n>This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input.<n>Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv Detail & Related papers (2025-06-30T05:11:19Z) - Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations [43.623140005091535]
We introduce ChemCoTBench, a reasoning framework that bridges molecular structure understanding with arithmetic-inspired operations.<n>ChemCoTBench formalizes chemical problem-solving into transparent, step-by-step reasoning.<n>We evaluate models on two high-impact tasks: Molecular Property Optimization and Chemical Reaction Prediction.
arXiv Detail & Related papers (2025-05-27T15:15:44Z) - ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area [70.66610054938052]
We introduce textbfChemVLM, an open-source chemical multimodal large language model for chemical applications.<n>ChemVLM is trained on a carefully curated bilingual dataset that enhances its ability to understand both textual and visual chemical information.<n>We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on various tasks.
arXiv Detail & Related papers (2024-08-14T01:16:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.