Dolphin: A Programmable Framework for Scalable Neurosymbolic Learning
- URL: http://arxiv.org/abs/2410.03348v1
- Date: Fri, 4 Oct 2024 12:12:36 GMT
- Title: Dolphin: A Programmable Framework for Scalable Neurosymbolic Learning
- Authors: Aaditya Naik, Jason Liu, Claire Wang, Saikat Dutta, Mayur Naik, Eric Wong
- Abstract summary: We propose a framework to scale neurosymbolic learning at a fundamental level by mapping forward chaining and backward gradient propagation in symbolic programs to vectorized computations.
Dolphin introduces a set of abstractions and primitives built directly on top of a high-performance deep learning framework like PyTorch.
We evaluate Dolphin on a suite of 13 benchmarks across 5 neurosymbolic tasks that combine deep learning models for text, image, or video processing with symbolic programs.
- Score: 18.50192747078987
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neurosymbolic learning has emerged as a promising paradigm to incorporate symbolic reasoning into deep learning models. However, existing frameworks are limited in scalability with respect to both the training data and the complexity of symbolic programs. We propose Dolphin, a framework to scale neurosymbolic learning at a fundamental level by mapping both forward chaining and backward gradient propagation in symbolic programs to vectorized computations. For this purpose, Dolphin introduces a set of abstractions and primitives built directly on top of a high-performance deep learning framework like PyTorch, effectively enabling symbolic programs to be written as PyTorch modules. It thereby enables neurosymbolic programs to be written in a language familiar to developers, like Python, and compiled to computation graphs that are amenable to end-to-end differentiation on GPUs. We evaluate Dolphin on a suite of 13 benchmarks across 5 neurosymbolic tasks that combine deep learning models for text, image, or video processing with symbolic programs that involve multi-hop reasoning, recursion, and even black-box functions like Python's eval(). Dolphin takes only 0.33%-37.17% of the time (2.77% on average) to train these models on the largest input per task compared to the baselines Scallop, ISED, and IndeCateR+, which time out on most of these inputs. Models written in Dolphin also achieve state-of-the-art accuracies even on the largest benchmarks.
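The abstract's central idea, mapping forward chaining over symbolic rules to vectorized tensor computations, can be sketched without Dolphin's actual primitives. The snippet below is plain PyTorch, not the Dolphin API; the rule, function name, and shapes are illustrative assumptions in the spirit of a digit-sum benchmark:

```python
import torch

def sum_distribution(p1: torch.Tensor, p2: torch.Tensor) -> torch.Tensor:
    """Vectorized forward chaining for a rule like
    sum(Z) :- digit1(X), digit2(Y), Z = X + Y.
    p1, p2: (batch, 10) softmax outputs of two digit classifiers.
    Returns a (batch, 19) distribution over sums 0..18.
    """
    batch = p1.shape[0]
    joint = p1.unsqueeze(2) * p2.unsqueeze(1)              # (batch, 10, 10) joint probabilities
    idx = torch.arange(10, device=p1.device)
    z = (idx.unsqueeze(1) + idx.unsqueeze(0)).reshape(-1)  # sum value for each (x, y) pair
    out = torch.zeros(batch, 19, device=p1.device)
    out.index_add_(1, z, joint.reshape(batch, -1))         # marginalize onto Z = X + Y
    return out

# Train end-to-end with supervision only on the sum: gradients flow
# through the symbolic layer back into both classifiers via autograd.
p1 = torch.softmax(torch.randn(32, 10, requires_grad=True), dim=1)
p2 = torch.softmax(torch.randn(32, 10, requires_grad=True), dim=1)
labels = torch.randint(0, 19, (32,))
loss = torch.nn.functional.nll_loss(torch.log(sum_distribution(p1, p2) + 1e-9), labels)
loss.backward()
```

Because the rule compiles down to batched tensor operations, the symbolic step runs on the GPU alongside the neural models, which is the scalability property the abstract claims.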
Related papers
- Lobster: A GPU-Accelerated Framework for Neurosymbolic Programming [10.129565359989044]
Neurosymbolic programs combine deep learning with symbolic reasoning to achieve better data efficiency, interpretability, and generalizability.
Existing neurosymbolic learning frameworks implement an uneasy marriage between a highly scalable, GPU-accelerated neural component and a slower symbolic component that runs on CPUs.
We propose Lobster, a unified framework for harnessing GPUs end-to-end for neurosymbolic learning.
arXiv Detail & Related papers (2025-03-27T19:32:58Z)
- Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback [69.57617563853822]
Dolphin is a framework to enhance the automation level of scientific research.
Dolphin first generates novel ideas based on feedback from previous experiments.
Dolphin automatically analyzes the results of each idea and feeds the results back to the next round of idea generation.
arXiv Detail & Related papers (2025-01-07T16:31:10Z)
- Compositional Generalization Across Distributional Shifts with Sparse Tree Operations [77.5742801509364]
We introduce a unified neurosymbolic architecture called the Differentiable Tree Machine.
We significantly increase the model's efficiency through the use of sparse vector representations of symbolic structures.
We enable its application beyond the restricted set of tree2tree problems to the more general class of seq2seq problems.
arXiv Detail & Related papers (2024-12-18T17:20:19Z)
- Discovering physical laws with parallel combinatorial tree search [57.05912962368898]
Symbolic regression plays a crucial role in scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data.
Existing algorithms have faced a critical bottleneck in accuracy and efficiency for over a decade.
We introduce a parallel combinatorial tree search (PCTS) model to efficiently distill generic mathematical expressions from limited data.
arXiv Detail & Related papers (2024-07-05T10:41:15Z)
- Deep Symbolic Optimization for Combinatorial Optimization: Accelerating Node Selection by Discovering Potential Heuristics [10.22111332588471]
We propose a novel deep symbolic optimization learning framework that combines their advantages.
Dso4NS guides the search for mathematical expressions within the high-dimensional discrete symbolic space and then incorporates the highest-performing mathematical expressions into a solver.
Experiments demonstrate the effectiveness of Dso4NS in learning high-quality expressions, outperforming existing approaches on a CPU machine.
arXiv Detail & Related papers (2024-06-14T06:02:14Z)
- CyNetDiff -- A Python Library for Accelerated Implementation of Network Diffusion Models [0.9831489366502302]
CyNetDiff is a Python library with components written in Cython to provide improved performance for these computationally intensive diffusion tasks.
In many research tasks, these simulations are the most computationally intensive step, so a fast library with a high-level language interface is desirable.
arXiv Detail & Related papers (2024-04-25T21:59:55Z)
- AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation [14.831115535710692]
We propose the concept of AI-oriented grammar.
This aims to represent code in a way that better suits the working mechanism of AI models.
Code written with AI-oriented grammar discards formatting and uses a minimum number of tokens.
arXiv Detail & Related papers (2024-04-25T04:46:02Z)
- The Role of Foundation Models in Neuro-Symbolic Learning and Reasoning [54.56905063752427]
Neuro-Symbolic AI (NeSy) holds promise to ensure the safe deployment of AI systems.
Existing pipelines that train the neural and symbolic components sequentially require extensive labelling.
A new architecture, NeSyGPT, fine-tunes a vision-language foundation model to extract symbolic features from raw data.
arXiv Detail & Related papers (2024-02-02T20:33:14Z)
- Catwalk: A Unified Language Model Evaluation Framework for Many Datasets [50.75378592254184]
Catwalk provides a unified interface to a broad range of existing NLP datasets and models.
Catwalk substantially lowers the barriers to conducting controlled experiments at scale.
arXiv Detail & Related papers (2023-12-15T23:11:45Z)
- Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning [84.12154024070024]
We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks.
Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge.
A Python interpreter then executes the generated code and prints the output.
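As a hedged illustration of the NLEP recipe (the task, data, and names below are invented, not taken from the paper), a generated program might store structured knowledge in plain Python data structures and define a function that computes the answer:

```python
# NLEP-style generated program (illustrative only): structured knowledge
# lives in ordinary Python data structures, and a function reasons over it.
countries = [
    {"name": "France", "capital": "Paris", "population_m": 68},
    {"name": "Japan", "capital": "Tokyo", "population_m": 125},
    {"name": "Kenya", "capital": "Nairobi", "population_m": 54},
]

def most_populous_capital(rows):
    """Answer: the capital of the most populous country in the table."""
    return max(rows, key=lambda r: r["population_m"])["capital"]

# The interpreter executes the generated code and prints the output.
print(most_populous_capital(countries))  # -> Tokyo
```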
arXiv Detail & Related papers (2023-09-19T17:54:21Z)
- Scallop: A Language for Neurosymbolic Programming [14.148819428748597]
Scallop is a language that combines the benefits of deep learning and logical reasoning.
It is capable of expressing algorithmic reasoning in diverse and challenging AI tasks.
It provides a succinct interface for machine learning programmers to integrate logical domain knowledge.
arXiv Detail & Related papers (2023-04-10T18:46:53Z)
- Symbolic Visual Reinforcement Learning: A Scalable Framework with Object-Level Abstraction and Differentiable Expression Search [63.3745291252038]
We propose DiffSES, a novel symbolic learning approach that discovers discrete symbolic policies.
By using object-level abstractions instead of raw pixel-level inputs, DiffSES is able to leverage the simplicity and scalability advantages of symbolic expressions.
Our experiments demonstrate that DiffSES is able to generate symbolic policies that are simpler and more scalable than state-of-the-art symbolic RL methods.
arXiv Detail & Related papers (2022-12-30T17:50:54Z)
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks [108.4568236569645]
Chain-of-thought prompting (CoT) is by far the state-of-the-art method for these tasks.
We propose Program of Thoughts (PoT), which uses language models to express the reasoning process as a program.
PoT shows an average performance gain of around 12% over CoT across all evaluated datasets.
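To make the contrast concrete (the word problem below is invented for illustration), a PoT-style response expresses the reasoning as code and leaves the arithmetic to the interpreter, whereas CoT would spell out the computation in natural language:

```python
# Q: "A store sells pens at $1.50 each. Alice buys 4 pens and pays with
#     a $10 bill. How much change does she receive?"
# PoT-style answer: the model emits the reasoning as a program;
# the Python interpreter performs the actual computation.
pen_price = 1.50
num_pens = 4
total_cost = pen_price * num_pens   # 6.0
change = 10 - total_cost            # 4.0
print(change)
```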
arXiv Detail & Related papers (2022-11-22T21:06:00Z)
- NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual Question Answering [52.10214317661547]
Current numerical reasoning methods autoregressively decode program sequences.
The accuracy of program generation drops sharply as the decoding steps unfold due to error propagation.
In this paper, we propose a non-autoregressive program generation framework.
arXiv Detail & Related papers (2022-11-07T11:25:21Z)
- Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs [7.656446581986389]
Imperative programming allows users to implement their deep neural networks (DNNs) easily.
Several systems have been proposed to combine the usability of imperative programming with the optimized performance of symbolic graph execution.
We propose Terra, an imperative-symbolic co-execution system that can handle any imperative DL programs while achieving the optimized performance of symbolic graph execution.
arXiv Detail & Related papers (2022-01-23T09:04:48Z)
- Program Synthesis with Large Language Models [40.41120807053989]
We evaluate large language models for program synthesis in Python.
We find that synthesis performance scales log-linearly with model size.
We find that even our best models are generally unable to predict the output of a program given a specific input.
arXiv Detail & Related papers (2021-08-16T03:57:30Z)
- Representing Partial Programs with Blended Abstract Semantics [62.20775388513027]
We introduce a technique for representing partially written programs in a program synthesis engine.
We learn an approximate execution model implemented as a modular neural network.
We show that these hybrid neuro-symbolic representations enable execution-guided synthesizers to use more powerful language constructs.
arXiv Detail & Related papers (2020-12-23T20:40:18Z)
- OPFython: A Python-Inspired Optimum-Path Forest Classifier [68.8204255655161]
This paper proposes a Python-based Optimum-Path Forest framework, denoted as OPFython.
As OPFython is a Python-based library, it provides a friendlier environment and faster prototyping than the C language.
arXiv Detail & Related papers (2020-01-28T15:46:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.