Dolphin: A Programmable Framework for Scalable Neurosymbolic Learning
- URL: http://arxiv.org/abs/2410.03348v1
- Date: Fri, 4 Oct 2024 12:12:36 GMT
- Title: Dolphin: A Programmable Framework for Scalable Neurosymbolic Learning
- Authors: Aaditya Naik, Jason Liu, Claire Wang, Saikat Dutta, Mayur Naik, Eric Wong
- Abstract summary: We propose a framework to scale neurosymbolic learning at a fundamental level by mapping forward chaining and backward gradient propagation in symbolic programs to vectorized computations.
Dolphin introduces a set of abstractions and primitives built directly on top of a high-performance deep learning framework like PyTorch.
We evaluate Dolphin on a suite of 13 benchmarks across 5 neurosymbolic tasks that combine deep learning models for text, image, or video processing with symbolic programs.
- Score: 18.50192747078987
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neurosymbolic learning has emerged as a promising paradigm to incorporate symbolic reasoning into deep learning models. However, existing frameworks are limited in scalability with respect to both the training data and the complexity of symbolic programs. We propose Dolphin, a framework to scale neurosymbolic learning at a fundamental level by mapping both forward chaining and backward gradient propagation in symbolic programs to vectorized computations. For this purpose, Dolphin introduces a set of abstractions and primitives built directly on top of a high-performance deep learning framework like PyTorch, effectively enabling symbolic programs to be written as PyTorch modules. Developers can thereby write neurosymbolic programs in a familiar language like Python and compile them to computation graphs that are amenable to end-to-end differentiation on GPUs. We evaluate Dolphin on a suite of 13 benchmarks across 5 neurosymbolic tasks that combine deep learning models for text, image, or video processing with symbolic programs that involve multi-hop reasoning, recursion, and even black-box functions like Python eval(). Dolphin takes only 0.33%-37.17% of the time (2.77% on average) to train these models on the largest input per task compared to the baselines Scallop, ISED, and IndeCateR+, which time out on most of these inputs. Models written in Dolphin also achieve state-of-the-art accuracies even on the largest benchmarks.
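To make the vectorization idea concrete, below is a minimal PyTorch sketch of the classic MNIST-sum neurosymbolic task; it assumes nothing about Dolphin's actual abstractions or API and only illustrates how a symbolic rule (the sum of two predicted digits) can be expressed as batched tensor operations that remain end-to-end differentiable.

```python
import torch
import torch.nn.functional as F

# Minimal sketch (not Dolphin's API): the MNIST-sum task, where the symbolic rule
#   sum(c) :- digit1(a), digit2(b), c = a + b
# is mapped to vectorized tensor operations. Forward chaining becomes a batched
# reduction, and gradients propagate back into the digit classifiers automatically.

def sum_distribution(p1: torch.Tensor, p2: torch.Tensor) -> torch.Tensor:
    """p1, p2: (batch, 10) digit probabilities; returns (batch, 19) probabilities over a + b."""
    joint = (p1.unsqueeze(2) * p2.unsqueeze(1)).flatten(1)                          # (batch, 100) joint prob of each pair (a, b)
    sums = (torch.arange(10).view(10, 1) + torch.arange(10).view(1, 10)).flatten()  # value of a + b for each pair
    onehot = F.one_hot(sums, num_classes=19).to(p1.dtype)                           # (100, 19) pair -> sum indicator
    return joint @ onehot                                                           # aggregate probability mass per sum value

# Usage: p1 and p2 would come from any PyTorch digit classifier; the loss is an
# ordinary NLL on the symbolic output, so backpropagation reaches the classifier weights.
logits1 = torch.randn(4, 10, requires_grad=True)
logits2 = torch.randn(4, 10, requires_grad=True)
p_sum = sum_distribution(torch.softmax(logits1, dim=1), torch.softmax(logits2, dim=1))
loss = F.nll_loss(torch.log(p_sum + 1e-9), torch.tensor([3, 7, 10, 0]))
loss.backward()
```

Per the abstract, Dolphin's abstractions generalize and automate this kind of vectorization (including for recursion and black-box functions such as eval()) rather than requiring it to be hand-written per task.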
Related papers
- CyNetDiff -- A Python Library for Accelerated Implementation of Network Diffusion Models [0.9831489366502302]
Simulations of network diffusion models are often the most computationally intensive part of research tasks that use them, so a library with a high-level interface and fast internals is desirable.
CyNetDiff is a Python library with components written in Cython that provides improved performance for these computationally intensive diffusion tasks.
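To illustrate what such a simulation involves (a generic sketch, not CyNetDiff's API), an independent-cascade diffusion over an adjacency list can be written in a few lines of pure Python; the inner loops over edges are exactly the kind of work a Cython backend accelerates.

```python
import random

def independent_cascade(graph: dict[int, list[int]], seeds: set[int], p: float = 0.1) -> set[int]:
    """Simulate one independent-cascade run: each newly active node tries once to
    activate each neighbor with probability p. Returns the set of activated nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and random.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

# Usage on a toy graph: estimate the spread from seed node 0.
toy_graph = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
print(len(independent_cascade(toy_graph, {0}, p=0.5)))
```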
arXiv: 2024-04-25
- AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation [14.831115535710692]
We propose the concept of AI-oriented grammar.
This aims to represent code in a way that better suits the working mechanism of AI models.
Code written with AI-oriented grammar discards human-oriented formatting and uses a minimal number of tokens.
arXiv: 2024-04-25
- The Role of Foundation Models in Neuro-Symbolic Learning and Reasoning [54.56905063752427]
Neuro-Symbolic AI (NeSy) holds promise to ensure the safe deployment of AI systems.
Existing pipelines that train the neural and symbolic components sequentially require extensive labelling.
A new architecture, NeSyGPT, fine-tunes a vision-language foundation model to extract symbolic features from raw data.
arXiv: 2024-02-02
- Catwalk: A Unified Language Model Evaluation Framework for Many Datasets [50.75378592254184]
Catwalk provides a unified interface to a broad range of existing NLP datasets and models.
Catwalk substantially lowers the barriers to conducting controlled experiments at scale.
arXiv: 2023-12-15
- Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning [84.12154024070024]
We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks.
Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge.
A Python interpreter then executes the generated code and prints the output.
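As a hedged illustration of the pattern (an assumed example, not the paper's exact program format), an NLEP-style generated program keeps structured knowledge in ordinary Python data structures, defines a function that reasons over them, and prints the result:

```python
# Illustrative NLEP-style program (assumed example, not the paper's exact format).

# Step 1: structured knowledge, expressed as a Python data structure.
presidents = {
    "George Washington": 1789,
    "John Adams": 1797,
    "Thomas Jefferson": 1801,
}

# Step 2: a function defining the reasoning over that structure.
def took_office_before(year: int) -> list[str]:
    return [name for name, start in presidents.items() if start < year]

# Step 3: execute and print, so the interpreter's output becomes the answer.
print(took_office_before(1800))
```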
arXiv: 2023-09-19
- Scallop: A Language for Neurosymbolic Programming [14.148819428748597]
Scallop is a language that combines the benefits of deep learning and logical reasoning.
It is capable of expressing algorithmic reasoning in diverse and challenging AI tasks.
It provides a succinct interface for machine learning programmers to integrate logical domain knowledge.
arXiv: 2023-04-10
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks [108.4568236569645]
Chain-of-thought prompting (CoT) is by far the state-of-the-art method for these tasks.
We propose Program of Thoughts (PoT) prompting, which uses language models to express the reasoning process as a program.
PoT shows an average performance gain of around 12% over CoT across all evaluated datasets.
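For instance (an assumed illustration, not an example from the paper), a PoT-style response to "an account holds $1,000 at 5% annual interest; what is the balance after 3 years?" writes the computation as code and leaves the arithmetic to the Python interpreter:

```python
# Illustrative PoT-style program (assumed example, not taken from the paper).
principal = 1000.0            # initial deposit in dollars
rate = 0.05                   # 5% annual interest
years = 3
balance = principal * (1 + rate) ** years   # compound interest, evaluated by the interpreter
print(f"{balance:.2f}")                     # prints the final balance
```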
arXiv: 2022-11-22
- Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs [7.656446581986389]
Imperative programming allows users to implement their deep neural networks (DNNs) easily.
Several systems have been proposed to combine the usability of imperative programming with the optimized performance of symbolic graph execution.
We propose Terra, an imperative-symbolic co-execution system that can handle any imperative DL program while achieving the optimized performance of symbolic graph execution.
arXiv: 2022-01-23
- Program Synthesis with Large Language Models [40.41120807053989]
We evaluate large language models for program synthesis in Python.
We find that synthesis performance scales log-linearly with model size.
We find that even our best models are generally unable to predict the output of a program given a specific input.
arXiv: 2021-08-16
- Representing Partial Programs with Blended Abstract Semantics [62.20775388513027]
We introduce a technique for representing partially written programs in a program synthesis engine.
We learn an approximate execution model implemented as a modular neural network.
We show that these hybrid neuro-symbolic representations enable execution-guided synthesizers to use more powerful language constructs.
arXiv: 2020-12-23
- OPFython: A Python-Inspired Optimum-Path Forest Classifier [68.8204255655161]
This paper proposes a Python-based Optimum-Path Forest framework, denoted as OPFython.
As OPFython is a Python-based library, it provides a friendlier environment and a faster prototyping workspace than the C language.
arXiv: 2020-01-28
This list is automatically generated from the titles and abstracts of the papers on this site.