Engineering Scientific Assistants using Interactive Structured Induction of Programs
- URL: http://arxiv.org/abs/2503.14488v1
- Date: Tue, 18 Mar 2025 17:57:16 GMT
- Title: Engineering Scientific Assistants using Interactive Structured Induction of Programs
- Authors: Shraddha Surana, Ashwin Srinivasan
- Abstract summary: This paper focuses on the software-engineering of such 'science accelerators'. We propose the design of an interactive form of 'structured' inductive programming in which a software-engineer and an LLM collaboratively construct an 'assistant' for a scientific data analysis. The results show iStrucInd allows a software engineer to develop better programs faster, suggesting interactive structured induction can play a useful role in the rapid construction of scientific assistants.
- Score: 1.2101952480469729
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We are interested in the construction of software that can act as scientific assistants to domain specialists. It is expected that such assistants will be needed to accelerate the identification of ways to address complex problems requiring urgent solutions. In this paper, our focus is not on a specific scientific problem, but on the software-engineering of such 'science accelerators'. Recent developments in 'No Code' techniques would seem to suggest that scientists can hypothesise solutions simply by conversing with a large language model (LLM). However, for complex scientific problems, this seems unlikely given the current state of LLM technology. What does appear feasible is that a software engineer can use LLMs to rapidly construct programs for use by a domain-specialist, including the specialist's requirements expressed in natural language. We propose the design of an interactive form of 'structured' inductive programming in which a software-engineer and an LLM collaboratively construct an 'assistant' for a scientific data analysis. The paper describes a simple implementation called iStrucInd that adapts a '2-way Intelligibility' protocol to implement the interaction between the software engineer and the LLM. We test the tool on two different non-trivial scientific data analysis tasks. Specifically, we compare the system constructed by iStrucInd against systems constructed manually and by Low Code/No Code methods along dimensions of: (a) program performance; (b) program quality; and (c) programming effort. The results show iStrucInd allows a software engineer to develop better programs faster, suggesting interactive structured induction can play a useful role in the rapid construction of scientific assistants.
Related papers
- Self-Steering Language Models [113.96916935955842]
DisCIPL is a method for "self-steering" language models.
DisCIPL uses a Planner model to generate a task-specific inference program.
Our work opens up a design space of highly-parallelized Monte Carlo inference strategies.
arXiv Detail & Related papers (2025-04-09T17:54:22Z)
- Promptware Engineering: Software Engineering for LLM Prompt Development [22.788377588087894]
Large Language Models (LLMs) are increasingly integrated into software applications, with prompts serving as the primary 'programming' interface.
As a result, a new software paradigm, promptware, has emerged, using natural language prompts to interact with LLMs.
Unlike traditional software, which relies on formal programming languages and deterministic runtime environments, promptware is based on ambiguous, unstructured, and context-dependent natural language.
arXiv Detail & Related papers (2025-03-04T08:43:16Z)
- VISION: A Modular AI Assistant for Natural Human-Instrument Interaction at Scientific User Facilities [0.19736111241221438]
Generative AI presents an opportunity to bridge this knowledge gap.
We present a modular architecture for the Virtual Scientific Companion (VISION).
With VISION, we performed LLM-based operation on the beamline workstation with low latency and demonstrated the first voice-controlled experiment at an X-ray scattering beamline.
arXiv Detail & Related papers (2024-12-24T04:37:07Z)
- Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System [62.832818186789545]
Virtual Scientists (VirSci) is a multi-agent system designed to mimic the teamwork inherent in scientific research.
VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas.
We show that this multi-agent approach outperforms the state-of-the-art method in producing novel scientific ideas.
arXiv Detail & Related papers (2024-10-12T07:16:22Z)
- Meaning-Typed Programming: Language-level Abstractions and Runtime for GenAI Applications [8.308424118055981]
Software is rapidly evolving from logical code to neuro-integrated applications that leverage generative AI and large language models (LLMs) for application functionality.
This paper proposes meaning-typed programming (MTP), a novel approach to simplify the creation of neuro-integrated applications.
arXiv Detail & Related papers (2024-05-14T21:12:01Z)
- SciInstruct: a Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models [57.96527452844273]
We introduce SciInstruct, a suite of scientific instructions for training scientific language models capable of college-level scientific reasoning.
We curated a diverse and high-quality dataset encompassing physics, chemistry, math, and formal proofs.
To verify the effectiveness of SciInstruct, we fine-tuned different language models with SciInstruct, i.e., ChatGLM3 (6B and 32B), Llama3-8B-Instruct, and Mistral-7B: MetaMath.
arXiv Detail & Related papers (2024-01-15T20:22:21Z)
- When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose complexity-impacted reasoning score (CIRS) to measure correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
- A Divide-Align-Conquer Strategy for Program Synthesis [5.7426444823028335]
We show that compositional segmentation can be applied in the programming by examples setting to divide the search for large programs across multiple smaller program synthesis problems.
A structural alignment of the constituent parts in the input and output leads to pairwise correspondences used to guide the program search.
arXiv Detail & Related papers (2023-01-08T19:10:55Z)
- Leveraging Language to Learn Program Abstractions and Search Heuristics [66.28391181268645]
We introduce LAPS (Language for Abstraction and Program Search), a technique for using natural language annotations to guide joint learning of libraries and neurally-guided search models for synthesis.
When integrated into a state-of-the-art library learning system (DreamCoder), LAPS produces higher-quality libraries and improves search efficiency and generalization.
arXiv Detail & Related papers (2021-06-18T15:08:47Z)
- Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end.
We have developed a proven systems engineering approach for machine learning development and deployment.
Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.