SIERRA: A Modular Framework for Research Automation and Reproducibility
- URL: http://arxiv.org/abs/2208.07805v1
- Date: Tue, 16 Aug 2022 15:36:34 GMT
- Title: SIERRA: A Modular Framework for Research Automation and Reproducibility
- Authors: John Harwell, Maria Gini
- Abstract summary: We present SIERRA, a novel framework for accelerating research development and improving the reproducibility of results.
SIERRA accelerates research by automating the process of generating executable experiments from queries over independent variables.
It employs a modular architecture enabling easy customization and extension for the needs of individual researchers.
- Score: 6.1678491628787455
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Modern intelligent systems researchers form hypotheses about system behavior
and then run experiments using one or more independent variables to test their
hypotheses. We present SIERRA, a novel framework structured around that idea
for accelerating research development and improving reproducibility of results.
SIERRA accelerates research by automating the process of generating executable
experiments from queries over independent variable(s), executing those experiments,
and processing the results to generate deliverables such as graphs and videos.
It shifts the paradigm for testing hypotheses from procedural ("Do these steps
to answer the query") to declarative ("Here is the query to test--GO!"),
reducing the burden on researchers. It employs a modular architecture enabling
easy customization and extension for the needs of individual researchers,
thereby eliminating manual configuration and processing via throw-away scripts.
SIERRA improves reproducibility of research by providing automation independent
of the execution environment (HPC hardware, real robots, etc.) and targeted
platform (arbitrary simulator or real robots). This enables exact experiment
replication, up to the limit of the execution environment and platform, as well
as making it easy for researchers to test hypotheses in different computational
environments.
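To make the procedural-to-declarative shift concrete, here is a minimal sketch of the idea (hypothetical code, not SIERRA's actual API): a declarative query over the independent variable(s) is expanded into the cross product of concrete, executable experiment configurations, which a platform plugin could then execute in any environment.

```python
from itertools import product

# Hypothetical illustration of declarative experiment generation;
# names and structure are illustrative, not SIERRA's real API.

def generate_experiments(query: dict) -> list[dict]:
    """Expand a query over independent variables into the cross
    product of concrete, executable experiment configurations."""
    names = list(query)
    return [dict(zip(names, values)) for values in product(*query.values())]

# Declarative: state WHAT to test, not HOW to run it.
query = {
    "population_size": [16, 32, 64],   # independent variable 1
    "arena_dimension": [10, 20],       # independent variable 2
}

for experiment in generate_experiments(query):
    # A platform plugin (simulator, real robots, HPC, ...) would
    # consume each configuration here; we just print it.
    print(experiment)
```

In this style the researcher only declares the variables to test; expansion, execution, and result processing are the framework's job.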
Related papers
- MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents [10.86017322488788]
We present a new systematic framework, autonomous Machine Learning Research with large language models (MLR-Copilot).
It is designed to enhance machine learning research productivity through the automatic generation and implementation of research ideas using Large Language Model (LLM) agents.
We evaluate our framework on five machine learning research tasks, and the experimental results show its potential to facilitate research progress and innovation.
arXiv Detail & Related papers (2024-08-26T05:55:48Z)
- Automatic benchmarking of large multimodal models via iterative experiment programming [71.78089106671581]
We present APEx, the first framework for automatic benchmarking of LMMs.
Given a research question expressed in natural language, APEx leverages a large language model (LLM) and a library of pre-specified tools to generate a set of experiments for the model at hand.
APEx progressively compiles its findings into a scientific report, and that report drives the testing procedure: based on the current status of the investigation, APEx chooses which experiments to perform and whether the results are sufficient to draw conclusions.
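The iterative loop this entry describes can be sketched as follows; the "LLM" and "tool library" are replaced by toy stubs so the loop itself runs, and none of these names come from the paper.

```python
import random

# Hypothetical sketch of APEx-style iterative experiment programming.
# All functions are placeholder stubs, not the paper's actual API.

def llm_choose_experiment(question, report):
    # Stub: a real system would prompt an LLM with the report so far.
    return f"experiment-{len(report) + 1}"

def run_experiment(experiment):
    # Stub: a real system would dispatch to a library of evaluation tools.
    return random.random()

def is_conclusive(report):
    # Stub: a real system would ask the LLM whether evidence suffices.
    return len(report) >= 3

def investigate(question, max_rounds=10):
    report = []  # the growing report drives which experiment comes next
    for _ in range(max_rounds):
        experiment = llm_choose_experiment(question, report)
        report.append({"experiment": experiment, "result": run_experiment(experiment)})
        if is_conclusive(report):
            break
    return report

print(investigate("Does the model handle negation?"))
```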
arXiv Detail & Related papers (2024-06-18T06:43:46Z)
- System for systematic literature review using multiple AI agents: Concept and an empirical evaluation [5.194208843843004]
We introduce a novel multi-AI agent model designed to fully automate the process of conducting Systematic Literature Reviews.
The model operates through a user-friendly interface where researchers input their topic.
It generates a search string used to retrieve relevant academic papers.
The model then autonomously summarizes the abstracts of these papers.
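The pipeline this entry describes (topic in; search string; retrieval; abstract summaries out) can be sketched as a chain of stub agents. Every name below is a hypothetical stand-in, not the paper's implementation.

```python
# Hypothetical sketch of the multi-agent SLR pipeline described above:
# topic -> search string -> retrieved papers -> abstract summaries.
# All agents are toy stubs.

def search_string_agent(topic: str) -> str:
    # Stub: a real agent would prompt an LLM to build a Boolean query.
    return f'("{topic}") AND ("systematic review" OR "survey")'

def retrieval_agent(search_string: str) -> list[str]:
    # Stub: a real agent would query an academic database API.
    return [f"abstract of paper {i} matching {search_string!r}" for i in range(3)]

def summarization_agent(abstract: str) -> str:
    # Stub: a real agent would call an LLM to summarize.
    return abstract[:40] + "..."

def run_slr(topic: str) -> list[str]:
    query = search_string_agent(topic)
    return [summarization_agent(a) for a in retrieval_agent(query)]

print(run_slr("research automation"))
```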
arXiv Detail & Related papers (2024-03-13T10:27:52Z)
- MLXP: A Framework for Conducting Replicable Experiments in Python [63.37350735954699]
We propose MLXP, an open-source, simple, and lightweight experiment management tool based on Python.
It streamlines the experimental process with minimal practitioner overhead while ensuring a high level of reproducibility.
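As a rough illustration of what such a lightweight experiment manager automates (per-run directories, config capture, seeding, result logging), here is a generic sketch; it is not MLXP's actual interface.

```python
import json, pathlib, random, time

# Generic sketch of lightweight experiment management: each run gets
# its own directory with the config and results saved alongside it.
# This is an illustration of the concept, not MLXP's real API.

def launch(config: dict, runs_root: str = "runs") -> dict:
    run_dir = pathlib.Path(runs_root) / time.strftime("%Y%m%d-%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "config.json").write_text(json.dumps(config, indent=2))

    random.seed(config["seed"])            # reproducibility: fix the seed
    result = {"score": random.random()}    # stand-in for the real experiment

    (run_dir / "result.json").write_text(json.dumps(result, indent=2))
    return result

print(launch({"seed": 0, "lr": 1e-3}))
```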
arXiv Detail & Related papers (2024-02-21T14:22:20Z)
- A Backend Platform for Supporting the Reproducibility of Computational Experiments [2.1485350418225244]
It is challenging to recreate the same environment using the same frameworks, code, data sources, programming languages, dependencies, and so on.
In this work, we propose an Integrated Development Environment allowing the sharing, configuration, packaging, and execution of experiments.
In an evaluation on 25 experiments extracted from published papers, we were able to successfully reproduce 20 (80%) of them, achieving the results reported in those works with minimal effort.
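What packaging an experiment for later reproduction can involve is illustrated by the following sketch, which records the interpreter version and frozen dependencies alongside an entry point; the manifest format is invented for illustration, not the platform's actual one.

```python
import json, platform, subprocess, sys

# Illustrative sketch of capturing an experiment's environment so it
# can be recreated later. The manifest format is invented.

def package_experiment(entry_point: str, manifest_path: str = "experiment.json") -> dict:
    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    manifest = {
        "entry_point": entry_point,           # how to run the experiment
        "python": platform.python_version(),  # interpreter to recreate
        "dependencies": frozen,               # exact package versions
    }
    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest

package_experiment("python train.py --config config.yaml")
```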
arXiv Detail & Related papers (2023-06-29T10:29:11Z)
- SIERRA: A Modular Framework for Research Automation [5.220940151628734]
We present SIERRA, a novel framework for accelerating research development and improving the reproducibility of results.
SIERRA makes it easy to quickly specify the independent variable(s) for an experiment, generate experimental inputs, automatically run the experiment, and process the results to generate deliverables such as graphs and videos.
It employs a deeply modular approach that allows easy customization and extension of automation for the needs of individual researchers.
arXiv Detail & Related papers (2022-03-03T23:45:46Z)
- A User's Guide to Calibrating Robotics Simulators [54.85241102329546]
This paper proposes a set of benchmarks and a framework for studying various algorithms aimed at transferring models and policies learnt in simulation to the real world.
We conduct experiments on a wide range of well-known simulated environments to characterize and offer insights into the performance of different algorithms.
Our analysis can be useful for practitioners working in this area and can help make informed choices about the behavior and main properties of sim-to-real algorithms.
arXiv Detail & Related papers (2020-11-17T22:24:26Z)
- Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration [130.89746032163106]
We propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data.
We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration.
We present an energy-model-guided fuzzer for software testing that achieves performance comparable to well-engineered fuzzing engines like libFuzzer.
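As a rough illustration of the discrete-EBM sampling setting this entry refers to, here is a generic single-bit-flip Metropolis sampler with local exploration; it illustrates the problem setup only, not ALOE's variational power iteration.

```python
import math, random

# Generic sketch: sampling from a discrete (binary-vector) energy-based
# model via local exploration with single-bit-flip Metropolis moves.
# A toy illustration, not ALOE's actual learning algorithm.

def energy(x: list[int]) -> float:
    # Toy energy: prefers vectors whose bits sum to half the length.
    return (sum(x) - len(x) / 2) ** 2

def sample(n_bits: int = 8, steps: int = 1000) -> list[int]:
    x = [random.randint(0, 1) for _ in range(n_bits)]
    for _ in range(steps):
        i = random.randrange(n_bits)
        y = x.copy()
        y[i] ^= 1  # local move: flip one bit
        # Accept with Metropolis probability exp(E(x) - E(y)).
        if random.random() < math.exp(min(0.0, energy(x) - energy(y))):
            x = y
    return x

print(sample())
```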
arXiv Detail & Related papers (2020-11-10T19:31:29Z)
- Rearrangement: A Challenge for Embodied AI [229.8891614821016]
We describe a framework for research and evaluation in Embodied AI.
Our proposal is based on a canonical task: Rearrangement.
We present experimental testbeds of rearrangement scenarios in four different simulation environments.
arXiv Detail & Related papers (2020-11-03T19:42:32Z)
- Integrated Benchmarking and Design for Reproducible and Accessible Evaluation of Robotic Agents [61.36681529571202]
We describe a new concept for reproducible robotics research that integrates development and benchmarking.
One of the central components of this setup is the Duckietown Autolab, a standardized setup that is itself relatively low-cost and reproducible.
We validate the system by analyzing the repeatability of experiments conducted using the infrastructure and show that there is low variance across different robot hardware and across different remote labs.
arXiv Detail & Related papers (2020-09-09T15:31:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.