CLAIMED -- the open source framework for building coarse-grained operators for accelerated discovery in science
- URL: http://arxiv.org/abs/2307.06824v1
- Date: Wed, 12 Jul 2023 11:54:39 GMT
- Title: CLAIMED -- the open source framework for building coarse-grained operators for accelerated discovery in science
- Authors: Romeo Kienzler, Rafflesia Khan, Jerome Nilmeier, Ivan Nesic, Ibrahim Haddad
- Abstract summary: CLAIMED is a framework to build reusable operators and
scalable scientific workflows by supporting scientists in drawing from previous
work, re-composing workflows from existing libraries of coarse-grained
scientific operators. CLAIMED is programming language, scientific library, and
execution environment agnostic.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In modern data-driven science, reproducibility and reusability are key
challenges. Scientists are well versed in the process of going from data to
publication. Although some publication channels require source code and data to
be made accessible, rerunning and verifying experiments is usually hard due to
a lack of standards. Reusing existing scientific data processing code from
state-of-the-art research is therefore hard as well. This is why we introduce
CLAIMED, which has a proven track record in scientific research of addressing
the repeatability and reusability issues in modern data-driven science. CLAIMED
is a framework for building reusable operators and scalable scientific
workflows: it supports scientists in drawing from previous work by re-composing
workflows from existing libraries of coarse-grained scientific operators.
Although various implementations exist, CLAIMED is programming language,
scientific library, and execution environment agnostic.
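To make the idea concrete, a coarse-grained operator can be an ordinary script
whose parameters arrive through environment variables, so the same code runs in
a notebook, a container, or a workflow engine. The minimal Python sketch below
is illustrative only; the file, column, and parameter names are assumptions,
not part of CLAIMED itself.

    # filter_rows.py -- illustrative coarse-grained operator (names are hypothetical)
    # pip install pandas
    """Filter a CSV to rows where a given column exceeds a threshold."""
    import os
    import pandas as pd

    # parameters are read from environment variables so the operator stays
    # independent of any particular execution environment
    input_path = os.environ.get('input_path', 'data.csv')
    output_path = os.environ.get('output_path', 'filtered.csv')
    column = os.environ.get('column', 'value')
    threshold = float(os.environ.get('threshold', '0.5'))

    df = pd.read_csv(input_path)
    df[df[column] > threshold].to_csv(output_path, index=False)

Because the interface is just environment variables and files, such an operator
can be re-composed into larger workflows without caring which engine executes it.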
Related papers
- Exploring Code Comprehension in Scientific Programming: Preliminary Insights from Research Scientists [6.2329239454115415]
This study surveys 57 research scientists from various disciplines to explore their programming backgrounds, their practices, and the challenges they face regarding code readability.
Scientists mainly use Python and R, relying on documentation for readability.
Our findings show low adoption of code quality tools and a trend towards utilizing large language models to improve code quality.
arXiv Detail & Related papers (2025-01-17T08:47:29Z)
Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System [62.832818186789545]
Virtual Scientists (VirSci) is a multi-agent system designed to mimic the teamwork inherent in scientific research.
VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas.
We show that this multi-agent approach outperforms the state-of-the-art method in producing novel scientific ideas.
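A schematic sketch of the generate-evaluate-refine loop such a team of agents
implements follows; the three functions stand in for LLM-backed agents and are
purely illustrative, not VirSci's actual implementation.

    # toy generate-evaluate-refine loop; propose/evaluate/refine stand in for
    # LLM-backed agents and are assumptions, not VirSci's code
    def propose(topic):
        return f"Initial idea about {topic}"

    def evaluate(idea):
        return "(refined)" in idea    # toy acceptance test in place of reviewer agents

    def refine(idea):
        return idea + " (refined)"

    def team_ideate(topic, max_rounds=3):
        idea = propose(topic)
        for _ in range(max_rounds):
            if evaluate(idea):        # evaluator agents accept the idea
                break
            idea = refine(idea)       # refiner agents revise it
        return idea

    print(team_ideate("reproducible scientific workflows"))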
arXiv Detail & Related papers (2024-10-12T07:16:22Z)
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [68.48094108571432]
Large language models (LLMs) have revolutionized the way text and other modalities of data are handled.
We aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs.
arXiv Detail & Related papers (2024-06-16T08:03:24Z)
MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows.
MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years.
We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
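For orientation, one record in such a dataset might pair a paper with the five
aspects the authors describe (context, key idea, method, outcome, projected
impact); the sketch below is a hypothetical layout, not the actual MASSW schema.

    # hypothetical record layout; field names are assumptions, not MASSW's schema
    record = {
        "title": "Example paper",
        "context": "Why the problem matters and what was known before.",
        "key_idea": "The novel contribution, in one or two sentences.",
        "method": "How the idea was implemented and evaluated.",
        "outcome": "What the experiments showed.",
        "projected_impact": "Who might benefit and how.",
    }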
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows [72.40917624485822]
We introduce DataDreamer, an open source Python library that allows researchers to implement powerful large language model workflows.
DataDreamer also helps researchers adhere to best practices that we propose to encourage open science.
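One such practice is seeding every generation step and storing provenance
metadata next to the produced data; the sketch below illustrates that pattern in
plain Python and deliberately does not use DataDreamer's actual API.

    # generic open-science pattern, not DataDreamer's API: seed the generator
    # and save the seed and settings alongside the synthetic data
    import json
    import random

    def synthesize(seed, n):
        random.seed(seed)                          # fixed seed -> rerunnable
        rows = [{"id": i, "value": random.random()} for i in range(n)]
        return {"metadata": {"seed": seed, "n": n}, "rows": rows}

    with open("synthetic.json", "w") as f:
        json.dump(synthesize(seed=42, n=5), f, indent=2)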
arXiv Detail & Related papers (2024-02-16T00:10:26Z)
SciCat: A Curated Dataset of Scientific Software Repositories [4.77982299447395]
We introduce the SciCat dataset -- a comprehensive collection of Free-Libre Open Source Software (FLOSS) projects.
Our approach involves selecting projects from a pool of 131 million deforked repositories from the World of Code data source.
Our classification focuses on software designed for scientific purposes, research-related projects, and research support software.
arXiv Detail & Related papers (2023-12-11T13:46:33Z)
Caching and Reproducibility: Making Data Science experiments faster and FAIRer [25.91002326340444]
Small to medium-scale data science experiments often rely on research software developed ad-hoc by individual scientists or small teams.
We suggest making caching an integral part of the research software development process, even before the first line of code is written.
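In Python, one low-effort way to adopt this habit from day one is on-disk
memoization, for example with the joblib library (expensive_step below is an
illustrative stand-in for a real pipeline stage).

    # on-disk memoization with joblib (pip install joblib)
    from joblib import Memory

    memory = Memory("./cache", verbose=0)   # cached results persist across runs

    @memory.cache
    def expensive_step(n):
        # stands in for a long-running preprocessing or simulation step
        return sum(i * i for i in range(n))

    print(expensive_step(10_000_000))  # computed on the first run
    print(expensive_step(10_000_000))  # replayed from the on-disk cache

Because the cache lives on disk, a rerun of the whole experiment replays
finished steps instead of recomputing them, which is what makes reruns faster
and FAIRer.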
arXiv Detail & Related papers (2022-11-08T07:11:02Z)
KnowledgeShovel: An AI-in-the-Loop Document Annotation System for Scientific Knowledge Base Construction [46.56643271476249]
KnowledgeShovel is an AI-in-the-Loop document annotation system for researchers to construct scientific knowledge bases.
The design of KnowledgeShovel introduces a multi-step, multi-modal AI collaboration pipeline to improve data accuracy while reducing the human burden.
A follow-up user evaluation with 7 geoscience researchers shows that KnowledgeShovel can enable efficient construction of scientific knowledge bases with satisfactory accuracy.
arXiv Detail & Related papers (2022-10-06T11:38:18Z)
Automated Creation and Human-assisted Curation of Computable Scientific Models from Code and Text [2.3746609573239756]
Domain experts cannot gain a complete understanding of the implementation of a scientific model if they are not familiar with the code.
We develop a system for the automated creation and human-assisted curation of scientific models.
We present experimental results obtained using a dataset of code and associated text derived from NASA's Hypersonic Aerodynamics website.
arXiv Detail & Related papers (2022-01-28T17:31:38Z)
Fact or Fiction: Verifying Scientific Claims [53.29101835904273]
We introduce scientific claim verification, a new task to select abstracts from the research literature containing evidence that SUPPORTS or REFUTES a given scientific claim.
We construct SciFact, a dataset of 1.4K expert-written scientific claims paired with evidence-containing abstracts annotated with labels and rationales.
We show that our system is able to verify claims related to COVID-19 by identifying evidence from the CORD-19 corpus.
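For intuition, a single instance of the task might look like the record below;
the field names are assumptions chosen to mirror the task description, not the
actual SciFact schema.

    # illustrative claim-verification record; field names are hypothetical
    example = {
        "claim": "Drug X reduces mortality in condition Y.",
        "evidence_abstract_id": 4983,   # hypothetical abstract identifier
        "label": "REFUTES",             # or "SUPPORTS"
        "rationale_sentences": [2, 5],  # abstract sentences justifying the label
    }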
arXiv Detail & Related papers (2020-04-30T17:22:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.