Does AI for science need another ImageNet Or totally different
benchmarks? A case study of machine learning force fields
- URL: http://arxiv.org/abs/2308.05999v1
- Date: Fri, 11 Aug 2023 08:06:58 GMT
- Title: Does AI for science need another ImageNet Or totally different
benchmarks? A case study of machine learning force fields
- Authors: Yatao Li, Wanling Gao, Lei Wang, Lixin Sun, Zun Wang, Jianfeng Zhan
- Abstract summary: AI for science (AI4S) aims to enhance the accuracy and speed of scientific computing tasks using machine learning methods.
Traditional AI benchmarking methods struggle to adapt to the unique challenges posed by AI4S because they assume data in training, testing, and future real-world queries are independent and identically distributed.
This paper investigates the need for a novel approach to effectively benchmark AI for science, using the machine learning force field (MLFF) as a case study.
- Score: 5.622820801789953
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: AI for science (AI4S) is an emerging research field that aims to enhance the
accuracy and speed of scientific computing tasks using machine learning
methods. Traditional AI benchmarking methods struggle to adapt to the unique
challenges posed by AI4S because they assume data in training, testing, and
future real-world queries are independent and identically distributed, while
AI4S workloads anticipate out-of-distribution problem instances. This paper
investigates the need for a novel approach to effectively benchmark AI for
science, using the machine learning force field (MLFF) as a case study. MLFF is
a method to accelerate molecular dynamics (MD) simulation with low
computational cost and high accuracy. We identify various missed opportunities
in scientifically meaningful benchmarking and propose solutions to evaluate
MLFF models, specifically in the aspects of sample efficiency, time domain
sensitivity, and cross-dataset generalization capabilities. By setting up the
problem instantiation similarly to actual scientific applications, the
benchmark yields more meaningful performance metrics. This suite
of metrics has demonstrated a better ability to assess a model's performance in
real-world scientific applications, in contrast to traditional AI benchmarking
methodologies. This work is a component of the SAIBench project, an AI4S
benchmarking suite. The project homepage is
https://www.computercouncil.org/SAIBench.
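To make the proposed evaluation aspects concrete, the following is a minimal sketch of how a cross-dataset generalization check for an MLFF model could be scored through force errors on data drawn from a different distribution. The model interface, toy data, and reported "generalization gap" are illustrative assumptions, not the SAIBench harness itself.

```python
import numpy as np

# Hypothetical cross-dataset generalization check for an MLFF model.
# The model callable, dataset arrays, and the gap statistic are illustrative
# assumptions, not part of the SAIBench suite itself.

def force_mae(pred_forces, ref_forces):
    """Mean absolute error over all atoms and Cartesian components."""
    return np.mean(np.abs(pred_forces - ref_forces))

def cross_dataset_report(model, in_dist, out_of_dist):
    """Compare force errors on training-like data vs. an unseen dataset."""
    report = {}
    for name, (coords, ref_forces) in {"in-distribution": in_dist,
                                       "out-of-distribution": out_of_dist}.items():
        pred = np.stack([model(c) for c in coords])  # model maps coords -> forces
        report[name] = force_mae(pred, ref_forces)
    report["generalization gap"] = (report["out-of-distribution"]
                                    - report["in-distribution"])
    return report

# Toy stand-in model and data, just to show the metric plumbing.
rng = np.random.default_rng(0)
toy_model = lambda coords: -0.1 * coords                 # fake force field
make_set = lambda scale: (rng.normal(0, scale, (8, 16, 3)),
                          rng.normal(0, 0.05, (8, 16, 3)))
print(cross_dataset_report(toy_model, make_set(1.0), make_set(3.0)))
```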
Related papers
- ML Research Benchmark [0.0]
We present the ML Research Benchmark (MLRB), comprising 7 competition-level tasks derived from recent machine learning conference tracks.
This paper introduces a novel benchmark and evaluates it using agent scaffolds powered by frontier models, including Claude-3 and GPT-4o.
The results indicate that the Claude-3.5 Sonnet agent performs best across our benchmark, excelling in planning and developing machine learning models.
arXiv Detail & Related papers (2024-10-29T21:38:42Z)
- DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? [58.330879414174476]
We introduce DSBench, a benchmark designed to evaluate data science agents with realistic tasks.
This benchmark includes 466 data analysis tasks and 74 data modeling tasks, sourced from Eloquence and Kaggle competitions.
Our evaluation of state-of-the-art LLMs, LVLMs, and agents shows that they struggle with most tasks, with the best agent solving only 34.12% of data analysis tasks and achieving a 34.74% Relative Performance Gap (RPG).
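As a hedged illustration of how such agent results might be aggregated, the sketch below computes a solved-task rate and a baseline-normalized performance ratio; the task records and the normalization formula are assumptions for illustration and may differ from DSBench's official RPG definition.

```python
# Hypothetical aggregation of agent results on a DSBench-style benchmark.
# NOTE: the normalization below is an assumption, not DSBench's official RPG formula.

def solved_rate(results):
    """Fraction of data-analysis tasks the agent answered correctly."""
    return sum(1 for r in results if r["correct"]) / len(results)

def relative_performance_gap(agent_scores, best_scores, baseline_scores):
    """Assumed normalization: how far the agent gets from a naive baseline
    toward the best-known submission, averaged over modeling tasks."""
    gaps = []
    for agent, best, base in zip(agent_scores, best_scores, baseline_scores):
        denom = best - base
        gaps.append((agent - base) / denom if denom else 0.0)
    return sum(gaps) / len(gaps)

if __name__ == "__main__":
    analysis = [{"correct": True}, {"correct": False}, {"correct": True}]
    print(f"solved rate: {solved_rate(analysis):.2%}")
    print(f"normalized gap: "
          f"{relative_performance_gap([0.70, 0.55], [0.90, 0.80], [0.50, 0.40]):.2%}")
```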
arXiv Detail & Related papers (2024-09-12T02:08:00Z)
- Automatic AI Model Selection for Wireless Systems: Online Learning via Digital Twinning [50.332027356848094]
AI-based applications are deployed at intelligent controllers to carry out functionalities like scheduling or power control.
The mapping between context and AI model parameters is ideally done in a zero-shot fashion.
This paper introduces a general methodology for the online optimization of AMS (automatic model selection) mappings.
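As a rough sketch of the general online model-selection idea (not the paper's digital-twin methodology), the snippet below keeps a running loss per (context, model) pair and picks a model epsilon-greedily; the context keys, candidate model names, and loss feedback are assumptions.

```python
import random
from collections import defaultdict

# Hypothetical online selection of one AI model per context.
# Generic epsilon-greedy sketch, not the paper's digital-twinning method.

class OnlineModelSelector:
    def __init__(self, model_ids, epsilon=0.1):
        self.model_ids = model_ids
        self.epsilon = epsilon
        # Running average loss of each model, kept separately per context.
        self.avg_loss = defaultdict(lambda: {m: 0.0 for m in model_ids})
        self.counts = defaultdict(lambda: {m: 0 for m in model_ids})

    def select(self, context):
        if random.random() < self.epsilon:
            return random.choice(self.model_ids)            # explore
        losses = self.avg_loss[context]
        return min(self.model_ids, key=lambda m: losses[m])  # exploit

    def update(self, context, model_id, loss):
        c = self.counts[context][model_id] + 1
        self.counts[context][model_id] = c
        prev = self.avg_loss[context][model_id]
        self.avg_loss[context][model_id] = prev + (loss - prev) / c

# Usage: selector = OnlineModelSelector(["scheduler_A", "scheduler_B"])
# model = selector.select("urban_high_load"); selector.update("urban_high_load", model, 0.3)
```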
arXiv Detail & Related papers (2024-06-22T11:17:50Z)
- SAIBench: A Structural Interpretation of AI for Science Through Benchmarks [2.6159098238462817]
This paper introduces a novel benchmarking approach, known as structural interpretation.
It addresses two key requirements: identifying the trusted operating range in the problem space and tracing errors back to their computational components.
The practical utility and effectiveness of structural interpretation are illustrated through its application to three distinct AI4S workloads.
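A minimal sketch of the "trusted operating range" idea under simplifying assumptions: the one-dimensional problem-space descriptor, per-sample errors, and fixed error tolerance below are hypothetical stand-ins, and the binning is not SAIBench's actual structural-interpretation procedure.

```python
import numpy as np

# Hypothetical illustration: bin a 1-D problem-space descriptor and mark bins
# whose mean error stays under a tolerance as the trusted operating range.

def trusted_operating_range(descriptor, errors, n_bins=10, tolerance=0.05):
    edges = np.linspace(descriptor.min(), descriptor.max(), n_bins + 1)
    bin_ids = np.clip(np.digitize(descriptor, edges) - 1, 0, n_bins - 1)
    trusted = []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any() and errors[mask].mean() <= tolerance:
            trusted.append((edges[b], edges[b + 1]))
    return trusted  # list of (low, high) descriptor intervals judged trustworthy

# Synthetic example: error grows toward the edge of the descriptor range.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 2000)
err = np.abs(0.02 + 0.1 * x**2 + rng.normal(0.0, 0.005, x.size))
print(trusted_operating_range(x, err))
```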
arXiv Detail & Related papers (2023-11-29T18:17:35Z)
- Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique for tackling imbalanced learning by generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
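For context, classic SMOTE-style over-sampling (which AutoSMOTE builds on and automates) generates a synthetic minority sample by interpolating between a minority point and one of its nearest minority-class neighbors. The sketch below illustrates that baseline idea, not AutoSMOTE's hierarchical reinforcement-learning search.

```python
import numpy as np

# Baseline SMOTE-style over-sampling sketch (not AutoSMOTE itself):
# each synthetic point lies on the segment between a minority sample and one
# of its k nearest minority-class neighbors.

def smote_like_oversample(X_min, n_new, k=5, seed=0):
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)  # distances to other minority points
        neighbors = np.argsort(d)[1:k + 1]            # skip the point itself
        j = rng.choice(neighbors)
        lam = rng.random()                            # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_minority = np.random.default_rng(1).normal(size=(20, 4))
print(smote_like_oversample(X_minority, n_new=10).shape)  # (10, 4)
```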
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- TRUST XAI: Model-Agnostic Explanations for AI With a Case Study on IIoT Security [0.0]
We propose a universal XAI model named Transparency Relying Upon Statistical Theory (TRUST).
We show how TRUST XAI provides explanations for new random samples with an average success rate of 98%.
In the end, we also show how TRUST's explanations are presented to the user.
arXiv Detail & Related papers (2022-05-02T21:44:27Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
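A minimal sketch of the metric-estimation idea: given posterior label-probability samples from a surrogate BNN over unlabeled test points, the accuracy of the model under test can be estimated in expectation. The array shapes and surrogate outputs below are assumptions; this is not the paper's full active-testing loop.

```python
import numpy as np

# Hypothetical metric estimation from a surrogate BNN's posterior samples.
# bnn_probs: (S, N, C) -- S posterior samples of class probabilities for N
# unlabeled test points over C classes (assumed to come from a trained BNN).
# model_preds: (N,) -- hard predictions of the model under test.

def estimated_accuracy(bnn_probs, model_preds):
    n = model_preds.shape[0]
    # Probability, under each posterior sample, that the model's prediction
    # matches the (unknown) true label of each point.
    match_prob = bnn_probs[:, np.arange(n), model_preds]  # (S, N)
    per_sample_acc = match_prob.mean(axis=1)              # (S,)
    return per_sample_acc.mean(), per_sample_acc.std()    # estimate and spread

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=(50, 100))         # fake posterior samples
preds = rng.integers(0, 3, size=100)
mean_acc, std_acc = estimated_accuracy(probs, preds)
print(f"estimated accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```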
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- A User's Guide to Calibrating Robotics Simulators [54.85241102329546]
This paper proposes a set of benchmarks and a framework for the study of various algorithms aimed at transferring models and policies learned in simulation to the real world.
We conduct experiments on a wide range of well known simulated environments to characterize and offer insights into the performance of different algorithms.
Our analysis can be useful for practitioners working in this area and can help make informed choices about the behavior and main properties of sim-to-real algorithms.
arXiv Detail & Related papers (2020-11-17T22:24:26Z)
- AIPerf: Automated machine learning as an AI-HPC benchmark [17.57686674304368]
We propose an end-to-end benchmark suite utilizing automated machine learning (AutoML).
We implement the algorithms in a highly parallel and flexible way to ensure the efficiency and optimization potential on diverse systems.
With a flexible workload and a single metric, our benchmark can scale and rank AI-HPC systems easily.
arXiv Detail & Related papers (2020-08-17T08:06:43Z)
- AIBench Training: Balanced Industry-Standard AI Training Benchmarking [26.820244556465333]
Earlier-stage evaluations of a new AI architecture/system need affordable benchmarks.
We use real-world benchmarks to cover the factor space that impacts learning dynamics.
We contribute by far the most comprehensive AI training benchmark suite.
arXiv Detail & Related papers (2020-04-30T11:08:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.