FAIR evaluation of ten widely used chemical datasets: Lessons learned and recommendations
- URL: http://arxiv.org/abs/2407.15591v1
- Date: Mon, 22 Jul 2024 12:26:41 GMT
- Title: FAIR evaluation of ten widely used chemical datasets: Lessons learned and recommendations
- Authors: Marcos Da Silveira, Oona Freudenthal, Louis Deladiennee,
- Abstract summary: This document focuses on databases disseminating data on (hazardous) substances found on the North American and the European (EU) market.
The goal is to analyse the FAIRness of published open data on these substances.
We implement two complementary approaches: Manual, and Automatic.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This document focuses on databases disseminating data on (hazardous) substances found on the North American and the European (EU) market. The goal is to analyse the FAIRness (Findability, Accessibility, Interoperability and Reusability) of published open data on these substances and to qualitatively evaluate to what extend the selected databases already fulfil the criteria set out in the commission draft regulation on a common data chemicals platform. We implemented two complementary approaches: Manual, and Automatic. The manual approach is based on online questionnaires. These questionnaires provide a structured approach to evaluating FAIRness by guiding users through a series of questions related to the FAIR principles. They are particularly useful for initiating discussions on FAIR implementation within research teams and for identifying areas that require further attention. Automated tools for FAIRness assessment, such as F-UJI and FAIR Checker, are gaining prominence and are continuously under development. Unlike manual tools, automated tools perform a series of tests automatically starting from a dereferenceable URL to the data resource to be evaluated. We analysed ten widely adopted datasets managed in Europe and North America. The highest score from automatic analysis was 54/100. The manual analysis shows that several FAIR metrics were satisfied, but not detectable by automatic tools because there is no metadata, or the format of the information was not a standard one. Thus, it was not interpretable by the tool. We present the details of the analysis and tables summarizing the outcomes, the issues, and the suggestions to address these issues.
Related papers
- Survey on Semantic Interpretation of Tabular Data: Challenges and Directions [2.324913904215885]
This survey aims to provide a comprehensive overview of the Semantic Table Interpretation landscape.
It starts by categorizing approaches using a taxonomy of 31 attributes, allowing for comparisons and evaluations.
It also examines available tools, assessing them based on 12 criteria.
arXiv Detail & Related papers (2024-11-07T14:28:56Z) - AutoFAIR : Automatic Data FAIRification via Machine Reading [28.683653852643015]
We propose AutoFAIR, an architecture designed to enhance data FAIRness automately.
We align each data and metadata operation with specific FAIR indicators to guide machine-executable actions.
We observe significant improvements in findability, accessibility, interoperability, and reusability of data.
arXiv Detail & Related papers (2024-08-07T17:36:58Z) - InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation [79.09622602860703]
We introduce InsightBench, a benchmark dataset with three key features.
It consists of 100 datasets representing diverse business use cases such as finance and incident management.
Unlike existing benchmarks focusing on answering single queries, InsightBench evaluates agents based on their ability to perform end-to-end data analytics.
arXiv Detail & Related papers (2024-07-08T22:06:09Z) - DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z) - Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models [0.0]
Open Domain Question Answering (ODQA) within natural language processing involves building systems that answer factual questions using large-scale knowledge corpora.
High-quality datasets are used to train models on realistic scenarios.
Standardized metrics facilitate comparisons between different ODQA systems.
arXiv Detail & Related papers (2024-06-19T05:43:02Z) - Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z) - Is Reference Necessary in the Evaluation of NLG Systems? When and Where? [58.52957222172377]
We show that reference-free metrics exhibit a higher correlation with human judgment and greater sensitivity to deficiencies in language quality.
Our study can provide insight into the appropriate application of automatic metrics and the impact of metric choice on evaluation performance.
arXiv Detail & Related papers (2024-03-21T10:31:11Z) - InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks [84.7788065721689]
In this paper, we introduce InfiAgent-DABench, the first benchmark specifically designed to evaluate LLM-based agents on data analysis tasks.
This benchmark contains DAEval, a dataset consisting of 257 data analysis questions derived from 52 CSV files.
Building on top of our agent framework, we develop a specialized agent, DAAgent, which surpasses GPT-3.5 by 3.9% on DABench.
arXiv Detail & Related papers (2024-01-10T19:04:00Z) - CSMeD: Bridging the Dataset Gap in Automated Citation Screening for
Systematic Literature Reviews [10.207938863784829]
We introduce CSMeD, a meta-dataset consolidating nine publicly released collections.
CSMeD serves as a comprehensive resource for training and evaluating the performance of automated citation screening models.
We introduce CSMeD-FT, a new dataset designed explicitly for evaluating the full text publication screening task.
arXiv Detail & Related papers (2023-11-21T09:36:11Z) - Collect, Measure, Repeat: Reliability Factors for Responsible AI Data
Collection [8.12993269922936]
We argue that data collection for AI should be performed in a responsible manner.
We propose a Responsible AI (RAI) methodology designed to guide the data collection with a set of metrics.
arXiv Detail & Related papers (2023-08-22T18:01:27Z) - FETA: Towards Specializing Foundation Models for Expert Task
Applications [49.57393504125937]
Foundation Models (FMs) have demonstrated unprecedented capabilities including zero-shot learning, high fidelity data synthesis, and out of domain generalization.
We show in this paper that FMs still have poor out-of-the-box performance on expert tasks.
We propose a first of its kind FETA benchmark built around the task of teaching FMs to understand technical documentation.
arXiv Detail & Related papers (2022-09-08T08:47:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.