Related papers: ORKG-Leaderboards: A Systematic Workflow for Mining Leaderboards as a Knowledge Graph

URL: http://arxiv.org/abs/2305.11068v1
Date: Wed, 10 May 2023 13:19:18 GMT
Title: ORKG-Leaderboards: A Systematic Workflow for Mining Leaderboards as a Knowledge Graph
Authors: Salomon Kabongo, Jennifer D'Souza and Sören Auer
Abstract summary: Orkg-Leaderboard is designed to extract leaderboards from large collections of empirical research papers in Artificial Intelligence (AI). The system is integrated with the Open Research Knowledge Graph (ORKG) platform, which fosters the machine-actionable publishing of findings. Our best model performs above 90% F1 on the leaderboard extraction task, thus proving Orkg-Leaderboards a practically viable tool for real-world usage.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The purpose of this work is to describe the Orkg-Leaderboard software, designed to automatically extract leaderboards, defined as Task-Dataset-Metric tuples, from large collections of empirical research papers in Artificial Intelligence (AI). The software supports both main workflows of scholarly publishing, viz. as LaTeX files or as PDF files. Furthermore, the system is integrated with the Open Research Knowledge Graph (ORKG) platform, which fosters the machine-actionable publishing of scholarly findings. Thus the system output, when integrated within the ORKG's supported Semantic Web infrastructure of representing machine-actionable 'resources' on the Web, enables: 1) broadly, the integration of empirical results of researchers across the world, thus enabling transparency in empirical research, with the potential to also be complete, contingent on the underlying data source(s) of publications; and 2) specifically, the tracking of progress in AI by researchers, with an overview of the state-of-the-art (SOTA) across the most common AI tasks and their corresponding datasets via dynamic ORKG frontend views leveraging tables and visualization charts over the machine-actionable data. Our best model achieves performance above 90% F1 on the leaderboard extraction task, thus proving Orkg-Leaderboards a practically viable tool for real-world usage. Going forward, in a sense, Orkg-Leaderboards transforms the leaderboard extraction task, which has long been a crowdsourced endeavor in the community, into an automated digitalization task.

Related papers

MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs [54.5729817345543]
MOLE is a framework that automatically extracts metadata attributes from scientific papers covering datasets of languages other than Arabic. Our methodology processes entire documents across multiple input formats and incorporates robust validation mechanisms for consistent output.
arXiv Detail & Related papers (2025-05-26T10:31:26Z)
A Position Paper on the Automatic Generation of Machine Learning Leaderboards [12.736094044510224]
An important task in machine learning (ML) research is comparing prior work, which is often performed via ML leaderboards. To ease this burden, researchers have developed methods to extract leaderboard entries from research papers. Yet, prior work varies in problem framing, complicating comparisons and limiting real-world applicability. We propose a unified conceptual framework for automatic leaderboard generation (ALG) to standardise how the ALG task is defined.
arXiv Detail & Related papers (2025-05-23T04:46:10Z)
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence [88.74800617923083]
We introduce Granite Vision, a lightweight large language model with vision capabilities. Our model is trained on a comprehensive instruction-following dataset. Granite Vision achieves strong results in standard benchmarks related to visual document understanding.
arXiv Detail & Related papers (2025-02-14T05:36:32Z)
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials [53.376263056033046]
Existing approaches rely on expensive human annotation, making them unsustainable at scale. We propose AgentTrek, a scalable data synthesis pipeline that generates web agent trajectories by leveraging publicly available tutorials. Our fully automated approach significantly reduces data collection costs, achieving a cost of just $0.55 per high-quality trajectory without human annotators.
arXiv Detail & Related papers (2024-12-12T18:59:27Z)
Capturing and Anticipating User Intents in Data Analytics via Knowledge Graphs [0.061446808540639365]
This work explores the usage of Knowledge Graphs (KG) as a basic framework for capturing complex analytics in a human-centered manner. The data stored in the generated KG can then be exploited to provide assistance (e.g., recommendations) to the users interacting with these systems.
arXiv Detail & Related papers (2024-11-01T20:45:23Z)
EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data [15.801018643716437]
This paper aims to enhance the GUI understanding and interacting capabilities of large vision-language models (LVLMs) through a data-driven approach. We propose EDGE, a general data synthesis framework that automatically generates large-scale, multi-granularity training data from webpages across the Web. Our approach significantly reduces the dependence on manual annotations, empowering researchers to harness the vast public resources available on the Web to advance their work.
arXiv Detail & Related papers (2024-10-25T10:46:17Z)
Leveraging Large Language Models for Semantic Query Processing in a Scholarly Knowledge Graph [1.7418328181959968]
The proposed research aims to develop an innovative semantic query processing system. It enables users to obtain comprehensive information about research works produced by Computer Science (CS) researchers at the Australian National University.
arXiv Detail & Related papers (2024-05-24T09:19:45Z)
Text-Augmented Open Knowledge Graph Completion via Pre-Trained Language Models [53.09723678623779]
We propose TAGREAL to automatically generate quality query prompts and retrieve support information from large text corpora. The results show that TAGREAL achieves state-of-the-art performance on two benchmark datasets. We find that TAGREAL has superb performance even with limited training data, outperforming existing embedding-based, graph-based, and PLM-based methods.
arXiv Detail & Related papers (2023-05-24T22:09:35Z)
Scientific Paper Extractive Summarization Enhanced by Citation Graphs [50.19266650000948]
We focus on leveraging citation graphs to improve scientific paper extractive summarization under different settings. Preliminary results demonstrate that the citation graph is helpful even in a simple unsupervised framework. Motivated by this, we propose a Graph-based Supervised Summarization model (GSS) to achieve more accurate results on the task when large-scale labeled data are available.
arXiv Detail & Related papers (2022-12-08T11:53:12Z)
Deep learning for table detection and structure recognition: A survey [49.09628624903334]
The goal of this survey is to provide a profound comprehension of the major developments in the field of Table Detection. We provide an analysis of both classic and new applications in the field. The datasets and source code of the existing models are organized to provide the reader with a compass on this vast literature.
arXiv Detail & Related papers (2022-11-15T19:42:27Z)
MONAI Label: A framework for AI-assisted Interactive Labeling of 3D Medical Images [49.664220687980006]
The lack of annotated datasets is a major bottleneck for training new task-specific supervised machine learning models. We present MONAI Label, a free and open-source framework that facilitates the development of applications based on artificial intelligence (AI) models.
arXiv Detail & Related papers (2022-03-23T12:33:11Z)
Automated Graph Machine Learning: Approaches, Libraries, Benchmarks and Directions [58.220137936626315]
This paper extensively discusses automated graph machine learning approaches. We introduce AutoGL, our dedicated and the world's first open-source library for automated graph machine learning. Also, we describe a tailored benchmark that supports unified, reproducible, and efficient evaluations.
arXiv Detail & Related papers (2022-01-04T18:31:31Z)
SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines. This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
Automated Mining of Leaderboards for Empirical AI Research [0.0]
This study presents a comprehensive approach for generating Leaderboards for knowledge-graph-based scholarly information organization. Specifically, we investigate the problem of automated Leaderboard construction using state-of-the-art transformer models, viz. Bert, SciBert, and XLNet. As a result, a vast share of empirical AI research can be organized in the next-generation digital libraries as knowledge graphs.
arXiv Detail & Related papers (2021-08-31T10:00:52Z)
Cardea: An Open Automated Machine Learning Framework for Electronic Health Records [11.170152156043336]
Cardea is an open-source automated machine learning framework. It allows users to build predictive models with their own data. We demonstrate our framework via 5 prediction tasks on MIMIC-III and Kaggle datasets.
arXiv Detail & Related papers (2020-10-01T15:58:13Z)