AxCell: Automatic Extraction of Results from Machine Learning Papers
- URL: http://arxiv.org/abs/2004.14356v1
- Date: Wed, 29 Apr 2020 17:33:41 GMT
- Title: AxCell: Automatic Extraction of Results from Machine Learning Papers
- Authors: Marcin Kardas, Piotr Czapla, Pontus Stenetorp, Sebastian Ruder,
Sebastian Riedel, Ross Taylor, Robert Stojnic
- Abstract summary: We present AxCell, an automatic machine learning pipeline for extracting results from papers.
When compared with existing methods, our approach significantly improves the state of the art for results extraction.
We show that our approach is viable for semi-automated results extraction in production.
- Score: 44.15443359660737
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Tracking progress in machine learning has become increasingly difficult with
the recent explosion in the number of papers. In this paper, we present AxCell,
an automatic machine learning pipeline for extracting results from papers.
AxCell uses several novel components, including a table segmentation subtask,
to learn relevant structural knowledge that aids extraction. When compared with
existing methods, our approach significantly improves the state of the art for
results extraction. We also release a structured, annotated dataset for
training models for results extraction, and a dataset for evaluating the
performance of models on this task. Lastly, we show the viability of our
approach enables it to be used for semi-automated results extraction in
production, suggesting our improvements make this task practically viable for
the first time. Code is available on GitHub.
Related papers
- No More Manual Guides: Automatic and Scalable Generation of High-Quality Excel Tutorials [63.10037761131196]
Existing tutorials are manually authored by experts, require frequent updates after each software release, and incur substantial labor costs.
We present the first framework for automatically generating Excel tutorials directly from natural language task descriptions.
Our framework improves task execution success rates by 8.5% over state-of-the-art baselines.
arXiv Detail & Related papers (2025-09-26T03:21:39Z) - MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs [54.5729817345543]
MOLE is a framework that automatically extracts metadata attributes from scientific papers covering datasets of languages other than Arabic.
Our methodology processes entire documents across multiple input formats and incorporates robust validation mechanisms for consistent output.
arXiv Detail & Related papers (2025-05-26T10:31:26Z) - Leveraging Vision Capabilities of Multimodal LLMs for Automated Data Extraction from Plots [0.0]
We show that current large language models, with proper instructions and engineered prompts, are capable of accurately extracting data from plots.
This capability is inherent to the pretrained models and can be achieved with a chain-of-thought sequence of zero-shot engineered prompts.
arXiv Detail & Related papers (2025-03-16T02:41:43Z) - FabricQA-Extractor: A Question Answering System to Extract Information from Documents using Natural Language Questions [4.961045761391367]
Reading comprehension models answer questions posed in natural language when provided with a short passage of text.
We introduce a new model, Relation Coherence, that exploits knowledge of the relational structure to improve the extraction quality.
We demonstrate on two datasets that Relation Coherence boosts extraction performance and evaluate FabricQA-Extractor on large scale datasets.
arXiv Detail & Related papers (2024-08-17T15:16:54Z) - Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z) - Distantly Supervised Morpho-Syntactic Model for Relation Extraction [0.27195102129094995]
We present a method for the extraction and categorisation of an unrestricted set of relationships from text.
We evaluate our approach on six datasets built on Wikidata and Wikipedia.
arXiv Detail & Related papers (2024-01-18T14:17:40Z) - DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models.
We propose to generate a symbolic and ordered sequence from the relation matrix, which is deterministic and easier for the model to learn.
Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z) - ALBench: A Framework for Evaluating Active Learning in Object Detection [102.81795062493536]
This paper contributes an active learning benchmark framework named ALBench for evaluating active learning in object detection.
Developed on an automatic deep model training system, the ALBench framework is easy to use, compatible with different active learning algorithms, and ensures the same training and testing protocols.
arXiv Detail & Related papers (2022-07-27T07:46:23Z) - Jointly Learning Span Extraction and Sequence Labeling for Information Extraction from Business Documents [1.6249267147413522]
This paper introduces a new information extraction model for business documents.
It takes advantage of both span extraction and sequence labeling.
The model is trained end-to-end to jointly optimize the two tasks.
arXiv Detail & Related papers (2022-05-26T15:37:24Z) - AIFB-WebScience at SemEval-2022 Task 12: Relation Extraction First -- Using Relation Extraction to Identify Entities [0.0]
We present an end-to-end joint entity and relation extraction approach based on transformer-based language models.
In contrast to existing approaches, which perform entity and relation extraction in sequence, our system incorporates information from relation extraction into entity extraction.
arXiv Detail & Related papers (2022-03-10T12:19:44Z) - Scaling Systematic Literature Reviews with Machine Learning Pipelines [57.82662094602138]
Systematic reviews entail the extraction of data from scientific documents.
We construct a pipeline that automates each of these aspects, and experiment with many human-time vs. system quality trade-offs.
We find that we can get surprising accuracy and generalisability of the whole pipeline system with only 2 weeks of human-expert annotation.
arXiv Detail & Related papers (2020-10-09T16:19:42Z) - Bayesian active learning for production, a systematic study and a reusable library [85.32971950095742]
In this paper, we analyse the main drawbacks of current active learning techniques.
We do a systematic study on the effects of the most common issues of real-world datasets on the deep active learning process.
We derive two techniques that can speed up the active learning loop: partial uncertainty sampling and a larger query size.
arXiv Detail & Related papers (2020-06-17T14:51:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.