Improving Editorial Workflow and Metadata Quality at Springer Nature
- URL: http://arxiv.org/abs/2103.13527v1
- Date: Wed, 24 Mar 2021 23:23:59 GMT
- Title: Improving Editorial Workflow and Metadata Quality at Springer Nature
- Authors: Angelo A. Salatino, Francesco Osborne, Aliaksandr Birukou and Enrico Motta
- Abstract summary: Smart Topic Miner (STM) is an application that assists the Springer Nature editorial team in annotating the volumes of all books covering conference proceedings in Computer Science.
STM has been regularly used by editors in Germany, China, Brazil, India, and Japan, for a total of about 800 volumes per year.
In particular, our solution has drastically reduced the time needed to annotate proceedings and significantly improved their discoverability, resulting in 9.3 million additional downloads.
- Score: 7.1717344176500335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Identifying the research topics that best describe the scope of a scientific
publication is a crucial task for editors, in particular because the quality of
these annotations determines how effectively users are able to discover the
right content in online libraries. For this reason, Springer Nature, the
world's largest academic book publisher, has traditionally entrusted this task
to their most expert editors. These editors manually analyse all new books,
possibly including hundreds of chapters, and produce a list of the most
relevant topics. Hence, this process has traditionally been very expensive,
time-consuming, and confined to a few senior editors. For these reasons, back
in 2016 we developed Smart Topic Miner (STM), an ontology-driven application
that assists the Springer Nature editorial team in annotating the volumes of
all books covering conference proceedings in Computer Science. Since then STM
has been regularly used by editors in Germany, China, Brazil, India, and Japan,
for a total of about 800 volumes per year. Over the past three years the
initial prototype has iteratively evolved in response to feedback from the
users and evolving requirements. In this paper we present the most recent
version of the tool and describe the evolution of the system over the years,
the key lessons learnt, and the impact on the Springer Nature workflow. In
particular, our solution has drastically reduced the time needed to annotate
proceedings and significantly improved their discoverability, resulting in 9.3
million additional downloads. We also present a user study involving 9 editors,
which yielded excellent results in terms of usability, and report an evaluation
of the new topic classifier used by STM, which outperforms previous versions in
recall and F-measure.
Related papers
- Language Modeling with Editable External Knowledge [90.7714362827356]
This paper introduces ERASE, which improves model behavior when new documents are acquired.
It incrementally deletes or rewrites other entries in the knowledge base each time a document is added.
It improves accuracy relative to conventional retrieval-augmented generation by 7-13% (Mixtral-8x7B) and 6-10% (Llama-3-8B) absolute.
arXiv Detail & Related papers (2024-06-17T17:59:35Z)
- MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows.
MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years.
We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
- Language Models as Science Tutors [79.73256703631492]
We introduce TutorEval and TutorChat to measure real-life usability of LMs as scientific assistants.
We show that fine-tuning base models with existing dialogue datasets leads to poor performance on TutorEval.
We use TutorChat to fine-tune Llemma models with 7B and 34B parameters. These LM tutors specialized in math have a 32K-token context window, and they excel at TutorEval while performing strongly on GSM8K and MATH.
arXiv Detail & Related papers (2024-02-16T22:24:13Z)
- Trustworthy Machine Learning [57.08542102068706]
This textbook on Trustworthy Machine Learning (TML) covers a theoretical and technical background of four key topics in TML.
We discuss important classical and contemporary research papers of the aforementioned fields and uncover and connect their underlying intuitions.
arXiv Detail & Related papers (2023-10-12T11:04:17Z)
- EditEval: An Instruction-Based Benchmark for Text Improvements [73.5918084416016]
This work presents EditEval: an instruction-based benchmark and evaluation suite for automatic evaluation of editing capabilities.
Evaluating several pre-trained models shows that InstructGPT and PEER perform the best, but that most baselines fall below the supervised SOTA.
Our analysis shows that commonly used metrics for editing tasks do not always correlate well, and that optimization for prompts with the highest performance does not necessarily entail the strongest robustness to different models.
arXiv Detail & Related papers (2022-09-27T12:26:05Z)
- Generating Summaries for Scientific Paper Review [29.12631698162247]
The increase in submissions to top venues in machine learning and NLP has placed an excessive burden on reviewers.
An automatic system for assisting with the reviewing process could help ameliorate the problem.
In this paper, we explore automatic review summary generation for scientific papers.
arXiv Detail & Related papers (2021-09-28T21:43:53Z)
- Ontology-Based Recommendation of Editorial Products [7.1717344176500335]
Smart Book Recommender (SBR) supports Springer Nature's Computer Science editorial team in selecting the products to market at specific venues.
SBR recommends books, journals, and conference proceedings relevant to a conference by taking advantage of a semantically enhanced representation of about 27K editorial products.
SBR also allows users to investigate why a certain publication was suggested by the system.
arXiv Detail & Related papers (2021-03-24T23:23:53Z)
- What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)
- Scalable Recommendation of Wikipedia Articles to Editors Using Representation Learning [1.8810916321241067]
We develop a scalable system on top of Graph Convolutional Networks and Doc2Vec, learning how to represent Wikipedia articles and deliver personalized recommendations for editors.
We test our model on editors' histories, predicting their most recent edits based on their prior edits.
All of the data used in this paper is publicly available, including graph embeddings for Wikipedia articles, and we release our code to support replication of our experiments.
arXiv Detail & Related papers (2020-09-24T15:56:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.