Overview of the SciHigh Track at FIRE 2025: Research Highlight Generation from Scientific Papers
- URL: http://arxiv.org/abs/2601.11582v1
- Date: Thu, 01 Jan 2026 16:25:16 GMT
- Title: Overview of the SciHigh Track at FIRE 2025: Research Highlight Generation from Scientific Papers
- Authors: Tohida Rehman, Debarshi Kumar Sanyal, Samiran Chattopadhyay,
- Abstract summary: "SciHigh: Research Highlight Generation from Scientific Papers" focuses on the task of automatically generating concise, informative, and meaningful bullet-point highlights. The track uses the MixSub dataset \cite{10172215}, which provides pairs of abstracts and corresponding author-written highlights. All submissions were evaluated using established metrics such as ROUGE, METEOR, and BERTScore to measure both alignment with author-written highlights and overall informativeness.
- Score: 7.474480823192324
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: "SciHigh: Research Highlight Generation from Scientific Papers" focuses on the task of automatically generating concise, informative, and meaningful bullet-point highlights directly from scientific abstracts. The goal of this task is to evaluate how effectively computational models can generate highlights that capture the key contributions, findings, and novelty of a paper in a concise form. Highlights help readers grasp essential ideas quickly and are often easier to read and understand than longer paragraphs, especially on mobile devices. The track uses the MixSub dataset \cite{10172215}, which provides pairs of abstracts and corresponding author-written highlights. In this inaugural edition of the track, 12 teams participated, exploring various approaches, including pre-trained language models, to generate highlights from this scientific dataset. All submissions were evaluated using established metrics such as ROUGE, METEOR, and BERTScore to measure both alignment with author-written highlights and overall informativeness. Teams were ranked based on ROUGE-L scores. The findings suggest that automatically generated highlights can reduce reading effort, accelerate literature reviews, and enhance metadata for digital libraries and academic search platforms. SciHigh provides a dedicated benchmark for advancing methods aimed at concise and accurate highlight generation from scientific writing.
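Since teams were ranked on ROUGE-L, a minimal sketch of how that metric scores a generated highlight against an author-written one may help. ROUGE-L is built on the longest common subsequence (LCS) of candidate and reference tokens: precision is LCS/|candidate|, recall is LCS/|reference|, and the F-score combines them with a weight beta (beta = 1 here). This is an illustrative reimplementation, not the track's official scorer: the whitespace tokenizer, the lack of stemming, the sentence-level variant (rather than summary-level ROUGE-Lsum over multiple highlight lines), and the mean-over-abstracts aggregation in the hypothetical `rank_teams` helper are all assumptions.

```python
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase + whitespace split. Official ROUGE scorers also
    # normalize punctuation and can apply stemming.
    return text.lower().split()


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend a common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a token in a or b
        prev = curr
    return prev[len(b)]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> Tuple[float, float, float]:
    """Sentence-level ROUGE-L: returns (precision, recall, F-score)."""
    cand, ref = tokenize(candidate), tokenize(reference)
    lcs = lcs_length(cand, ref) if cand and ref else 0
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    f_score = (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
    return precision, recall, f_score


def rank_teams(submissions: Dict[str, List[str]], references: List[str]) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank teams by mean ROUGE-L F-score over all test abstracts."""
    means = {}
    for team, outputs in submissions.items():
        scores = [rouge_l(cand, ref)[2] for cand, ref in zip(outputs, references)]
        means[team] = sum(scores) / len(scores) if scores else 0.0
    return sorted(means.items(), key=lambda item: item[1], reverse=True)
```

For example, scoring the candidate "a new dataset is proposed for highlight generation" against the reference "we propose a new dataset for highlight generation" finds an LCS of six tokens ("a new dataset for highlight generation"), so precision, recall, and F-score are all 0.75. Because the LCS rewards in-order overlap without requiring contiguity, the metric tolerates some rephrasing while still penalizing reordered content.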
Related papers
- Introducing Spotlight: A Novel Approach for Generating Captivating Key Information from Documents [25.75158276797885]
We introduce Spotlight, a novel paradigm for information extraction that produces concise, engaging narratives by highlighting the most compelling aspects of a document.
Our comprehensive evaluation demonstrates that the resulting model not only identifies key elements with precision but also enhances readability and boosts the engagement value of the original document.
(2025-09-13T18:18:37Z)
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [97.31347312130119]
SciRIFF (Scientific Resource for Instruction-Following and Finetuning) is a dataset of 137K instruction-following instances for training and evaluation, covering 54 tasks.
These tasks span five core scientific literature understanding capabilities: information extraction, summarization, question answering, claim verification, and classification.
SciRIFF is unique in being an entirely expert-written, high-quality instruction-following dataset for extracting and synthesizing information from research literature across diverse scientific fields.
(2024-06-10T21:22:08Z)
- Enriched BERT Embeddings for Scholarly Publication Classification [0.13654846342364302]
The NSLP 2024 FoRC Task I, organized as a competition, addresses the challenge of classifying scholarly publications by research field.
The goal is to develop a classifier capable of predicting one of 123 predefined classes from the Open Research Knowledge Graph (ORKG) taxonomy of research fields for a given article.
(2024-05-07T09:05:20Z)
- SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval [64.03631654052445]
Current benchmarks leave a notable gap in evaluating multi-modal information retrieval (MMIR) performance on image-text pairing within the scientific domain.
We develop a specialised scientific MMIR benchmark by leveraging open-access paper collections.
This benchmark comprises 530K meticulously curated image-text pairs, extracted from figures and tables with detailed captions in scientific documents.
(2024-01-24T14:23:12Z)
- Named Entity Recognition Based Automatic Generation of Research Highlights [3.9410617513331863]
We aim to automatically generate research highlights using different sections of a research paper as input.
We investigate whether the use of named entity recognition on the input improves the quality of the generated highlights.
(2023-02-25T16:33:03Z)
- Generation of Highlights from Research Papers Using Pointer-Generator Networks and SciBERT Embeddings [5.095525589147811]
We use a pointer-generator network with coverage mechanism and a contextual embedding layer at the input that encodes the input tokens into SciBERT embeddings.
We test our model on a benchmark dataset, CSPubSum, and also present MixSub, a new multi-disciplinary corpus of papers for automatic research highlight generation.
(2023-02-14T12:45:14Z)
- Scientific Paper Extractive Summarization Enhanced by Citation Graphs [50.19266650000948]
We focus on leveraging citation graphs to improve scientific paper extractive summarization under different settings.
Preliminary results demonstrate that the citation graph is helpful even in a simple unsupervised framework.
Motivated by this, we propose a Graph-based Supervised Summarization model (GSS) to achieve more accurate results on the task when large-scale labeled data are available.
(2022-12-08T11:53:12Z)
- Neural Content Extraction for Poster Generation of Scientific Papers [84.30128728027375]
The problem of poster generation for scientific papers is under-investigated.
Previous studies focus mainly on poster layout and panel composition, while neglecting the importance of content extraction.
To get both textual and visual elements of a poster panel, a neural extractive model is proposed to extract text, figures and tables of a paper section simultaneously.
(2021-12-16T01:19:37Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
(2021-06-03T03:00:12Z)
- From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information [77.89755281215079]
Text summarization is the research area aiming at creating a short and condensed version of the original document.
In real-world applications, most of the data is not in a plain text format.
This paper surveys these new summarization tasks and approaches in real-world applications.
(2020-05-10T14:59:36Z)
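The MixSub entry in the list above mentions a pointer-generator network with a coverage mechanism. The sketch below shows the generic formulation of that architecture as introduced by See et al. (2017); it does not reproduce the MixSub authors' SciBERT-based model, and every function and variable name is hypothetical. At each decoding step, a generation probability p_gen mixes the decoder's vocabulary distribution with the attention distribution over source tokens, so words from the abstract can be copied even when they are out of vocabulary. The coverage vector accumulates past attention, and the coverage loss penalizes attending again to positions already covered, which discourages repeated phrases in the output.

```python
from typing import Dict, List


def final_distribution(
    p_gen: float,
    p_vocab: Dict[str, float],
    attention: List[float],
    source_tokens: List[str],
) -> Dict[str, float]:
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of a_i over positions i with w_i = w."""
    # Generation part: scale the decoder's vocabulary distribution by p_gen.
    dist = {word: p_gen * prob for word, prob in p_vocab.items()}
    # Copy part: route (1 - p_gen) of the attention mass to the source tokens,
    # creating entries for out-of-vocabulary source words when needed.
    for a_i, w_i in zip(attention, source_tokens):
        dist[w_i] = dist.get(w_i, 0.0) + (1.0 - p_gen) * a_i
    return dist


def coverage_loss(attention: List[float], coverage: List[float]) -> float:
    """covloss_t = sum_i min(a_i^t, c_i^t)."""
    return sum(min(a_i, c_i) for a_i, c_i in zip(attention, coverage))


def update_coverage(coverage: List[float], attention: List[float]) -> List[float]:
    """c^{t+1} = c^t + a^t: coverage is the running sum of past attention."""
    return [c_i + a_i for c_i, a_i in zip(coverage, attention)]


# Toy decoding step with made-up numbers (illustrative only).
source = ["model", "MixSub"]  # "MixSub" is absent from the toy vocabulary
dist = final_distribution(
    p_gen=0.5,
    p_vocab={"the": 0.6, "model": 0.4},
    attention=[0.25, 0.75],
    source_tokens=source,
)
```

In this toy step, the out-of-vocabulary token "MixSub" still receives probability 0.375 purely through copying, and the three probabilities still sum to 1. If the decoder attended to the same positions again on the next step, the coverage loss would equal the full attention mass, which is exactly the repetition the mechanism is meant to penalize.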
This list is automatically generated from the titles and abstracts of the papers on this site.