Introducing Spotlight: A Novel Approach for Generating Captivating Key Information from Documents
- URL: http://arxiv.org/abs/2509.10935v3
- Date: Tue, 21 Oct 2025 14:21:49 GMT
- Title: Introducing Spotlight: A Novel Approach for Generating Captivating Key Information from Documents
- Authors: Ankan Mullick, Sombit Bose, Rounak Saha, Ayan Kumar Bhowmick, Aditya Vempaty, Prasenjit Dey, Ravi Kokku, Pawan Goyal, Niloy Ganguly
- Abstract summary: We introduce Spotlight, a novel paradigm for information extraction that produces concise, engaging narratives by highlighting the most compelling aspects of a document. Our comprehensive evaluation demonstrates that the resulting model not only identifies key elements with precision but also enhances readability and boosts the engagement value of the original document.
- Score: 25.75158276797885
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: In this paper, we introduce Spotlight, a novel paradigm for information extraction that produces concise, engaging narratives by highlighting the most compelling aspects of a document. Unlike traditional summaries, which prioritize comprehensive coverage, spotlights selectively emphasize intriguing content to foster deeper reader engagement with the source material. We formally differentiate spotlights from related constructs and support our analysis with a detailed benchmarking study using new datasets curated for this work. To generate high-quality spotlights, we propose a two-stage approach: fine-tuning a large language model on our benchmark data, followed by alignment via Direct Preference Optimization (DPO). Our comprehensive evaluation demonstrates that the resulting model not only identifies key elements with precision but also enhances readability and boosts the engagement value of the original document.
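The alignment stage named in the abstract, Direct Preference Optimization, can be illustrated with the standard per-pair DPO loss. This is a minimal sketch of the generic objective, not the authors' training code: the function name `dpo_loss`, the argument names, and the default `beta=0.1` are assumptions chosen for illustration, and the inputs are assumed to be summed token log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") spotlight under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair (illustrative sketch).

    All four inputs are sequence log-probabilities. `beta` scales how strongly
    the policy is pushed away from the reference; 0.1 is an assumed default,
    not a value taken from the paper.
    """
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(z)) = log(1 + exp(-z)), evaluated in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy and the reference agree on both responses, the loss equals log 2; it falls below that as the policy raises the chosen response relative to the rejected one.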
Related papers
- Improving Neural Topic Modeling with Semantically-Grounded Soft Label Distributions [15.97570754056266]
We propose a novel approach to construct semantically-grounded soft label targets using Language Models (LMs). Our method produces higher-quality topics that are more closely aligned with the underlying thematic structure of the corpus. We also introduce a retrieval-based metric, which shows that our approach significantly outperforms existing methods in identifying semantically similar documents.
arXiv Detail & Related papers (2026-02-20T00:12:04Z)
- Enhancing Long Document Long Form Summarisation with Self-Planning [29.76306977276126]
We introduce a novel approach for long context summarisation, highlight-guided generation. Our framework applies self-planning methods to identify important content and then generates a summary conditioned on the plan.
arXiv Detail & Related papers (2025-12-19T02:37:30Z)
- Topic-Guided Reinforcement Learning with LLMs for Enhancing Multi-Document Summarization [49.61589046694085]
We propose a topic-guided reinforcement learning approach to improve content selection in Multi-Document Summarization. We first show that explicitly prompting models with topic labels enhances the informativeness of the generated summaries.
arXiv Detail & Related papers (2025-09-11T21:01:54Z)
- Movie2Story: A framework for understanding videos and telling stories in the form of novel text [0.0]
We propose a novel benchmark to evaluate text generation capabilities in scenarios enriched with auxiliary information. Our work introduces an innovative automatic dataset generation method to ensure the availability of accurate auxiliary information. Our experiments reveal that current Multi-modal Large Language Models (MLLMs) perform suboptimally under the proposed evaluation metrics.
arXiv Detail & Related papers (2024-12-19T15:44:04Z)
- TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings [61.9257731511557]
We propose Text Guided LLaVA (TG-LLaVA) to optimize vision-language models (VLMs).
We use learnable latent embeddings as a bridge to analyze textual instruction and add the analysis results to the vision encoder as guidance.
With the guidance of text, the vision encoder can extract text-related features, similar to how humans focus on the most relevant parts of an image when considering a question.
arXiv Detail & Related papers (2024-09-15T00:38:34Z)
- Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
arXiv Detail & Related papers (2024-02-16T13:53:26Z)
- LLM Based Multi-Document Summarization Exploiting Main-Event Biased Monotone Submodular Content Extraction [42.171703872560286]
Multi-document summarization is a challenging task due to its inherent subjective bias.
We aim to enhance the objectivity of news summarization by focusing on the main event of a group of related news documents.
arXiv Detail & Related papers (2023-10-05T09:38:09Z)
- Named Entity Recognition Based Automatic Generation of Research Highlights [3.9410617513331863]
We aim to automatically generate research highlights using different sections of a research paper as input.
We investigate whether the use of named entity recognition on the input improves the quality of the generated highlights.
arXiv Detail & Related papers (2023-02-25T16:33:03Z)
- Improving Keyphrase Extraction with Data Augmentation and Information Filtering [67.43025048639333]
Keyphrase extraction is one of the essential tasks for document understanding in NLP.
We present a novel corpus and method for keyphrase extraction from the videos streamed on the Behance platform.
arXiv Detail & Related papers (2022-09-11T22:38:02Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration [63.272359227081836]
iFacetSum integrates interactive summarization together with faceted search.
Fine-grained facets are automatically produced based on cross-document coreference pipelines.
arXiv Detail & Related papers (2021-09-23T20:01:11Z)
- Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution [30.438041837029875]
We propose a robust visual information extraction system (VIES) towards real-world scenarios.
VIES is a unified end-to-end trainable framework for simultaneous text detection, recognition and information extraction.
We construct a fully-annotated dataset called EPHOIE, which is the first Chinese benchmark for both text spotting and visual information extraction.
arXiv Detail & Related papers (2021-01-24T11:05:24Z)
- Better Highlighting: Creating Sub-Sentence Summary Highlights [40.46639471959677]
We present a new method to produce self-contained highlights that are understandable on their own to avoid confusion.
Our method combines determinantal point processes and deep contextualized representations to identify an optimal set of sub-sentence segments.
To demonstrate the flexibility and modeling power of our method, we conduct extensive experiments on summarization datasets.
arXiv Detail & Related papers (2020-10-20T18:57:42Z)