Converging Dimensions: Information Extraction and Summarization through Multisource, Multimodal, and Multilingual Fusion
- URL: http://arxiv.org/abs/2406.13715v1
- Date: Wed, 19 Jun 2024 17:15:47 GMT
- Title: Converging Dimensions: Information Extraction and Summarization through Multisource, Multimodal, and Multilingual Fusion
- Authors: Pranav Janjani, Mayank Palan, Sarvesh Shirude, Ninad Shegokar, Sunny Kumar, Faruk Kazi
- Abstract summary: The paper proposes a novel approach to summarization that tackles these challenges by leveraging the strengths of multiple sources.
The research moves beyond conventional, unimodal sources such as text documents and integrates a more diverse range of data, including YouTube playlists, pre-prints, and Wikipedia pages.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in large language models (LLMs) have led to new summarization strategies, offering an extensive toolkit for extracting important information. However, these approaches are frequently limited by their reliance on isolated sources of data: the information that can be gathered covers a narrower range of themes, raises the possibility of falsified content, and offers limited support for multilingual and multimodal data. This paper proposes a novel approach to summarization that tackles these challenges by leveraging the strengths of multiple sources to deliver a more exhaustive and informative treatment of intricate topics. The research moves beyond conventional, unimodal sources such as text documents and integrates a more diverse range of data, including YouTube playlists, pre-prints, and Wikipedia pages. These varied sources are then converted into a unified textual representation, enabling a more holistic analysis. This multifaceted approach to summary generation makes it possible to extract pertinent information from a wider array of sources. The primary tenet of the approach is to maximize information gain while minimizing information overlap and maintaining a high level of informativeness, which encourages the generation of highly coherent summaries.
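The abstract's selection criterion (maximize information gain while minimizing overlap across the merged sources) can be illustrated with a short sketch. The Python snippet below is a hypothetical, minimal illustration rather than the authors' implementation: the source loaders are placeholder assumptions, TF-IDF cosine similarity stands in for whatever informativeness and redundancy measures the paper actually uses, and the greedy loop is a standard MMR-style selection.

```python
# Minimal sketch (assumptions only, not the paper's implementation): reduce
# heterogeneous sources to plain text, then greedily select sentences that are
# relevant to the merged corpus (information gain proxy) while penalizing
# similarity to sentences already selected (information overlap proxy).
from typing import Callable, Dict, List

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def to_text(source: str, loader: Callable[[str], str]) -> str:
    """Unified textual representation: every source/modality is reduced to plain text."""
    return loader(source)


def mmr_summary(sentences: List[str], k: int = 5, lambda_: float = 0.7) -> List[str]:
    """Greedily pick sentences with high corpus relevance and low overlap with prior picks."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    centroid = np.asarray(tfidf.mean(axis=0))                  # mean TF-IDF vector of the corpus
    relevance = cosine_similarity(tfidf, centroid).ravel()     # proxy for information gain
    pairwise = cosine_similarity(tfidf)                        # proxy for information overlap

    selected: List[int] = []
    while len(selected) < min(k, len(sentences)):
        best, best_score = -1, -np.inf
        for i in range(len(sentences)):
            if i in selected:
                continue
            redundancy = max(pairwise[i][j] for j in selected) if selected else 0.0
            score = lambda_ * relevance[i] - (1.0 - lambda_) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return [sentences[i] for i in sorted(selected)]


if __name__ == "__main__":
    # Hypothetical loaders standing in for YouTube-playlist transcription,
    # pre-print parsing, and Wikipedia extraction (all names are assumptions).
    sources: Dict[str, Callable[[str], str]] = {
        "youtube_playlist": lambda s: "Transcribed captions of the playlist videos ...",
        "preprint.pdf": lambda s: "Parsed body text of the pre-print ...",
        "wikipedia_page": lambda s: "Extracted text of the Wikipedia article ...",
    }
    corpus = " ".join(to_text(name, load) for name, load in sources.items())
    sentences = [s.strip() for s in corpus.split(".") if s.strip()]
    print(mmr_summary(sentences, k=3))
```

The trade-off parameter lambda_ mirrors the stated tenet: values closer to 1 favor information gain, while smaller values penalize overlap between selected sentences more heavily.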
Related papers
- Tell me what I need to know: Exploring LLM-based (Personalized) Abstractive Multi-Source Meeting Summarization [5.979778557940213]
Meeting summarization is crucial in digital communication, but existing solutions struggle with salience identification.
Previous attempts to address these issues by considering related supplementary resources (e.g., presentation slides) alongside transcripts are hindered by models' limited context sizes.
This work explores multi-source meeting summarization considering supplementary materials through a three-stage large language model approach.
arXiv Detail & Related papers (2024-10-18T15:40:48Z)
- Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization [49.08348604716746]
Multimodal Summarization with Multimodal Output (MSMO) aims to produce a multimodal summary that integrates both text and relevant images.
In this paper, we propose an Entity-Guided Multimodal Summarization model (EGMS).
Our model, building on BART, utilizes dual multimodal encoders with shared weights to process text-image and entity-image information concurrently.
arXiv Detail & Related papers (2024-08-06T12:45:56Z)
- SEMQA: Semi-Extractive Multi-Source Question Answering [94.04430035121136]
We introduce a new QA task for answering multi-answer questions by summarizing multiple diverse sources in a semi-extractive fashion.
We create the first dataset of this kind, QuoteSum, with human-written semi-extractive answers to natural and generated questions.
arXiv Detail & Related papers (2023-11-08T18:46:32Z)
- Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles [136.84278943588652]
We propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event.
To facilitate this task, we outlined a data collection schema for identifying diverse information and curated a dataset named DiverseSumm.
The dataset includes 245 news stories, with each story comprising 10 news articles and paired with a human-validated reference.
arXiv Detail & Related papers (2023-09-17T20:28:17Z)
- MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos [106.06278332186106]
Multimodal summarization with multimodal output (MSMO) has emerged as a promising research direction.
Numerous limitations exist within existing public MSMO datasets.
We have meticulously curated the MMSum dataset.
arXiv Detail & Related papers (2023-06-07T07:43:11Z)
- ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding [31.227481709446746]
Existing approaches mainly focus on fine-grained elements such as words and document images, making it hard for them to learn from coarse-grained elements.
In this paper, we attach more importance to coarse-grained elements containing high-density information and consistent semantics.
Our method can improve the performance of multimodal Transformers based on fine-grained elements and achieve better performance with fewer parameters.
arXiv Detail & Related papers (2022-09-18T13:46:56Z)
- TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained in an end-to-end manner, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z)
- Unsupervised Summarization with Customized Granularities [76.26899748972423]
We propose the first unsupervised multi-granularity summarization framework, GranuSum.
By inputting different numbers of events, GranuSum is capable of producing multi-granular summaries in an unsupervised manner.
arXiv Detail & Related papers (2022-01-29T05:56:35Z)
- Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations [12.394777121890925]
This paper revisits and substantially extends previous dataset creation efforts.
We show that our extended version uses more representative texts for multi-document tasks and provides a larger and more diverse training set.
arXiv Detail & Related papers (2021-10-09T09:15:05Z)
- Topic Modeling Based Extractive Text Summarization [0.0]
We propose a novel method to summarize a text document by clustering its contents based on latent topics.
We utilize the lesser-used and challenging WikiHow dataset in our approach to text summarization.
arXiv Detail & Related papers (2021-06-29T12:28:19Z)
- Multi-modal Summarization for Video-containing Documents [23.750585762568665]
We propose a novel multi-modal summarization task to summarize from a document and its associated video.
Comprehensive experiments show that the proposed model is beneficial for multi-modal summarization and superior to existing methods.
arXiv Detail & Related papers (2020-09-17T02:13:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.