Multi-document Summarization: A Comparative Evaluation
- URL: http://arxiv.org/abs/2309.04951v2
- Date: Tue, 12 Sep 2023 04:19:49 GMT
- Title: Multi-document Summarization: A Comparative Evaluation
- Authors: Kushan Hewapathirana (1 and 2), Nisansa de Silva (1), C.D. Athuraliya
(2) ((1) Department of Computer Science & Engineering, University of
Moratuwa, Sri Lanka, (2) ConscientAI, Sri Lanka)
- Abstract summary: This paper is aimed at evaluating state-of-the-art models for Multi-document Summarization (MDS) on different types of datasets in various domains.
We analyzed the performance of PRIMERA and PEGASUS models on BigSurvey-MDS and MS$^2$ datasets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper is aimed at evaluating state-of-the-art models for Multi-document
Summarization (MDS) on different types of datasets in various domains and
investigating the limitations of existing models to determine future research
directions. To this end, we conducted an extensive literature review to
identify state-of-the-art models and datasets. We analyzed the performance of
PRIMERA and PEGASUS models on BigSurvey-MDS and MS$^2$ datasets, which posed
unique challenges due to their varied domains. Our findings show that the
General-Purpose Pre-trained Model LED outperforms PRIMERA and PEGASUS on the
MS$^2$ dataset. We used the ROUGE score as a performance metric to evaluate the
identified models on different datasets. Our study provides valuable insights
into the models' strengths and weaknesses, as well as their applicability in
different domains. This work serves as a reference for future MDS research and
contributes to the development of accurate and robust models which can be
utilized on demanding datasets with academically and/or scientifically complex
data as well as generalized, relatively simple datasets.
Related papers
- LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content [62.816876067499415]
We propose LiveXiv: a scalable evolving live benchmark based on scientific ArXiv papers.
LiveXiv accesses domain-specific manuscripts at any given timestamp and proposes to automatically generate visual question-answer pairs.
We benchmark multiple open and proprietary Large Multi-modal Models (LMMs) on the first version of our benchmark, showing its challenging nature and exposing the models' true abilities.
arXiv Detail & Related papers (2024-10-14T17:51:23Z)
- On Evaluation of Vision Datasets and Models using Human Competency Frameworks [20.802372291783488]
Item Response Theory (IRT) is a framework that infers interpretable latent parameters for an ensemble of models and each dataset item.
We assess model calibration, select informative data subsets, and demonstrate the usefulness of its latent parameters for analyzing and comparing models and datasets in computer vision.
arXiv Detail & Related papers (2024-09-06T06:20:11Z)
- GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- State-of-the-art Models for Object Detection in Various Fields of Application [0.0]
COCO minival, COCO test, Pascal VOC 2007, ADE20K, and ImageNet are reviewed.
The datasets are handpicked after closely comparing them with the rest in terms of diversity, data quality, minimal bias, and labeling quality.
It lists the top models and their optimal use cases for each of the respective datasets.
arXiv Detail & Related papers (2022-11-01T20:25:32Z)
- Deep Learning Models for Knowledge Tracing: Review and Empirical Evaluation [2.423547527175807]
We review and evaluate a body of deep learning knowledge tracing (DLKT) models with openly available and widely-used data sets.
The evaluated DLKT models were reimplemented to assess the replicability of previously reported results.
arXiv Detail & Related papers (2021-12-30T14:19:27Z)
- On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions.
Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models [41.45240621979654]
We introduce BEIR, a heterogeneous benchmark for information retrieval.
We study the effectiveness of nine state-of-the-art retrieval models in a zero-shot evaluation setup.
Dense-retrieval models are computationally more efficient but often underperform other approaches.
arXiv Detail & Related papers (2021-04-17T23:29:55Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)