SciRepEval: A Multi-Format Benchmark for Scientific Document Representations
- URL: http://arxiv.org/abs/2211.13308v4
- Date: Mon, 13 Nov 2023 18:25:27 GMT
- Title: SciRepEval: A Multi-Format Benchmark for Scientific Document Representations
- Authors: Amanpreet Singh, Mike D'Arcy, Arman Cohan, Doug Downey, Sergey Feldman
- Abstract summary: We introduce SciRepEval, the first comprehensive benchmark for training and evaluating scientific document representations.
We show how state-of-the-art models like SPECTER and SciNCL struggle to generalize across the task formats.
A new approach that learns multiple embeddings per document, each tailored to a different format, can improve performance.
- Score: 52.01865318382197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learned representations of scientific documents can serve as valuable input
features for downstream tasks without further fine-tuning. However, existing
benchmarks for evaluating these representations fail to capture the diversity
of relevant tasks. In response, we introduce SciRepEval, the first
comprehensive benchmark for training and evaluating scientific document
representations. It includes 24 challenging and realistic tasks, 8 of which are
new, across four formats: classification, regression, ranking and search. We
then use this benchmark to study and improve the generalization ability of
scientific document representation models. We show how state-of-the-art models
like SPECTER and SciNCL struggle to generalize across the task formats, and
that simple multi-task training fails to improve them. However, a new approach
that learns multiple embeddings per document, each tailored to a different
format, can improve performance. We experiment with task-format-specific
control codes and adapters and find they outperform the existing
single-embedding state-of-the-art by over 2 points absolute. We release the
resulting family of multi-format models, called SPECTER2, for the community to
use and build on.
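As a rough sketch of the task-format control-code idea described above (the token names and input template here are illustrative assumptions, not necessarily the paper's exact vocabulary), a format-specific marker can be prepended to a paper's title and abstract so that a single shared encoder produces a different embedding per task format:

```python
# Illustrative sketch: one control code per task format, prepended to the
# encoder input so the model can specialize its embedding per format.
# Token names and the "[SEP]" template are assumptions for illustration.

FORMAT_CODES = {
    "classification": "[CLF]",
    "regression": "[RGN]",
    "ranking": "[PRX]",   # proximity-style ranking tasks
    "search": "[QRY]",    # ad-hoc search / query tasks
}

def build_input(title: str, abstract: str, task_format: str) -> str:
    """Prepend the control code for the requested task format."""
    code = FORMAT_CODES[task_format]
    return f"{code} {title} [SEP] {abstract}"

# Each format yields a distinct input, hence a distinct embedding downstream.
clf_input = build_input("SciRepEval", "A multi-format benchmark...", "classification")
qry_input = build_input("SciRepEval", "A multi-format benchmark...", "search")
```

The same document is encoded once per format, so downstream tasks pick the embedding matching their format rather than sharing one general-purpose vector.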
Related papers
- MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
This dataset includes figures such as schematic diagrams, simulated images, macroscopic/microscopic photos, and experimental visualizations.
We developed benchmarks for scientific figure captioning and multiple-choice questions, evaluating six proprietary and over ten open-source models.
The dataset and benchmarks will be released to support further research.
arXiv Detail & Related papers (2024-07-06T00:40:53Z)
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models [63.466265039007816]
We present DocGenome, a structured document benchmark constructed by annotating 500K scientific documents from 153 disciplines in the arXiv open-access community.
We conduct extensive experiments to demonstrate the advantages of DocGenome and objectively evaluate the performance of large models on our benchmark.
arXiv Detail & Related papers (2024-06-17T15:13:52Z)
- Beyond Document Page Classification: Design, Datasets, and Challenges [32.94494070330065]
This paper highlights the need to bring document classification benchmarking closer to real-world applications.
We identify the lack of public multi-page document classification datasets, formalize different classification tasks arising in application scenarios, and motivate the value of targeting efficient multi-page document representations.
arXiv Detail & Related papers (2023-08-24T16:16:47Z)
- MIReAD: Simple Method for Learning High-quality Representations from Scientific Documents [77.34726150561087]
We propose MIReAD, a simple method that learns high-quality representations of scientific papers.
We train MIReAD on more than 500,000 PubMed and arXiv abstracts across over 2,000 journal classes.
arXiv Detail & Related papers (2023-05-07T03:29:55Z)
- Large-scale learning of generalised representations for speaker recognition [52.978310296712834]
We develop a speaker recognition model to be used in diverse scenarios.
We investigate several new training data configurations combining a few existing datasets.
We find that MFA-Conformer with the least inductive bias generalises the best.
arXiv Detail & Related papers (2022-10-20T03:08:18Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity [11.157086694203201]
We present a new scientific document similarity model based on matching fine-grained aspects.
Our model is trained using co-citation contexts that describe related paper aspects as a novel form of textual supervision.
arXiv Detail & Related papers (2021-11-16T11:12:30Z)
- SPECTER: Document-level Representation Learning using Citation-informed Transformers [51.048515757909215]
SPECTER generates document-level embedding of scientific documents based on pretraining a Transformer language model.
We introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction to document classification and recommendation.
arXiv Detail & Related papers (2020-04-15T16:05:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.