Are Layout-Infused Language Models Robust to Layout Distribution Shifts?
A Case Study with Scientific Documents
- URL: http://arxiv.org/abs/2306.01058v1
- Date: Thu, 1 Jun 2023 18:01:33 GMT
- Title: Are Layout-Infused Language Models Robust to Layout Distribution Shifts?
A Case Study with Scientific Documents
- Authors: Catherine Chen, Zejiang Shen, Dan Klein, Gabriel Stanovsky, Doug
Downey and Kyle Lo
- Abstract summary: Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers.
We test whether layout-infused LMs are robust to layout distribution shifts.
- Score: 54.744701806413204
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent work has shown that infusing layout features into language models
(LMs) improves processing of visually-rich documents such as scientific papers.
Layout-infused LMs are often evaluated on documents with familiar layout
features (e.g., papers from the same publisher), but in practice models
encounter documents with unfamiliar distributions of layout features, such as
new combinations of text sizes and styles, or new spatial configurations of
textual elements. In this work we test whether layout-infused LMs are robust to
layout distribution shifts. As a case study we use the task of scientific
document structure recovery, segmenting a scientific paper into its structural
categories (e.g., "title", "caption", "reference"). To emulate distribution
shifts that occur in practice we re-partition the GROTOAP2 dataset. We find
that under layout distribution shifts model performance degrades by up to 20
F1. Simple training strategies, such as increasing training diversity, can
reduce this degradation by over 35% relative F1; however, models fail to reach
in-distribution performance in any tested out-of-distribution conditions. This
work highlights the need to consider layout distribution shifts during model
evaluation, and presents a methodology for conducting such evaluations.
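To make the evaluation protocol concrete: the re-partitioning idea above can be approximated with a grouped split that holds out entire layout sources at test time, scoring structure recovery with macro F1 over the structural categories. Below is a minimal sketch in Python; the `publisher` field (as a proxy for a document's layout distribution), the record format, and the `macro_f1` helper are illustrative assumptions, not the paper's actual code.

```python
import random
from collections import defaultdict

def grouped_ood_split(documents, group_key="publisher", n_held_out=1, seed=0):
    """Re-partition a corpus so test documents come from layout
    sources (here proxied by publisher) never seen in training.

    documents: list of dicts, each carrying a `group_key` field.
    Returns (train_docs, ood_test_docs).
    """
    by_group = defaultdict(list)
    for doc in documents:
        by_group[doc[group_key]].append(doc)

    groups = sorted(by_group)
    random.Random(seed).shuffle(groups)
    held_out = groups[:n_held_out]                  # unseen layout sources
    train = [d for g in groups[n_held_out:] for d in by_group[g]]
    test = [d for g in held_out for d in by_group[g]]
    return train, test

def macro_f1(gold, pred, categories):
    """Macro-averaged F1 over structural categories such as
    'title', 'caption', and 'reference'."""
    scores = []
    for cat in categories:
        tp = sum(g == cat and p == cat for g, p in zip(gold, pred))
        fp = sum(g != cat and p == cat for g, p in zip(gold, pred))
        fn = sum(g == cat and p != cat for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall)
                      if precision + recall else 0.0)
    return sum(scores) / len(scores)
```

Comparing macro F1 on a random in-distribution split against such a held-out-source split yields the kind of degradation figure the abstract reports; training on more distinct layout sources corresponds to the "increasing training diversity" mitigation mentioned above.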
Related papers
- Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints [53.66698106829144]
We propose a unified model to handle a broad range of layout generation tasks.
The model is based on continuous diffusion models.
Experimental results show that LACE produces high-quality layouts.
arXiv Detail & Related papers (2024-02-07T11:12:41Z)
- Enhancing Visually-Rich Document Understanding via Layout Structure Modeling [91.07963806829237]
We propose GraphLM, a novel document understanding model that injects layout knowledge into the model.
We evaluate our model on various benchmarks, including FUNSD, XFUND and CORD, and achieve state-of-the-art results.
arXiv Detail & Related papers (2023-08-15T13:53:52Z)
- Predicting Software Performance with Divide-and-Learn [3.635696352780227]
We propose an approach based on the concept of 'divide-and-learn', dubbed DaL.
Experimental results from eight real-world systems and five sets of training data reveal that DaL performs no worse than the best counterpart in 33 of 40 cases.
arXiv Detail & Related papers (2023-06-11T11:16:27Z)
- GVdoc: Graph-based Visual Document Classification [17.350393956461783]
We propose GVdoc, a graph-based document classification model.
Our approach generates a document graph based on the page layout, then trains a graph neural network to learn node and graph embeddings (a minimal graph-construction sketch appears after this list).
We show that our model, even with fewer parameters, outperforms state-of-the-art models on out-of-distribution data.
arXiv Detail & Related papers (2023-05-26T19:23:20Z)
- Unifying Layout Generation with a Decoupled Diffusion Model [26.659337441975143]
Layout generation is a crucial task for reducing the burden of heavy-duty graphic design work in formatted scenes, e.g., publications, documents, and user interfaces (UIs).
We propose a layout Diffusion Generative Model (LDGM) to achieve such unification with a single decoupled diffusion model.
Our proposed LDGM can generate layouts either from scratch or conditional on arbitrary available attributes.
arXiv Detail & Related papers (2023-03-09T05:53:32Z)
- SciRepEval: A Multi-Format Benchmark for Scientific Document Representations [52.01865318382197]
We introduce SciRepEval, the first comprehensive benchmark for training and evaluating scientific document representations.
We show how state-of-the-art models like SPECTER and SciNCL struggle to generalize across the task formats.
A new approach that learns multiple embeddings per document, each tailored to a different format, can improve performance.
arXiv Detail & Related papers (2022-11-23T21:25:39Z)
- ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding [52.3895498789521]
We propose ERNIE-Layout, a novel document pre-training solution with layout knowledge enhancement.
We first rearrange input sequences in the serialization stage, then present a correlative pre-training task, reading order prediction, to learn the proper reading order of documents (see the sketch after this list).
Experimental results show that ERNIE-Layout achieves superior performance on various downstream tasks, setting a new state of the art on key information extraction and document question answering.
arXiv Detail & Related papers (2022-10-12T12:59:24Z)
- Towards Making the Most of Context in Neural Machine Translation [112.9845226123306]
We argue that previous research did not make clear use of the global context.
We propose a new document-level NMT framework that deliberately models the local context of each sentence.
arXiv Detail & Related papers (2020-02-19T03:30:00Z)
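As promised in the GVdoc entry above, here is a minimal sketch of the graph-construction step behind graph-based document classification: each layout text box becomes a node with simple geometric features, and edges link spatially nearest boxes. The box format and the neighbor count `k` are assumptions for illustration, not GVdoc's actual construction.

```python
import math

def layout_graph(boxes, k=4):
    """Turn one page's layout into a graph.

    boxes: list of (x0, y0, x1, y1) text-box coordinates.
    Returns (node_features, edges): per-node geometry (center x/y,
    width, height) and undirected k-nearest-neighbor edges.
    """
    centers = [((x0 + x1) / 2, (y0 + y1) / 2) for x0, y0, x1, y1 in boxes]
    node_features = [
        (cx, cy, x1 - x0, y1 - y0)
        for (cx, cy), (x0, y0, x1, y1) in zip(centers, boxes)
    ]
    edges = set()
    for i, ci in enumerate(centers):
        # Connect each box to its k spatially nearest neighbors.
        nearest = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(centers) if j != i
        )[:k]
        for _, j in nearest:
            edges.add((min(i, j), max(i, j)))  # store undirected edges once
    return node_features, sorted(edges)
```

A graph neural network would then learn node embeddings over this graph and pool them into a document embedding for classification.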
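Similarly, the reading-order-prediction task referenced in the ERNIE-Layout entry can be illustrated on the data side: shuffle a document's segments and ask the model to recover each segment's original position. A rough sketch under that framing, not ERNIE-Layout's actual implementation:

```python
import random

def reading_order_pair(segments, seed=0):
    """Build one reading-order-prediction training pair.

    segments: text segments in correct reading order.
    Returns (shuffled, targets) where targets[i] is the original
    position of shuffled[i]; predicting targets recovers the order.
    """
    order = list(range(len(segments)))
    random.Random(seed).shuffle(order)
    shuffled = [segments[i] for i in order]
    return shuffled, order

# Usage: the model sees `shuffled` and must predict `targets`.
shuffled, targets = reading_order_pair(["Title", "Abstract", "Body", "References"])
```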