Modelling the semantics of text in complex document layouts using graph
transformer networks
- URL: http://arxiv.org/abs/2202.09144v1
- Date: Fri, 18 Feb 2022 11:49:06 GMT
- Title: Modelling the semantics of text in complex document layouts using graph
transformer networks
- Authors: Thomas Roland Barillot (1), Jacob Saks (1), Polena Lilyanova (1),
Edward Torgas (1), Yachen Hu (1), Yuanqing Liu (1), Varun Balupuri (1) and
Paul Gaskell (1) ((1) BlackRock Inc.)
- Abstract summary: We propose a model that approximates the human reading pattern of a document and outputs a unique semantic representation for every text span.
We base our architecture on a graph representation of the structured text, and we demonstrate that not only can we retrieve semantically similar information across documents but also that the embedding space we generate captures useful semantic information.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Representing structured text from complex documents typically calls for
different machine learning techniques, such as language models for paragraphs
and convolutional neural networks (CNNs) for table extraction, which prohibits
drawing links between text spans from different content types. In this article
we propose a model that approximates the human reading pattern of a document
and outputs a unique semantic representation for every text span, irrespective
of the content type in which it appears. We base our architecture on a graph
representation of the structured text, and we demonstrate that not only can we
retrieve semantically similar information across documents but also that the
embedding space we generate captures useful semantic information, similar to
language models that work only on text sequences.
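The core idea of the abstract can be sketched in code: treat each text span (paragraph sentence, table cell, heading) as a graph node, connect nodes that are adjacent in reading order or layout, and let an attention layer propagate information between them so every span gets an embedding regardless of its content type. The following is a minimal illustrative sketch, not the authors' actual architecture; the span features, edge choices, and single-layer attention step are all simplifying assumptions.

```python
# Hypothetical sketch of a document-as-graph with one round of
# attention-weighted message passing over text spans.
# All names and features here are illustrative, not the paper's model.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Nodes: text spans from different content types, each with a toy
# feature vector standing in for a learned embedding.
spans = {
    0: {"text": "Revenue grew 12%", "type": "paragraph",  "h": [1.0, 0.0]},
    1: {"text": "Revenue",          "type": "table_cell", "h": [0.9, 0.1]},
    2: {"text": "12%",              "type": "table_cell", "h": [0.2, 0.8]},
}
# Edges approximate a human reading pattern: reading order plus layout
# adjacency between the paragraph and the table cells it refers to.
edges = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

def attention_step(spans, edges):
    """One simplified graph-attention layer: each span is re-embedded
    as a softmax-weighted average of its neighbours' features."""
    out = {}
    for i, nbrs in edges.items():
        scores = [dot(spans[i]["h"], spans[j]["h"]) for j in nbrs]
        weights = softmax(scores)
        dim = len(spans[i]["h"])
        out[i] = [sum(w * spans[j]["h"][d] for w, j in zip(weights, nbrs))
                  for d in range(dim)]
    return out

updated = attention_step(spans, edges)
```

After the step, the paragraph node's embedding mixes information from the table cells, which is what lets retrieval cross content-type boundaries; a real model would stack several such layers with learned projections.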
Related papers
- factgenie: A Framework for Span-based Evaluation of Generated Texts [1.6864244598342872]
Span annotations can capture various span-based phenomena such as semantic inaccuracies or irrelevant text.
Our framework consists of a web interface for data visualization and gathering text annotations.
arXiv Detail & Related papers (2024-07-25T08:33:23Z)
- Patton: Language Model Pretraining on Text-Rich Networks [33.914163727649466]
We propose PretrAining on TexT-Rich NetwOrk framework Patton for text-rich networks.
Patton includes two pretraining strategies: network-contextualized masked language modeling and masked node prediction.
We conduct experiments on four downstream tasks in five datasets from both academic and e-commerce domains.
arXiv Detail & Related papers (2023-05-20T19:17:10Z)
- WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models [8.334487584550185]
We present a latent diffusion-based method for styled text-to-text-content-image generation on word-level.
Our proposed method is able to generate realistic word image samples from different writer styles.
We show that the proposed model produces samples that are aesthetically pleasing, that help boost text recognition performance, and that achieve writer retrieval scores similar to real data.
arXiv Detail & Related papers (2023-03-29T10:19:26Z)
- Pre-training Language Model Incorporating Domain-specific Heterogeneous Knowledge into A Unified Representation [49.89831914386982]
We propose a unified pre-trained language model (PLM) for all forms of text, including unstructured text, semi-structured text, and well-structured text.
Our approach outperforms the pre-training of plain text using only 1/4 of the data.
arXiv Detail & Related papers (2021-09-02T16:05:24Z)
- Full Page Handwriting Recognition via Image to Sequence Extraction [0.0]
The model achieves a new state of the art in full-page recognition on the IAM dataset.
It is deployed in production as part of a commercial web application.
arXiv Detail & Related papers (2021-03-11T04:37:29Z)
- Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks [61.23408995934415]
We propose a novel framework for minimally supervised categorization by learning from the text-rich network.
Specifically, we jointly train two modules with different inductive biases -- a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning.
Our experiments show that given only three seed documents per category, our framework can achieve an accuracy of about 92%.
arXiv Detail & Related papers (2021-02-23T04:14:34Z)
- Neural Deepfake Detection with Factual Structure of Text [78.30080218908849]
We propose a graph-based model for deepfake detection of text.
Our approach represents the factual structure of a given document as an entity graph.
Our model can distinguish the difference in the factual structure between machine-generated text and human-written text.
arXiv Detail & Related papers (2020-10-15T02:35:31Z)
- A Graph Representation of Semi-structured Data for Web Question Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)
- A Multi-Perspective Architecture for Semantic Code Search [58.73778219645548]
We propose a novel multi-perspective cross-lingual neural framework for code--text matching.
Our experiments on the CoNaLa dataset show that our proposed model yields better performance than previous approaches.
arXiv Detail & Related papers (2020-05-06T04:46:11Z)
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.