Related papers: Document Understanding, Measurement, and Manipulation Using Category Theory

Document Understanding, Measurement, and Manipulation Using Category Theory

URL: http://arxiv.org/abs/2510.21553v1
Date: Fri, 24 Oct 2025 15:12:08 GMT
Title: Document Understanding, Measurement, and Manipulation Using Category Theory
Authors: Jared Claypoole, Yunye Gong, Noson S. Yanofsky, Ajay Divakaran,
Abstract summary: We apply category theory to extract multimodal document structure.<n>We develop information theoretic measures, content summarization and extension, and self-supervised improvement of large pretrained models.
Score: 7.117514203300817
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We apply category theory to extract multimodal document structure which leads us to develop information theoretic measures, content summarization and extension, and self-supervised improvement of large pretrained models. We first develop a mathematical representation of a document as a category of question-answer pairs. Second, we develop an orthogonalization procedure to divide the information contained in one or more documents into non-overlapping pieces. The structures extracted in the first and second steps lead us to develop methods to measure and enumerate the information contained in a document. We also build on those steps to develop new summarization techniques, as well as to develop a solution to a new problem viz. exegesis resulting in an extension of the original document. Our question-answer pair methodology enables a novel rate distortion analysis of summarization techniques. We implement our techniques using large pretrained models, and we propose a multimodal extension of our overall mathematical framework. Finally, we develop a novel self-supervised method using RLVR to improve large pretrained models using consistency constraints such as composability and closure under certain operations that stem naturally from our category theoretic framework.

Related papers

DREAM: Document Reconstruction via End-to-end Autoregressive Model [53.51754520966657]
We present an innovative autoregressive model specifically designed for document reconstruction, referred to as Document Reconstruction via End-to-end Autoregressive Model (DREAM)<n>We establish a standardized definition of the document reconstruction task, and introduce a novel Document Similarity Metric (DSM) and DocRec1K dataset for assessing the performance of the task.
arXiv Detail & Related papers (2025-07-08T09:24:07Z)
Graph Topic Modeling for Documents with Spatial or Covariate Dependencies [0.9208007322096533]
We address the challenge of incorporating document-level metadata into topic modeling.<n>We propose a new estimator based on a fast graph-regularized iterative singular value decomposition.<n>We validate our model through comprehensive experiments on synthetic datasets and three real-world corpora.
arXiv Detail & Related papers (2024-12-19T03:00:26Z)
Enhancing binary classification: A new stacking method via leveraging computational geometry [5.906199156511947]
This paper introduces a novel approach that integrates computational geometry techniques, specifically solving the maximum weighted rectangle problem, to develop a new meta-model for binary classification. Our method is evaluated on multiple open datasets, with statistical analysis showing its stability and demonstrating improvements in accuracy. Our method is highly applicable not only in stacking ensemble learning but also in various real-world applications, such as hospital health evaluation scoring and bank credit scoring systems.
arXiv Detail & Related papers (2024-10-30T06:11:08Z)
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities [89.40778301238642]
Model merging is an efficient empowerment technique in the machine learning community. There is a significant gap in the literature regarding a systematic and thorough review of these techniques.
arXiv Detail & Related papers (2024-08-14T16:58:48Z)
Document-Level In-Context Few-Shot Relation Extraction via Pre-Trained Language Models [29.94694305204144]
We present a novel framework for document-level in-context few-shot relation extraction. We evaluate our framework using DocRED, the largest publicly available dataset for document-level relation extraction.
arXiv Detail & Related papers (2023-10-17T09:10:27Z)
Peek Across: Improving Multi-Document Modeling via Cross-Document Question-Answering [49.85790367128085]
We pre-training a generic multi-document model from a novel cross-document question answering pre-training objective. This novel multi-document QA formulation directs the model to better recover cross-text informational relations. Unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation and long text generation.
arXiv Detail & Related papers (2023-05-24T17:48:40Z)
Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding. UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input. An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
A Typology for Exploring the Mitigation of Shortcut Behavior [29.38025128165229]
We provide a unification of various XIL methods into a single typology by establishing a common set of basic modules. In our evaluations, all methods prove to revise a model successfully. However, we found remarkable differences in individual benchmark tasks, revealing valuable application-relevant aspects.
arXiv Detail & Related papers (2022-03-04T14:16:50Z)
Integrating Semantics and Neighborhood Information with Graph-Driven Generative Models for Document Retrieval [51.823187647843945]
In this paper, we encode the neighborhood information with a graph-induced Gaussian distribution, and propose to integrate the two types of information with a graph-driven generative model. Under the approximation, we prove that the training objective can be decomposed into terms involving only singleton or pairwise documents, enabling the model to be trained as efficiently as uncorrelated ones.
arXiv Detail & Related papers (2021-05-27T11:29:03Z)
A Diagnostic Study of Explainability Techniques for Text Classification [52.879658637466605]
We develop a list of diagnostic properties for evaluating existing explainability techniques. We compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones.
arXiv Detail & Related papers (2020-09-25T12:01:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.