Functional Analytics for Document Ordering for Curriculum Development
and Comprehension
- URL: http://arxiv.org/abs/2312.09457v1
- Date: Wed, 22 Nov 2023 02:13:27 GMT
- Title: Functional Analytics for Document Ordering for Curriculum Development
and Comprehension
- Authors: Arturo N. Villanueva Jr. and Steven J. Simske
- Abstract summary: We propose techniques for automatic document order generation for curriculum development and for creation of optimal reading order for use in learning, training, and other content-sequencing applications.
Such techniques could potentially be used to improve comprehension, identify areas that need expounding, generate curricula, and improve search engine results.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose multiple techniques for automatic document order generation for
(1) curriculum development and for (2) creation of optimal reading order for
use in learning, training, and other content-sequencing applications. Such
techniques could potentially be used to improve comprehension, identify areas
that need expounding, generate curricula, and improve search engine results. We
advance two main techniques: The first uses document similarities through
various methods. The second uses entropy against the backdrop of topics
generated through Latent Dirichlet Allocation (LDA). In addition, we try the
same methods on the summarized documents and compare them against the results
obtained using the complete documents. Our results showed that while the
document orders for our control document sets (biographies, novels, and
Wikipedia articles) could not be predicted using our methods, our test
documents (textbooks, courses, journal papers, dissertations) could be ordered
more reliably. We also demonstrated that summarized documents were good stand-ins
for the complete documents for the purposes of ordering.
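A minimal sketch of how the two ordering signals described above might be computed is given below; the specific libraries (scikit-learn), the greedy similarity-chaining heuristic, and the descending-entropy convention are assumptions of this illustration, not details taken from the paper.

```python
# A minimal sketch, assuming scikit-learn and a greedy chaining heuristic, of the two
# ordering signals named in the abstract: (1) pairwise document similarity and
# (2) entropy of LDA topic mixtures. An illustration, not the authors' implementation.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.decomposition import LatentDirichletAllocation


def similarity_order(docs, start=0):
    """Greedy ordering: repeatedly append the unread document most similar
    (by TF-IDF cosine similarity) to the last document in the sequence."""
    sims = cosine_similarity(TfidfVectorizer(stop_words="english").fit_transform(docs))
    order, remaining = [start], set(range(len(docs))) - {start}
    while remaining:
        nxt = max(remaining, key=lambda j: sims[order[-1], j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


def entropy_order(docs, n_topics=10, seed=0):
    """Order documents by the entropy of their LDA topic distributions,
    e.g. broad (high-entropy) material before narrowly focused (low-entropy) material."""
    counts = CountVectorizer(stop_words="english").fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    theta = lda.fit_transform(counts)               # per-document topic mixtures
    ent = -np.sum(theta * np.log(theta + 1e-12), axis=1)
    return list(np.argsort(-ent))                   # indices, descending entropy


if __name__ == "__main__":
    chapters = [
        "a broad overview of machine learning and statistics",
        "a focused chapter on latent dirichlet allocation for topic modeling",
        "a focused chapter on cosine similarity and tf idf weighting",
    ]
    print(similarity_order(chapters))
    print(entropy_order(chapters, n_topics=2))
```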
Related papers
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z) - Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z) - A Novel Dataset for Non-Destructive Inspection of Handwritten Documents [0.0]
Forensic handwriting examination aims to examine handwritten documents in order to properly define or hypothesize the manuscript's author.
We propose a new and challenging dataset consisting of two subsets: the first consists of 21 documents written either with the classic "pen and paper" approach (and later digitized) or acquired directly on common devices such as tablets.
Preliminary results on the proposed datasets show that 90% classification accuracy can be achieved on the first subset.
arXiv Detail & Related papers (2024-01-09T09:25:58Z) - HADES: Homologous Automated Document Exploration and Summarization [3.3509104620016092]
HADES is designed to streamline the work of professionals dealing with large volumes of documents.
The tool employs a multi-step pipeline that begins with processing PDF documents using topic modeling, summarization, and analysis of the most important words for each topic.
arXiv Detail & Related papers (2023-02-25T15:16:10Z) - Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z) - GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z) - Coherence-Based Distributed Document Representation Learning for Scientific Documents [9.646001537050925]
We propose a coupled text pair embedding (CTPE) model to learn the representation of scientific documents.
We use negative sampling to construct uncoupled text pairs whose two parts are from different documents.
We train the model to judge whether the text pair is coupled or uncoupled and use the obtained embedding of coupled text pairs as the embedding of documents.
arXiv Detail & Related papers (2022-01-08T15:29:21Z) - Focused Attention Improves Document-Grounded Generation [111.42360617630669]
Document grounded generation is the task of using the information provided in a document to improve text generation.
This work focuses on two different document grounded generation tasks: Wikipedia Update Generation task and Dialogue response generation.
arXiv Detail & Related papers (2021-04-26T16:56:29Z) - Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level.
We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component.
arXiv Detail & Related papers (2020-10-03T02:52:28Z) - Towards a Multi-modal, Multi-task Learning based Pre-training Framework for Document Representation Learning [5.109216329453963]
We introduce Document Topic Modelling and Document Shuffle Prediction as novel pre-training tasks.
We utilize the Longformer network architecture as the backbone to encode the multi-modal information from multi-page documents in an end-to-end fashion.
arXiv Detail & Related papers (2020-09-30T05:39:04Z)
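As a rough illustration of the shuffle-based pre-training task named in the last entry above, the sketch below builds (page sequence, label) examples for a binary shuffle-prediction objective; the function name, the 50% shuffle rate, and the pairing scheme are assumptions of this sketch, and the Longformer-based encoder from the cited paper is not reproduced here.

```python
# A toy construction (an assumption for illustration, not the cited paper's code) of
# training examples for a document-shuffle-prediction objective: each multi-page
# document yields a (page_sequence, label) pair, labelled 1 if the pages were shuffled.
import random


def make_shuffle_prediction_example(pages, p_shuffle=0.5, seed=None):
    """Return (page_sequence, label), where label is 1 if the page order was permuted."""
    rng = random.Random(seed)
    pages = list(pages)
    if len(pages) > 1 and rng.random() < p_shuffle:
        shuffled = pages[:]
        while shuffled == pages:        # make sure the order actually changes
            rng.shuffle(shuffled)
        return shuffled, 1
    return pages, 0


if __name__ == "__main__":
    doc = ["page 1 text", "page 2 text", "page 3 text"]
    print(make_shuffle_prediction_example(doc, seed=42))
```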
This list is automatically generated from the titles and abstracts of the papers on this site.