Finding Pragmatic Differences Between Disciplines
- URL: http://arxiv.org/abs/2310.00204v1
- Date: Sat, 30 Sep 2023 00:46:14 GMT
- Title: Finding Pragmatic Differences Between Disciplines
- Authors: Lee Kezar, Jay Pujara
- Abstract summary: We learn a fixed set of domain-agnostic descriptors for document sections and "retrofit" the corpus to these descriptors.
We analyze the position and ordering of these descriptors across documents to understand the relationship between discipline and structure.
Our findings lay the foundation for future work in assessing research quality, domain style transfer, and further pragmatic analysis.
- Score: 14.587150614245123
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Scholarly documents have a great degree of variation, both in terms of
content (semantics) and structure (pragmatics). Prior work in scholarly
document understanding emphasizes semantics through document summarization and
corpus topic modeling but tends to omit pragmatics such as document
organization and flow. Using a corpus of scholarly documents across 19
disciplines and state-of-the-art language modeling techniques, we learn a fixed
set of domain-agnostic descriptors for document sections and "retrofit" the
corpus to these descriptors (also referred to as "normalization"). Then, we
analyze the position and ordering of these descriptors across documents to
understand the relationship between discipline and structure. We report
within-discipline structural archetypes, variability, and between-discipline
comparisons, supporting the hypothesis that scholarly communities, despite
their size, diversity, and breadth, share similar avenues for expressing their
work. Our findings lay the foundation for future work in assessing research
quality, domain style transfer, and further pragmatic analysis.
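To make the retrofitting step concrete, the following is a minimal, hypothetical sketch: raw section headers are assigned to a small fixed descriptor set by nearest-neighbor similarity, and the most frequent descriptor ordering per discipline is reported as its structural archetype. The descriptor list, the TF-IDF similarity (standing in for the paper's language-model-based normalization), and the corpus format are illustrative assumptions, not the authors' setup.

```python
# Hypothetical sketch of "retrofitting" section headers to a fixed,
# domain-agnostic descriptor set, then comparing descriptor orderings
# per discipline. Descriptors, similarity measure, and corpus format
# are illustrative assumptions.
from collections import Counter, defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Assumed fixed descriptor set (illustrative only).
DESCRIPTORS = ["introduction", "background", "methods", "results", "discussion", "conclusion"]

def retrofit_sections(section_headers):
    """Normalize raw section headers to the fixed descriptor set."""
    vectorizer = TfidfVectorizer().fit(DESCRIPTORS + section_headers)
    sims = cosine_similarity(vectorizer.transform(section_headers),
                             vectorizer.transform(DESCRIPTORS))
    # Ties (all-zero similarity) default to the first descriptor.
    return [DESCRIPTORS[row.argmax()] for row in sims]

def archetypes_by_discipline(corpus):
    """corpus: iterable of (discipline, ordered list of section headers).
    Returns the most common descriptor ordering observed in each discipline."""
    orderings = defaultdict(Counter)
    for discipline, headers in corpus:
        orderings[discipline][tuple(retrofit_sections(headers))] += 1
    return {d: counts.most_common(1)[0][0] for d, counts in orderings.items()}

if __name__ == "__main__":
    toy_corpus = [
        ("biology", ["Introduction", "Materials and Methods", "Results", "Discussion"]),
        ("computer science", ["Introduction", "Background", "Methods", "Results", "Conclusion"]),
    ]
    print(archetypes_by_discipline(toy_corpus))
```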
Related papers
- Data-driven Coreference-based Ontology Building [48.995395445597225]
Coreference resolution is traditionally used as a component in individual document understanding.
We take a more global view and explore what we can learn about a domain from the set of all document-level coreference relations.
We release the resulting coreference chains under a creative-commons license, along with the code.
arXiv Detail & Related papers (2024-10-22T14:30:40Z) - Inferring Scientific Cross-Document Coreference and Hierarchy with Definition-Augmented Relational Reasoning [7.086262532457526]
We present a novel method which generates context-dependent definitions of concept mentions by retrieving full-text literature.
We further generate relational definitions, which describe how two concept mentions are related or different, and design an efficient re-ranking approach to address the explosion involved in inferring links across papers.
arXiv Detail & Related papers (2024-09-23T15:20:27Z) - Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction [61.998789448260005]
We propose to identify the typical structure of documents within a collection.
We abstract over arbitrary header paraphrases and ground each topic to its respective document locations.
We develop an unsupervised graph-based method which leverages both inter- and intra-document similarities; a rough sketch of this idea appears after this list.
arXiv Detail & Related papers (2024-02-21T16:22:21Z) - ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science [0.0]
Large language models achieve impressive performance on many natural language processing tasks.
Retrieval augmentation offers an effective solution by retrieving context from external knowledge sources.
We propose a novel structure-aware retrieval augmented language model that accommodates document structure during retrieval augmentation.
arXiv Detail & Related papers (2023-11-21T02:02:46Z) - Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis [3.231170156689185]
Document AI aims to automatically analyze documents by leveraging natural language processing and computer vision techniques.
One of the major tasks of Document AI is document layout analysis, which structures document pages by interpreting the content and spatial relationships of layout, image, and text.
arXiv Detail & Related papers (2023-08-29T16:58:03Z) - How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z) - Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z) - Domain-Specific Word Embeddings with Structure Prediction [3.057136788672694]
We present an empirical evaluation on New York Times articles and two English Wikipedia datasets with articles on science and philosophy.
Our method, called Word2Vec with Structure Prediction (W2VPred), provides better performance than baselines in terms of the general analogy tests.
As a use case in the field of Digital Humanities we demonstrate how to raise novel research questions for high literature from the German Text Archive.
arXiv Detail & Related papers (2022-10-06T12:45:48Z) - Revise and Resubmit: An Intertextual Model of Text-based Collaboration in Peer Review [52.359007622096684]
Peer review is a key component of the publishing process in most fields of science.
Existing NLP studies focus on the analysis of individual texts.
Editorial assistance, however, often requires modeling interactions between pairs of texts.
arXiv Detail & Related papers (2022-04-22T16:39:38Z) - Bilingual Topic Models for Comparable Corpora [9.509416095106491]
We propose a binding mechanism between the distributions of the paired documents.
To estimate the similarity of documents that are written in different languages we use cross-lingual word embeddings that are learned with shallow neural networks.
We evaluate the proposed binding mechanism by extending two topic models: a bilingual adaptation of LDA that assumes bag-of-words inputs and a model that incorporates part of the text structure in the form of boundaries of semantically coherent segments.
arXiv Detail & Related papers (2021-11-30T10:53:41Z) - Explaining Relationships Between Scientific Documents [55.23390424044378]
We address the task of explaining relationships between two scientific documents using natural language text.
In this paper we establish a dataset of 622K examples from 154K documents.
arXiv Detail & Related papers (2020-02-02T03:54:47Z)
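The collection-wide structure-extraction idea referenced above (combining inter- and intra-document similarities) can be sketched with a simple header graph and community detection. The similarity measure, edge weights, and clustering algorithm below are illustrative assumptions, not the cited paper's method.

```python
# Rough, hypothetical sketch of collection-wide structure extraction:
# nodes are (doc_id, position, header) triples; intra-document edges link
# consecutive sections, inter-document edges link lexically similar headers,
# and graph communities approximate the collection's recurring section topics.
# Similarity measure, weights, and clustering choice are illustrative assumptions.
import itertools
from difflib import SequenceMatcher

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def build_header_graph(docs, sim_threshold=0.5):
    """docs: list of documents, each a list of section headers in order."""
    G = nx.Graph()
    nodes = [(i, j, h) for i, doc in enumerate(docs) for j, h in enumerate(doc)]
    G.add_nodes_from(nodes)
    for i, doc in enumerate(docs):                      # intra-document adjacency
        for j in range(len(doc) - 1):
            G.add_edge((i, j, doc[j]), (i, j + 1, doc[j + 1]), weight=0.5)
    for a, b in itertools.combinations(nodes, 2):       # inter-document similarity
        if a[0] != b[0]:
            sim = SequenceMatcher(None, a[2].lower(), b[2].lower()).ratio()
            if sim >= sim_threshold:
                G.add_edge(a, b, weight=sim)
    return G

def typical_topics(docs):
    """Cluster headers across the collection; each community is one recurring topic."""
    G = build_header_graph(docs)
    communities = greedy_modularity_communities(G, weight="weight")
    return [sorted({h for _, _, h in c}) for c in communities]

if __name__ == "__main__":
    docs = [["Introduction", "Methods", "Results"],
            ["Intro", "Methodology", "Findings"],
            ["Introduction", "Experimental Methods", "Results and Findings"]]
    print(typical_topics(docs))
```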