KBSET -- Knowledge-Based Support for Scholarly Editing and Text
Processing with Declarative LaTeX Markup and a Core Written in SWI-Prolog
- URL: http://arxiv.org/abs/2002.10329v1
- Date: Mon, 24 Feb 2020 15:57:41 GMT
- Authors: Jana Kittelmann, Christoph Wernhard
- Abstract summary: KBSET includes specially developed LaTeX styles and a prototypical core system that is written in SWI-Prolog.
KBSET can process declarative application-specific markup that is expressed in LaTeX notation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: KBSET is an environment that provides support for scholarly editing in two
flavors: First, as a practical tool KBSET/Letters that accompanies the
development of editions of correspondences (in particular from the 18th and
19th century), completely from source documents to PDF and HTML presentations.
Second, as a prototypical tool KBSET/NER for experimentally investigating novel
forms of working on editions that are centered around automated named entity
recognition. KBSET can process declarative application-specific markup that is
expressed in LaTeX notation and incorporate large external fact bases that are
typically provided in RDF. KBSET includes specially developed LaTeX styles and
a core system that is written in SWI-Prolog, which is used there in many roles,
utilizing that it realizes the potential of Prolog as a unifying language.
Related papers
- PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer [51.260384040953326]
Handwritten Mathematical Expression Recognition (HMER) has wide applications in human-machine interaction scenarios.
We propose a position forest transformer (PosFormer) for HMER, which jointly optimizes two tasks: expression recognition and position recognition.
PosFormer consistently outperforms the state-of-the-art methods, with gains of 2.03%/1.22%/2, 1.83%, and 4.62% on datasets.
arXiv Detail & Related papers (2024-07-10T15:42:58Z)
- Extracting and Encoding: Leveraging Large Language Models and Medical Knowledge to Enhance Radiological Text Representation [31.370503681645804]
We present a novel two-stage framework designed to extract high-quality factual statements from free-text radiology reports.
Our framework also includes a new embedding-based metric (CXRFE) for evaluating chest X-ray text generation systems.
arXiv Detail & Related papers (2024-07-02T04:39:19Z)
- XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser [35.69888780388425]
In this work, we introduce XFormParser, a simple but effective multimodal and multilingual semi-structured form parser.
XFormParser is anchored on a comprehensive pre-trained language model and innovatively amalgamates entity recognition and relation extraction (RE).
Our framework exhibits exceptionally improved performance across tasks in both multi-language and zero-shot contexts.
arXiv Detail & Related papers (2024-05-27T16:37:17Z)
- KnowledgeHub: An end-to-end Tool for Assisted Scientific Discovery [1.6080795642111267]
This paper describes the KnowledgeHub tool, a scientific literature Information Extraction (IE) and Question Answering (QA) pipeline.
This is achieved by supporting the ingestion of PDF documents that are converted to text and structured representations.
A browser-based annotation tool enables annotating the contents of the PDF documents according to the ontology.
A knowledge graph is constructed from these entity and relation triples which can be queried to obtain insights from the data.
arXiv Detail & Related papers (2024-05-16T13:17:14Z)
- Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following [59.997857926808116]
We introduce a semantic panel as the middleware in decoding texts to images.
The panel is obtained through arranging the visual concepts parsed from the input text.
We develop a practical system and showcase its potential in continuous generation and chatting-based editing.
arXiv Detail & Related papers (2023-11-28T17:57:44Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Recent advances in the Self-Referencing Embedding Strings (SELFIES) library [1.9573380763700712]
String-based molecular representations play a crucial role in cheminformatics applications.
Traditional string-based representations such as SMILES are often prone to syntactic and semantic errors when produced by generative models.
SELF-referencIng Embedded Strings (SELFIES), a representation that is inherently 100% robust, was proposed alongside an accompanying open-source implementation.
arXiv Detail & Related papers (2023-02-07T17:24:08Z)
- Digital Editions as Distant Supervision for Layout Analysis of Printed Books [76.29918490722902]
We describe methods for exploiting the semantic markup of digital editions as distant supervision for training and evaluating layout analysis models.
In experiments with several model architectures on the half-million pages of the Deutsches Textarchiv (DTA), we find a high correlation of these region-level evaluation methods with pixel-level and word-level metrics.
We discuss the possibilities for improving accuracy with self-training and the ability of models trained on the DTA to generalize to other historical printed books.
arXiv Detail & Related papers (2021-12-23T16:51:53Z)
- Reproducible Science with LaTeX [4.09920839425892]
This paper proposes a procedure to execute external source code from a LaTeX document.
It automatically includes the calculation outputs in the resulting Portable Document Format (PDF) file.
arXiv Detail & Related papers (2020-10-04T04:04:07Z)
- N-LTP: An Open-source Neural Language Technology Platform for Chinese [68.58732970171747]
N-LTP is an open-source neural language technology platform supporting six fundamental Chinese NLP tasks.
N-LTP adopts the multi-task framework by using a shared pre-trained model, which has the advantage of capturing the shared knowledge across relevant Chinese tasks.
arXiv Detail & Related papers (2020-09-24T11:45:39Z)
- DART: Open-Domain Structured Data Record to Text Generation [91.23798751437835]
We present DART, an open-domain structured DAta Record to Text generation dataset with over 82k instances (DARTs).
We propose a procedure of extracting semantic triples from tables that encode their structures by exploiting the semantic dependencies among table headers and the table title.
Our dataset construction framework effectively merged heterogeneous sources from open domain semantic parsing and dialogue-act-based meaning representation tasks.
arXiv Detail & Related papers (2020-07-06T16:35:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.