KBSET -- Knowledge-Based Support for Scholarly Editing and Text Processing with Declarative LaTeX Markup and a Core Written in SWI-Prolog
- URL: http://arxiv.org/abs/2002.10329v1
- Date: Mon, 24 Feb 2020 15:57:41 GMT
- Title: KBSET -- Knowledge-Based Support for Scholarly Editing and Text Processing with Declarative LaTeX Markup and a Core Written in SWI-Prolog
- Authors: Jana Kittelmann, Christoph Wernhard
- Abstract summary: KBSET includes specially developed LaTeX styles and a core system that is written in SWI-Prolog.
KBSET can process declarative application-specific markup that is expressed in LaTeX notation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: KBSET is an environment that provides support for scholarly editing in two
flavors: First, as a practical tool KBSET/Letters that accompanies the
development of editions of correspondences (in particular from the 18th and
19th century), completely from source documents to PDF and HTML presentations.
Second, as a prototypical tool KBSET/NER for experimentally investigating novel
forms of working on editions that are centered around automated named entity
recognition. KBSET can process declarative application-specific markup that is
expressed in LaTeX notation and incorporate large external fact bases that are
typically provided in RDF. KBSET includes specially developed LaTeX styles and
a core system that is written in SWI-Prolog, which is used there in many roles,
utilizing that it realizes the potential of Prolog as a unifying language.
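KBSET implements this processing in SWI-Prolog with its own markup and RDF data. The following is only a minimal Python sketch, under stated assumptions, of the general idea of resolving declarative LaTeX markup against an external fact base. The `\person{ID}{text}` macro, the `ex:` identifiers, the toy triples, and the helpers `objects` and `annotate` are all hypothetical and are not KBSET's markup, data, or API.

```python
import re

# Hypothetical declarative macro \person{ID}{surface text}; KBSET's real macros differ.
PERSON = re.compile(r"\\person\{([^}]*)\}\{([^}]*)\}")

# Toy fact base of (subject, predicate, object) triples, standing in for RDF data.
# All identifiers and values are invented for illustration.
FACTS = {
    ("ex:p1", "ex:name", "Jane Example"),
    ("ex:p1", "ex:born", "1750"),
    ("ex:p2", "ex:name", "John Sample"),
}


def objects(subject, predicate, facts=FACTS):
    """Return the sorted objects o such that (subject, predicate, o) is in facts."""
    return sorted(o for s, p, o in facts if s == subject and p == predicate)


def annotate(latex, facts=FACTS):
    """Find person-macro mentions in LaTeX source and attach facts about each ID."""
    entries = []
    for ident, surface in PERSON.findall(latex):
        entries.append({
            "id": ident,
            "text": surface,
            "name": objects(ident, "ex:name", facts),
            "born": objects(ident, "ex:born", facts),
        })
    return entries


if __name__ == "__main__":
    source = r"Letter from \person{ex:p1}{J. E.} to \person{ex:p2}{Sample}."
    for entry in annotate(source):
        print(entry)
```

In this sketch the LaTeX source records only *which* entity a passage refers to; names and dates are looked up in the fact base at processing time, so an entity with no matching fact (here `ex:p2` has no birth year) simply yields an empty list.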
Related papers
- Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models [53.17363502535395]
Trustworthy language models should provide both correct and verifiable answers.
Current systems insert citations by querying an external retriever at inference time.
We propose Active Indexing, which continually pretrains on synthetic QA pairs.
(2025-06-21T04:48:05Z)
- LATex: Leveraging Attribute-based Text Knowledge for Aerial-Ground Person Re-Identification [63.07563443280147]
We propose a novel framework named LATex for AG-ReID.
It adopts prompt-tuning strategies to leverage attribute-based text knowledge.
Our framework can fully leverage attribute-based text knowledge to improve the AG-ReID.
(2025-03-31T04:47:05Z)
- WritingBench: A Comprehensive Benchmark for Generative Writing [87.48445972563631]
We present WritingBench, a benchmark designed to evaluate large language models (LLMs) across 6 core writing domains and 100 subdomains, encompassing creative, persuasive, informative, and technical writing.
We propose a query-dependent evaluation framework that empowers LLMs to dynamically generate instance-specific assessment criteria.
This framework is complemented by a fine-tuned critic model for criteria-aware scoring, enabling evaluations in style, format and length.
(2025-03-07T08:56:20Z)
- Greek2MathTex: A Greek Speech-to-Text Framework for LaTeX Equations Generation [1.7660225024861564]
We present a novel speech-to-LaTeX equations system specifically designed for the Greek language.
We propose an end-to-end system that harnesses the power of Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) techniques.
(2024-12-11T22:29:44Z)
- Towards Semantic Markup of Mathematical Documents via User Interaction [0.0]
We present an approach to semantic markup of formulas by (semi-)automatically generating grammars from existing sTeX macro definitions and parsing formulas with them.
We also present a GUI-based tool for the disambiguation of parse results and showcase its potential using a grammar for parsing untyped λ-terms.
(2024-08-05T12:36:40Z)
- PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer [51.260384040953326]
Handwritten Mathematical Expression Recognition (HMER) has wide applications in human-machine interaction scenarios.
We propose a position forest transformer (PosFormer) for HMER, which jointly optimizes two tasks: expression recognition and position recognition.
PosFormer consistently outperforms state-of-the-art methods, with gains of 2.03%/1.22%/2, 1.83%, and 4.62% on the evaluated datasets.
(2024-07-10T15:42:58Z)
- Extracting and Encoding: Leveraging Large Language Models and Medical Knowledge to Enhance Radiological Text Representation [31.370503681645804]
We present a novel two-stage framework designed to extract high-quality factual statements from free-text radiology reports.
Our framework also includes a new embedding-based metric (CXRFE) for evaluating chest X-ray text generation systems.
(2024-07-02T04:39:19Z)
- XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser [35.69888780388425]
In this work, we introduce a simple but effective multimodal and multilingual semi-structured form parser (XFormParser).
XFormParser is anchored on a comprehensive pre-trained language model and integrates entity recognition and relation extraction (RE).
Our framework exhibits exceptionally improved performance across tasks in both multi-language and zero-shot contexts.
(2024-05-27T16:37:17Z)
- Hypertext Entity Extraction in Webpage [112.56734676713721]
We introduce a MoE-based Entity Extraction Framework (MoEEF), which integrates multiple features to enhance model performance.
We also analyze the effectiveness of hypertext features in HEED and several model components in MoEEF.
(2024-03-04T03:21:40Z)
- Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following [59.997857926808116]
We introduce a semantic panel as middleware for decoding text into images.
The panel is obtained through arranging the visual concepts parsed from the input text.
We develop a practical system and showcase its potential in continuous generation and chatting-based editing.
(2023-11-28T17:57:44Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
(2023-06-06T03:37:41Z)
- Recent advances in the Self-Referencing Embedding Strings (SELFIES) library [1.9573380763700712]
String-based molecular representations play a crucial role in cheminformatics applications.
Traditional string-based representations such as SMILES are often prone to syntactic and semantic errors when produced by generative models.
SELF-referencIng Embedded Strings (SELFIES), an inherently 100% robust representation, was proposed alongside an accompanying open-source implementation.
(2023-02-07T17:24:08Z)
- Digital Editions as Distant Supervision for Layout Analysis of Printed Books [76.29918490722902]
We describe methods for exploiting the semantic markup of digital editions as distant supervision for training and evaluating layout analysis models.
In experiments with several model architectures on the half-million pages of the Deutsches Textarchiv (DTA), we find a high correlation of these region-level evaluation methods with pixel-level and word-level metrics.
We discuss the possibilities for improving accuracy with self-training and the ability of models trained on the DTA to generalize to other historical printed books.
(2021-12-23T16:51:53Z)
- Reproducible Science with LaTeX [4.09920839425892]
This paper proposes a procedure to execute external source code from a document.
It automatically includes the calculation outputs in the resulting Portable Document Format (PDF) file.
(2020-10-04T04:04:07Z)
- N-LTP: An Open-source Neural Language Technology Platform for Chinese [68.58732970171747]
N-LTP is an open-source neural language technology platform supporting six fundamental Chinese NLP tasks.
N-LTP adopts the multi-task framework by using a shared pre-trained model, which has the advantage of capturing the shared knowledge across relevant Chinese tasks.
(2020-09-24T11:45:39Z)
- DART: Open-Domain Structured Data Record to Text Generation [91.23798751437835]
We present DART, an open-domain structured DAta Record to Text generation dataset with over 82k instances (DARTs).
We propose a procedure for extracting semantic triples from tables that encode their structures by exploiting the semantic dependencies among table headers and the table title.
Our dataset construction framework effectively merged heterogeneous sources from open domain semantic parsing and dialogue-act-based meaning representation tasks.
(2020-07-06T16:35:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.